Add support for ARM NEON targets.

Initial support for ARM NEON on Cortex-A9 and A15 CPUs.  All but ~10 tests
pass, and all examples compile and run correctly.  Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.

Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined.  Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken.)
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.

- Currently only the "cortex-a9" and "cortex-a15" CPU targets are
  supported; LLVM supports many other ARM CPUs and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure.)

- ~5 of the reduce-* tests hit an assertion inside LLVM (though,
  unfortunately, only when the compiler itself runs on an ARM host).

- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately).  It may just work, but will more likely
  have various small issues.

- Anything related to 64-bit ARM has seen no attention.
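
On the fp32 vector divide item above: as far as is known, 32-bit ARM NEON has no vector floating-point divide instruction. It offers a reciprocal estimate (vrecpe.f32) and a Newton-Raphson step (vrecps.f32, which computes 2 - d * x) instead. The following is a minimal scalar C++ sketch of that refinement scheme, not what ispc or LLVM actually emits. The names `Vec4`, `estimate_reciprocal`, `refine_reciprocal`, and `approx_divide` are hypothetical, and the linear initial estimate merely stands in for the hardware estimate.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical 4-wide fp32 "vector"; each loop below models one vector op.
constexpr std::size_t kLanes = 4;
using Vec4 = std::array<float, kLanes>;

// Stand-in for a hardware reciprocal estimate (assumption: d[i] != 0).
// Writes |d| = m * 2^e with m in [0.5, 1) and uses the linear fit
// 1/m ~= 48/17 - 32/17 * m, whose relative error is at most 1/17.
Vec4 estimate_reciprocal(const Vec4 &d) {
    Vec4 x{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        int e = 0;
        float m = std::frexp(std::fabs(d[i]), &e);
        float est = 48.0f / 17.0f - (32.0f / 17.0f) * m;
        x[i] = std::copysign(std::ldexp(est, -e), d[i]);
    }
    return x;
}

// One Newton-Raphson step per lane: x' = x * (2 - d * x).
// The relative error is roughly squared by each step.
Vec4 refine_reciprocal(const Vec4 &d, const Vec4 &x) {
    Vec4 out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        out[i] = x[i] * (2.0f - d[i] * x[i]);
    }
    return out;
}

// Approximates a / b per lane as a * (1 / b), refining the reciprocal
// `steps` times. The result is not guaranteed to be correctly rounded.
Vec4 approx_divide(const Vec4 &a, const Vec4 &b, int steps) {
    Vec4 r = estimate_reciprocal(b);
    for (int s = 0; s < steps; ++s) {
        r = refine_reciprocal(b, r);
    }
    Vec4 q{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        q[i] = a[i] * r[i];
    }
    return q;
}
```

The trade-off is accuracy: a refined reciprocal multiply is close to, but not bit-identical with, a true IEEE divide, which is one plausible reason a compiler would fall back to per-lane scalar divides.
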
This commit is contained in:
Matt Pharr
2013-07-19 11:06:11 -07:00
parent b007bba59f
commit d7b0c5794e
22 changed files with 914 additions and 67 deletions

@@ -638,26 +638,50 @@ AddBitcodeToModule(const unsigned char *bitcode, int length,
// linking together modules with incompatible target triples..
llvm::Triple mTriple(m->module->getTargetTriple());
llvm::Triple bcTriple(bcModule->getTargetTriple());
Assert(bcTriple.getArch() == llvm::Triple::UnknownArch ||
mTriple.getArch() == bcTriple.getArch());
Assert(bcTriple.getVendor() == llvm::Triple::UnknownVendor ||
mTriple.getVendor() == bcTriple.getVendor());
bcModule->setTargetTriple(mTriple.str());
Debug(SourcePos(), "module triple: %s\nbitcode triple: %s\n",
mTriple.str().c_str(), bcTriple.str().c_str());
#ifndef __arm__
// FIXME: More ugly and dangerous stuff. We really haven't set up
// proper build and runtime infrastructure for ispc to do
// cross-compilation, yet it's at minimum useful to be able to emit
// ARM code from x86 for ispc development. One side-effect is that
// when the build process turns builtins/builtins.c to LLVM bitcode
// for us to link in at runtime, that bitcode has been compiled for
// an IA target, which in turn causes the checks in the following
// code to (appropriately) fail.
//
// In order to be able to have some ability to generate ARM code on
// IA, we'll just skip those tests in that case and allow the
// setTargetTriple() and setDataLayout() calls below to shove in
// the values for an ARM target. This probably won't cause problems
// in the generated code, since builtins.c doesn't do anything too
// complex w.r.t. struct layouts, etc.
if (g->target->getISA() != Target::NEON)
#endif // !__arm__
{
Assert(bcTriple.getArch() == llvm::Triple::UnknownArch ||
mTriple.getArch() == bcTriple.getArch());
Assert(bcTriple.getVendor() == llvm::Triple::UnknownVendor ||
mTriple.getVendor() == bcTriple.getVendor());
// We unconditionally set module DataLayout to library, but we must
// ensure that library and module DataLayouts are compatible.
// If they are not, we should recompile the library for problematic
// architecture and investigate what happened.
// Generally we allow library DataLayout to be subset of module
// DataLayout or library DataLayout to be empty.
if (!VerifyDataLayoutCompatibility(module->getDataLayout(),
bcModule->getDataLayout())) {
Error(SourcePos(), "Module DataLayout is incompatible with library DataLayout:\n"
"Module DL: %s\n"
"Library DL: %s\n",
module->getDataLayout().c_str(), bcModule->getDataLayout().c_str());
// We unconditionally set module DataLayout to library, but we must
// ensure that library and module DataLayouts are compatible.
// If they are not, we should recompile the library for problematic
// architecture and investigate what happened.
// Generally we allow library DataLayout to be subset of module
// DataLayout or library DataLayout to be empty.
if (!VerifyDataLayoutCompatibility(module->getDataLayout(),
bcModule->getDataLayout())) {
Warning(SourcePos(), "Module DataLayout is incompatible with "
"library DataLayout:\n"
"Module DL: %s\n"
"Library DL: %s\n",
module->getDataLayout().c_str(),
bcModule->getDataLayout().c_str());
}
}
bcModule->setTargetTriple(mTriple.str());
bcModule->setDataLayout(module->getDataLayout());
std::string(linkError);
@@ -795,6 +819,15 @@ DefineStdlib(SymbolTable *symbolTable, llvm::LLVMContext *ctx, llvm::Module *mod
// Next, add the target's custom implementations of the various needed
// builtin functions (e.g. __masked_store_32(), etc).
switch (g->target->getISA()) {
case Target::NEON: {
if (runtime32) {
EXPORT_MODULE(builtins_bitcode_neon_32bit);
}
else {
EXPORT_MODULE(builtins_bitcode_neon_64bit);
}
break;
}
case Target::SSE2: {
switch (g->target->getVectorWidth()) {
case 4: