Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests
pass, and all examples compile and run correctly. Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.
Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined. Known
issues/opportunities for improvement include:
- fp32 vector divide is done as a series of scalar divides rather than
a vector divide (which I believe exists, but I may be mistaken.)
This is particularly harmful to examples/rt, which only runs ~1.5x
faster with ispc, likely due to long chains of scalar divides.
- The compiler isn't generating a vmin.f32 for e.g. the final scalar
min in reduce_min(); instead it's generating a compare and then a
select instruction (and similarly elsewhere).
- There are some additional FIXMEs in builtins/target-neon.ll that
include both a few pieces of missing functionality (e.g. rounding
doubles) as well as places that deserve attention for possible
code quality improvements.
- Currently only the "cortex-a9" and "cortex-15" CPU targets are
supported; LLVM supports many other ARM CPUs and ispc should provide
access to all of the ones that have NEON support (and aren't too
obscure.)
- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
only when the compiler runs on an ARM host, though).
- The Windows build hasn't been tested (though I've tried to update
ispc.vcxproj appropriately). It may just work, but will more likely
have various small issues.)
- Anything related to 64-bit ARM has seen no attention.
This commit is contained in:
67
builtins.cpp
67
builtins.cpp
@@ -638,26 +638,50 @@ AddBitcodeToModule(const unsigned char *bitcode, int length,
|
||||
// linking together modules with incompatible target triples..
|
||||
llvm::Triple mTriple(m->module->getTargetTriple());
|
||||
llvm::Triple bcTriple(bcModule->getTargetTriple());
|
||||
Assert(bcTriple.getArch() == llvm::Triple::UnknownArch ||
|
||||
mTriple.getArch() == bcTriple.getArch());
|
||||
Assert(bcTriple.getVendor() == llvm::Triple::UnknownVendor ||
|
||||
mTriple.getVendor() == bcTriple.getVendor());
|
||||
bcModule->setTargetTriple(mTriple.str());
|
||||
Debug(SourcePos(), "module triple: %s\nbitcode triple: %s\n",
|
||||
mTriple.str().c_str(), bcTriple.str().c_str());
|
||||
#ifndef __arm__
|
||||
// FIXME: More ugly and dangerous stuff. We really haven't set up
|
||||
// proper build and runtime infrastructure for ispc to do
|
||||
// cross-compilation, yet it's at minimum useful to be able to emit
|
||||
// ARM code from x86 for ispc development. One side-effect is that
|
||||
// when the build process turns builtins/builtins.c to LLVM bitcode
|
||||
// for us to link in at runtime, that bitcode has been compiled for
|
||||
// an IA target, which in turn causes the checks in the following
|
||||
// code to (appropraitely) fail.
|
||||
//
|
||||
// In order to be able to have some ability to generate ARM code on
|
||||
// IA, we'll just skip those tests in that case and allow the
|
||||
// setTargetTriple() and setDataLayout() calls below to shove in
|
||||
// the values for an ARM target. This maybe won't cause problems
|
||||
// in the generated code, since bulitins.c doesn't do anything too
|
||||
// complex w.r.t. struct layouts, etc.
|
||||
if (g->target->getISA() != Target::NEON)
|
||||
#endif // !__arm__
|
||||
{
|
||||
Assert(bcTriple.getArch() == llvm::Triple::UnknownArch ||
|
||||
mTriple.getArch() == bcTriple.getArch());
|
||||
Assert(bcTriple.getVendor() == llvm::Triple::UnknownVendor ||
|
||||
mTriple.getVendor() == bcTriple.getVendor());
|
||||
|
||||
// We unconditionally set module DataLayout to library, but we must
|
||||
// ensure that library and module DataLayouts are compatible.
|
||||
// If they are not, we should recompile the library for problematic
|
||||
// architecture and investigate what happened.
|
||||
// Generally we allow library DataLayout to be subset of module
|
||||
// DataLayout or library DataLayout to be empty.
|
||||
if (!VerifyDataLayoutCompatibility(module->getDataLayout(),
|
||||
bcModule->getDataLayout())) {
|
||||
Error(SourcePos(), "Module DataLayout is incompatible with library DataLayout:\n"
|
||||
"Module DL: %s\n"
|
||||
"Library DL: %s\n",
|
||||
module->getDataLayout().c_str(), bcModule->getDataLayout().c_str());
|
||||
// We unconditionally set module DataLayout to library, but we must
|
||||
// ensure that library and module DataLayouts are compatible.
|
||||
// If they are not, we should recompile the library for problematic
|
||||
// architecture and investigate what happened.
|
||||
// Generally we allow library DataLayout to be subset of module
|
||||
// DataLayout or library DataLayout to be empty.
|
||||
if (!VerifyDataLayoutCompatibility(module->getDataLayout(),
|
||||
bcModule->getDataLayout())) {
|
||||
Warning(SourcePos(), "Module DataLayout is incompatible with "
|
||||
"library DataLayout:\n"
|
||||
"Module DL: %s\n"
|
||||
"Library DL: %s\n",
|
||||
module->getDataLayout().c_str(),
|
||||
bcModule->getDataLayout().c_str());
|
||||
}
|
||||
}
|
||||
|
||||
bcModule->setTargetTriple(mTriple.str());
|
||||
bcModule->setDataLayout(module->getDataLayout());
|
||||
|
||||
std::string(linkError);
|
||||
@@ -795,6 +819,15 @@ DefineStdlib(SymbolTable *symbolTable, llvm::LLVMContext *ctx, llvm::Module *mod
|
||||
// Next, add the target's custom implementations of the various needed
|
||||
// builtin functions (e.g. __masked_store_32(), etc).
|
||||
switch (g->target->getISA()) {
|
||||
case Target::NEON: {
|
||||
if (runtime32) {
|
||||
EXPORT_MODULE(builtins_bitcode_neon_32bit);
|
||||
}
|
||||
else {
|
||||
EXPORT_MODULE(builtins_bitcode_neon_64bit);
|
||||
}
|
||||
break;
|
||||
}
|
||||
case Target::SSE2: {
|
||||
switch (g->target->getVectorWidth()) {
|
||||
case 4:
|
||||
|
||||
Reference in New Issue
Block a user