Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests
pass, and all examples compile and run correctly. Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.
Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined. Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken.)
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.
- Currently only the "cortex-a9" and "cortex-a15" CPU targets are
supported; LLVM supports many other ARM CPUs and ispc should provide
access to all of the ones that have NEON support (and aren't too
obscure.)
- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
only when the compiler runs on an ARM host, though).
- The Windows build hasn't been tested (though I've tried to update
ispc.vcxproj appropriately). It may just work, but will more likely
have various small issues.
- Anything related to 64-bit ARM has seen no attention.
12
main.cpp
@@ -243,12 +243,24 @@ int main(int Argc, char *Argv[]) {
     llvm::sys::AddSignalHandler(lSignal, NULL);
 
     // initialize available LLVM targets
+#ifndef __arm__
+    // FIXME: LLVM build on ARM doesn't build the x86 targets by default.
+    // It's not clear that anyone's going to want to generate x86 from an
+    // ARM host, though...
     LLVMInitializeX86TargetInfo();
     LLVMInitializeX86Target();
     LLVMInitializeX86AsmPrinter();
     LLVMInitializeX86AsmParser();
     LLVMInitializeX86Disassembler();
     LLVMInitializeX86TargetMC();
+#endif // !__arm__
+    // Generating ARM from x86 is more likely to be useful, though.
+    LLVMInitializeARMTargetInfo();
+    LLVMInitializeARMTarget();
+    LLVMInitializeARMAsmPrinter();
+    LLVMInitializeARMAsmParser();
+    LLVMInitializeARMDisassembler();
+    LLVMInitializeARMTargetMC();
 
     char *file = NULL;
     const char *headerFileName = NULL;
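
The guard in the hunk above can be read as "register x86 back ends unless the compiler itself is being built for ARM, and always register ARM". A minimal standalone sketch of that pattern follows. The function initializeAvailableTargets and the string labels are hypothetical stand-ins for the LLVMInitialize*() calls, so that the snippet builds without LLVM.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the LLVMInitialize*() calls in main():
// records which back ends would be registered for this host build.
inline std::vector<std::string> initializeAvailableTargets() {
    std::vector<std::string> targets;
#ifndef __arm__
    // Skipped when the host compiler defines __arm__, mirroring the
    // guard around the x86 initialization calls.
    targets.push_back("x86");
#endif // !__arm__
    // ARM is registered on every host, so cross-compiling from x86 works.
    targets.push_back("arm");
    return targets;
}
```

On a non-ARM host the returned list holds both entries; on a host whose compiler defines __arm__ it holds only "arm".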