Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests
pass, and all examples compile and run correctly. Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.
Current open issues/TODOs:
- Code quality looks decent, but hasn't been carefully examined. Known
  issues/opportunities for improvement include:
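  - fp32 vector divide is done as a series of scalar divides rather than
    as a vector operation. ARMv7 NEON has no vector divide instruction;
    the usual alternative is a reciprocal estimate (vrecpe.f32) refined
    with Newton-Raphson steps (vrecps.f32), though that is not an exact
    IEEE divide. This is particularly harmful to examples/rt, which runs
    only ~1.5x faster with ispc, likely because of long chains of scalar
    divides.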
  - The compiler isn't generating a vmin.f32 for, e.g., the final scalar
    min in reduce_min(); instead it generates a compare followed by a
    select instruction (and does similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    cover both a few pieces of missing functionality (e.g., rounding
    doubles) and places that deserve attention for possible code
    quality improvements.
- Currently only the "cortex-a9" and "cortex-a15" CPU targets are
  supported; LLVM supports many other ARM CPUs, and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure).
- ~5 of the reduce-* tests hit an assertion inside LLVM; unfortunately,
  this only happens when the compiler runs on an ARM host.
- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately). It may just work, but will more likely
  have various small issues.
- Anything related to 64-bit ARM has seen no attention.
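
For the fp32 divide item above, a sketch of the usual NEON alternative may help. This is a minimal Python model under stated assumptions, not ispc or LLVM code: it works in scalar double precision, and `reciprocal_estimate` is a hypothetical stand-in for vrecpe.f32 that uses a linear approximation on the mantissa rather than the hardware's lookup table. Each `newton_step` mirrors what a vrecps.f32 plus multiply pair computes.

```python
import math


def reciprocal_estimate(d: float) -> float:
    """Hypothetical stand-in for a hardware estimate such as vrecpe.f32.

    Writes |d| = m * 2**e with 0.5 <= m < 1 and approximates 1/m with the
    linear fit 48/17 - (32/17) * m, whose relative error is at most 1/17.
    """
    if d == 0.0 or not math.isfinite(d):
        raise ValueError("divisor must be finite and nonzero")
    m, e = math.frexp(abs(d))
    r = (48.0 / 17.0 - (32.0 / 17.0) * m) * 2.0 ** -e
    return math.copysign(r, d)


def newton_step(d: float, r: float) -> float:
    # vrecps.f32 computes (2 - d * r); multiplying by r refines the estimate.
    # If r = (1 - eps) / d, the result is (1 - eps**2) / d.
    return r * (2.0 - d * r)


def approx_divide(n: float, d: float, steps: int = 3) -> float:
    """Compute n / d as n * (1 / d), building 1 / d from an estimate plus Newton steps."""
    r = reciprocal_estimate(d)
    for _ in range(steps):
        r = newton_step(d, r)
    return n * r


print(approx_divide(1.0, 3.0), 1.0 / 3.0)
```

Because each step squares the relative error, the sketch's worst-case 1/17 starting error falls below 1e-9 after three steps. The result is close to, but not guaranteed to equal, a correctly rounded divide, which is one reason a compiler may not make this substitution without relaxed floating-point settings.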
run_tests.py
@@ -37,10 +37,10 @@ parser.add_option("-g", "--generics-include", dest="include_file", help="Filenam
 parser.add_option("-f", "--ispc-flags", dest="ispc_flags", help="Additional flags for ispc (-g, -O1, ...)",
                   default="")
 parser.add_option('-t', '--target', dest='target',
-                  help='Set compilation target (sse2, sse2-x2, sse4, sse4-x2, avx, avx-x2, generic-4, generic-8, generic-16, generic-32)',
+                  help='Set compilation target (neon, sse2, sse2-x2, sse4, sse4-x2, avx, avx-x2, generic-4, generic-8, generic-16, generic-32)',
                   default="sse4")
 parser.add_option('-a', '--arch', dest='arch',
-                  help='Set architecture (x86, x86-64)',
+                  help='Set architecture (arm, x86, x86-64)',
                   default="x86-64")
 parser.add_option("-c", "--compiler", dest="compiler_exe", help="Compiler binary to use to run tests",
                   default=None)
@@ -58,6 +58,9 @@ parser.add_option('--time', dest='time', help='Enable time output',
 
 (options, args) = parser.parse_args()
 
+if options.target == 'neon':
+    options.arch = 'arm'
+
 # use relative path to not depend on host directory, which may possibly
 # have white spaces and unicode characters.
 if not is_windows:
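
The override added in this hunk is unconditional, so `-t neon` forces the architecture to arm even when `-a` is also passed. A minimal standalone sketch reproducing just these two options with the standard-library `optparse` module (the helper name `parse_test_options` is hypothetical, not part of run_tests.py):

```python
import optparse


def parse_test_options(argv):
    """Hypothetical helper: reproduces only the -t/-a options from run_tests.py."""
    parser = optparse.OptionParser()
    parser.add_option('-t', '--target', dest='target', default='sse4',
                      help='Set compilation target (neon, sse2, sse4, ...)')
    parser.add_option('-a', '--arch', dest='arch', default='x86-64',
                      help='Set architecture (arm, x86, x86-64)')
    (options, args) = parser.parse_args(argv)
    # Same override as the patch: the neon target always implies arm,
    # even when -a/--arch was given explicitly.
    if options.target == 'neon':
        options.arch = 'arm'
    return options


opts = parse_test_options(['-t', 'neon', '-a', 'x86-64'])
print(opts.target, opts.arch)  # prints: neon arm
```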
@@ -345,10 +348,13 @@ def run_test(testname):
     obj_name = "%s.o" % testname
     exe_name = "%s.run" % testname
 
-    if options.arch == 'x86':
-        gcc_arch = '-m32'
-    else:
-        gcc_arch = '-m64'
+    if options.arch == 'x86':
+        gcc_arch = '-m32'
+    elif options.arch == 'arm':
+        gcc_arch = '--with-fpu=hardfp -marm -mfpu=neon -mfloat-abi=hard'
+    else:
+        gcc_arch = '-m64'
 
     gcc_isa=""
     if options.target == 'generic-4':
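
The gcc_arch selection in this hunk needs an `if`/`elif`/`else` chain. With two independent `if` statements, the trailing `else` pairs only with the arm test, so it also runs for x86 and overwrites '-m32' with '-m64'. A small Python sketch comparing the two forms (both function names are hypothetical; the flag strings are copied from the patch):

```python
ARM_FLAGS = '--with-fpu=hardfp -marm -mfpu=neon -mfloat-abi=hard'


def gcc_arch_flags(arch):
    """Hypothetical helper: one if/elif/else chain, so exactly one branch runs."""
    if arch == 'x86':
        return '-m32'
    elif arch == 'arm':
        return ARM_FLAGS
    else:
        return '-m64'


def gcc_arch_flags_chained_if(arch):
    """The same logic written as two independent 'if' statements."""
    if arch == 'x86':
        gcc_arch = '-m32'
    if arch == 'arm':
        gcc_arch = ARM_FLAGS
    else:
        gcc_arch = '-m64'  # also runs for x86, overwriting '-m32'
    return gcc_arch


print(gcc_arch_flags('x86'), gcc_arch_flags_chained_if('x86'))  # prints: -m32 -m64
```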