d7b0c5794ef819a4c039a725681bdabbac946223
Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests
pass, and all examples compile and run correctly. Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.
Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined. Known
issues/opportunities for improvement include:
- fp32 vector divide is done as a series of scalar divides rather than
a vector divide (which I believe exists, but I may be mistaken.)
This is particularly harmful to examples/rt, which only runs ~1.5x
faster with ispc, likely due to long chains of scalar divides.
- The compiler isn't generating a vmin.f32 for e.g. the final scalar
min in reduce_min(); instead it's generating a compare and then a
select instruction (and similarly elsewhere).
- There are some additional FIXMEs in builtins/target-neon.ll that
include both a few pieces of missing functionality (e.g. rounding
doubles) as well as places that deserve attention for possible
code quality improvements.
- Currently only the "cortex-a9" and "cortex-15" CPU targets are
supported; LLVM supports many other ARM CPUs and ispc should provide
access to all of the ones that have NEON support (and aren't too
obscure.)
- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
only when the compiler runs on an ARM host, though).
- The Windows build hasn't been tested (though I've tried to update
ispc.vcxproj appropriately). It may just work, but will more likely
have various small issues.)
- Anything related to 64-bit ARM has seen no attention.
==============================
Intel(r) SPMD Program Compiler
==============================
``ispc`` is a compiler for a variant of the C programming language, with
extensions for `single program, multiple data
<http://en.wikipedia.org/wiki/SPMD>`_ programming. Under the SPMD model,
the programmer writes a program that generally appears to be a regular
serial program, though the execution model is actually that a number of
*program instances* execute in parallel on the hardware.
Overview
--------
``ispc`` compiles a C-based SPMD programming language to run on the SIMD
units of CPUs; it frequently provides a 3x or more speedup on CPUs with
4-wide vector SSE units and 5x-6x on CPUs with 8-wide AVX vector units,
without any of the difficulty of writing intrinsics code. Parallelization
across multiple cores is also supported by ``ispc``, making it
possible to write programs that achieve performance improvement that scales
by both number of cores and vector unit size.
There are a few key principles in the design of ``ispc``:
* To build a small set of extensions to the C language that
would deliver excellent performance to performance-oriented
programmers who want to run SPMD programs on the CPU.
* To provide a thin abstraction layer between the programmer
and the hardware--in particular, to have an execution and
data model where the programmer can cleanly reason about the
mapping of their source program to compiled assembly language
and the underlying hardware.
* To make it possible to harness the computational power of SIMD
vector units without the extremely low-programmer-productivity
activity of directly writing intrinsics.
* To explore opportunities from close coupling between C/C++
application code and SPMD ``ispc`` code running on the
same processor--to have lightweight function calls between
the two languages and to share data directly via pointers without
copying or reformatting.
``ispc`` is an open source compiler with the BSD license. It uses the
remarkable `LLVM Compiler Infrastructure <http://llvm.org>`_ for back-end
code generation and optimization and is `hosted on
github <http://github.com/ispc/ispc/>`_. It supports Windows, Mac, and
Linux, with both x86 and x86-64 targets. It currently supports the SSE2,
SSE4, AVX1, and AVX2 instruction sets.
Features
--------
``ispc`` provides a number of key features to developers:
* Familiarity as an extension of the C programming
language: ``ispc`` supports familiar C syntax and
programming idioms, while adding the ability to write SPMD
programs.
* High-quality SIMD code generation: the performance
of code generated by ``ispc`` is often close to that of
hand-written intrinsics code.
* Ease of adoption with existing software
systems: functions written in ``ispc`` directly
interoperate with application functions written in C/C++ and
with application data structures.
* Portability across over a decade of CPU
generations: ``ispc`` has targets for SSE2, SSE4, AVX
(and soon, AVX2).
* Portability across operating systems: Microsoft
Windows, Mac OS X, and Linux are all supported
by ``ispc``.
* Debugging with standard tools: ``ispc``
programs can be debugged with standard debuggers (OS X and
Linux only).
Additional Resources
--------------------
Prebuilt ``ispc`` binaries for Windows, OS X and Linux can be downloaded
from the `ispc downloads page <http://ispc.github.com/downloads.html>`_.
See also additional
`documentation <http://ispc.github.com/documentation.html>`_ and additional
`performance information <http://ispc.github.com/perf.html>`_.
Description
Languages
C++
63.5%
LLVM
19.1%
M4
11.6%
Python
4.5%
Makefile
0.5%
Other
0.6%