Add support for emitting ~generic vectorized C++ code.

The compiler now supports an --emit-c++ option, which generates generic
vector C++ code.  To actually compile this code, the user must provide
C++ code that implements a variety of types and operations (e.g. adding
two floating-point vector values together, comparing them, etc).

There are two examples of this required code in examples/intrinsics:
generic-16.h is a "generic" 16-wide implementation that does all required
operations with scalar math; it's useful for demonstrating the requirements
of the implementation.  Then, sse4.h shows a simple implementation of an SSE4
target that maps the emitted function calls to SSE intrinsics.

When using these example implementations with the ispc test suite,
all but one or two tests pass with gcc and clang on Linux and OSX.
There are currently ~10 failures with icc on Linux, and ~50 failures with
MSVC 2010.  (To be fixed in coming days.)

Performance varies: when running the examples through the sse4.h
target, some (options) have the same performance as when compiled directly
by ispc with --target=sse4, while noise is 12% slower, rt is 26%
slower, and aobench is 2.2x slower.  The causes haven't yet been
carefully investigated, but will be in coming days as well.

Issue #92.
Matt Pharr
2012-01-04 12:37:26 -08:00
parent 4151778f5e
commit 8938e14442
11 changed files with 9594 additions and 27 deletions

@@ -56,6 +56,7 @@ Contents:
+ `Basic Command-line Options`_
+ `Selecting The Compilation Target`_
+ `Generating Generic C++ Output`_
+ `Selecting 32 or 64 Bit Addressing`_
+ `The Preprocessor`_
+ `Debugging`_
@@ -432,6 +433,65 @@ Intel® SSE2, use ``--target=sse2``. (As with the other options in this
section, see the output of ``ispc --help`` for a full list of supported
targets.)
Generating Generic C++ Output
-----------------------------
In addition to generating object files or assembly output for specific
targets like SSE2, SSE4, and AVX, ``ispc`` provides an option to generate
"generic" C++ output.
As an example, consider the following simple ``ispc`` program:

::

    int foo(int i, int j) {
        return (i < 0) ? 0 : i + j;
    }

If this program is compiled with the following command:

::

    ispc foo.ispc --emit-c++ --target=generic-4 -o foo.cpp

Then ``foo()`` is compiled to the following C++ code (after various
automatically-generated boilerplate code):

::

    __vec4_i32 foo(__vec4_i32 i_llvm_cbe, __vec4_i32 j_llvm_cbe,
                   __vec4_i1 __mask_llvm_cbe) {
        return (__select((__signed_less_than(i_llvm_cbe,
                                             __vec4_i32 (0u, 0u, 0u, 0u))),
                         __vec4_i32 (0u, 0u, 0u, 0u),
                         (__add(i_llvm_cbe, j_llvm_cbe))));
    }

Note that the original computation has been expressed in terms of a number
of vector types (e.g. ``__vec4_i32`` for a 4-wide vector of 32-bit integers
and ``__vec4_i1`` for a 4-wide vector of boolean values) and in terms of
vector operations on these types like ``__add()`` and ``__select()``.
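
To make the shape of these types and operations concrete, here is a minimal
scalar sketch that is just enough to compile the emitted ``foo()`` above. The
struct layouts, constructors, and loop-based function bodies are assumptions
made for illustration; they are not taken from the example headers shipped
with ``ispc``, which may define these types quite differently.

```cpp
#include <cstdint>

// Hypothetical scalar stand-in for a 4-wide vector of 32-bit integers.
struct __vec4_i32 {
    int32_t v[4];
    __vec4_i32() : v{0, 0, 0, 0} {}
    // The emitted code constructs constants as __vec4_i32(0u, 0u, 0u, 0u).
    __vec4_i32(int32_t a, int32_t b, int32_t c, int32_t d) : v{a, b, c, d} {}
};

// Hypothetical scalar stand-in for a 4-wide vector of boolean values.
struct __vec4_i1 {
    bool v[4];
    __vec4_i1() : v{false, false, false, false} {}
    __vec4_i1(bool a, bool b, bool c, bool d) : v{a, b, c, d} {}
};

// Lane-wise addition; done in unsigned arithmetic so overflow wraps.
static inline __vec4_i32 __add(__vec4_i32 a, __vec4_i32 b) {
    __vec4_i32 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = (int32_t)((uint32_t)a.v[i] + (uint32_t)b.v[i]);
    return r;
}

// Lane-wise signed comparison, producing a boolean vector.
static inline __vec4_i1 __signed_less_than(__vec4_i32 a, __vec4_i32 b) {
    __vec4_i1 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] < b.v[i];
    return r;
}

// Lane-wise select: takes a's lane where the mask is true, else b's lane.
static inline __vec4_i32 __select(__vec4_i1 mask, __vec4_i32 a, __vec4_i32 b) {
    __vec4_i32 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = mask.v[i] ? a.v[i] : b.v[i];
    return r;
}

// The emitted function from the example above, unchanged.
__vec4_i32 foo(__vec4_i32 i_llvm_cbe, __vec4_i32 j_llvm_cbe,
               __vec4_i1 __mask_llvm_cbe) {
    return (__select((__signed_less_than(i_llvm_cbe,
                                         __vec4_i32 (0u, 0u, 0u, 0u))),
                     __vec4_i32 (0u, 0u, 0u, 0u),
                     (__add(i_llvm_cbe, j_llvm_cbe))));
}
```

With these definitions, calling ``foo()`` on the lanes ``(-1, 2, 0, -7)`` and
``(10, 20, 30, 40)`` yields ``(0, 22, 30, 0)``: lanes where ``i`` is negative
produce zero, and the others produce ``i + j``.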
You are then free to provide your own implementations of these types and
functions. For example, you might want to target a specific vector ISA, or
you might want to instrument these functions for performance measurements.
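
As a sketch of the instrumentation idea, an implementation of ``__add()``
could count how often it is called. The counter name ``g_add_calls`` and the
plain-struct layout of ``__vec4_i32`` below are hypothetical and chosen only
for this illustration.

```cpp
#include <cstdint>

// Hypothetical plain-struct layout for a 4-wide vector of 32-bit integers.
struct __vec4_i32 {
    int32_t v[4];
};

// Hypothetical global counter, incremented on every vector add.
static unsigned long g_add_calls = 0;

// Lane-wise addition that also records that it was called.
static inline __vec4_i32 __add(__vec4_i32 a, __vec4_i32 b) {
    ++g_add_calls;
    __vec4_i32 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = (int32_t)((uint32_t)a.v[i] + (uint32_t)b.v[i]);
    return r;
}
```

After running the emitted code, the counter gives the number of vector adds
executed, which could then be reported alongside timing data.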
An example implementation of the 4-wide variants of the required
functions, suitable for use with the ``generic-4`` target, is in the file
``examples/intrinsics/sse4.h``, and a straightforward example C
implementation of the 16-wide variants for the ``generic-16`` target is in
the file ``examples/intrinsics/generic-16.h``. There is not yet comprehensive
documentation of these types and the functions that must be provided for
them when the C++ target is used, but a review of those two files should
provide the basic context.
If you are using C++ source emission, you may also find the
``--c++-include-file=<filename>`` command-line argument useful. It adds an
``#include`` statement with the given filename at the top of the emitted
C++ file, which makes it easy to include specific implementations of
the vector types and functions.
Selecting 32 or 64 Bit Addressing
---------------------------------