(i.e., stop just reusing the ones for AVX1).
For now the only difference is that the int/uint min/max
functions call the new intrinsic for that. Once gather is
available from LLVM, that will go here as well.
The compiler now supports an --emit-c++ option, which generates generic
vector C++ code. To actually compile this code, the user must provide
C++ code that implements a variety of types and operations (e.g. adding
two floating-point vector values together, comparing them, etc).
There are two examples of this required code in examples/intrinsics:
generic-16.h is a "generic" 16-wide implementation that does all required
with scalar math; it's useful for demonstrating the requirements of the
implementation. Then, sse4.h shows a simple implementation of a SSE4
target that maps the emitted function calls to SSE intrinsics.
When using these example implementations with the ispc test suite,
all but one or two tests pass with gcc and clang on Linux and OSX.
There are currently ~10 failures with icc on Linux, and ~50 failures with
MSVC 2010. (To be fixed in coming days.)
Performance varies: when running the examples through the sse4.h
target, some have the same performance as when compiled with --target=sse4
from ispc directly (options), while noise is 12% slower, rt is 26%
slower, and aobench is 2.2x slower. The details of this haven't yet been
carefully investigated, but will be in coming days as well.
Issue #92.
When used, these targets end up with calls to undefined functions for all
of the various special vector stuff ispc needs to compile ispc programs
(masked store, gather, min/max, sqrt, etc.).
These targets are not yet useful for anything, but are a step toward
having an option to C++ code with calls out to intrinsics.
Reorganized the directory structure a bit and put the LLVM bitcode used
to define target-specific stuff (as well as some generic built-ins stuff)
into a builtins/ directory.
Note that for building on Windows, it's now necessary to set a LLVM_VERSION
environment variable (with values like LLVM_2_9, LLVM_3_0, LLVM_3_1svn, etc.)
Added AST and Function classes.
Now, we parse the whole file and build up the AST for all of the
functions in the Module before we emit IR for the functions (vs. before,
when we generated IR along the way as we parsed the source file.)
If a flag along the lines of "--target=sse4,avx-x2" is provided on the command-line,
then the program will be compiled for each of the given targets, with a separate
output file generated for each one. Further, an output file with dispatch functions
that check the current system's CPU and then chooses the best available variant
is also created.
Issue #11.
If no CPU is specified, use the host CPU type, not just a default of "nehalem".
Provide better features strings to the LLVM target machinery.
-> Thus ensuring that LLVM doesn't generate SSE>2 instructions for the SSE2
target (Fixes issue #82).
-> Slight code improvements from using cmovs in generated code now
Use the llvm popcnt intrinsic for the SSE2 target now (it now generates code
that doesn't call the popcnt instruction now that we properly tell LLVM
which instructions are and aren't available for SSE2.)
Set the Module's target appropriately when it's first created.
Compile separate 32 and 64 bit versions of the builtins-c bitcocde
and load the appropriate one based on the target we're compiling
for.
- Renamed stdlib-sse.ll to builtins-sse.ll (etc.) in an attempt to better indicate
the fact that the stuff in those files has a role beyond implementing stuff for
the standard library.
- Moved declarations of the various __pseudo_* functions from being done with LLVM
API calls in builtins.cpp to just straight up declarations in LLVM assembly
language in builtins.m4. (Much less code to do it this way, and more clear what's
going on.)
This checkin provides the standard set of atomic operations and a memory barrier in the ispc standard library. Both signed and unsigned 32- and 64-bit integer types are supported.
When creating function Symbols for functions that were defined in LLVM bitcode for the standard library, if any of the function parameters are integer types, create two ispc-side Symbols: one where the integer types are all signed and the other where they are all unsigned. This allows us to provide, for example, both store_to_int16(reference int a[], uniform int offset, int val) as well as store_to_int16(reference unsigned int a[], uniform int offset, unsigned int val). functions.
Added some additional tests to exercise the new variants of these.
Also fixed some cases where the __{load,store}_int{8,16} builtins would read from/write to memory even if the mask was all off (which could cause crashes in some cases.)