Using vector select versus a store and masked load for varying vector
selects seems to give worse code. This may be related to
http://llvm.org/bugs/show_bug.cgi?id=16941.
Commit 53414f12e6 introduced a but where lEmitVaryingSelect() would
try to truncate a vector of i1s to a vector of i1s, which in turn
made LLVM's IR analyzer unhappy.
This change adds a new 'sse4-8' target, where programCount is 16 and
the mask element size is 8-bits. (i.e. the most appropriate sizing of
the mask for SIMD computation with 8-bit datatypes.)
There were a number of places throughout the system that assumed that the
execution mask would only have either 32-bit or 1-bit elements. This
commit makes it possible to have a target with an 8- or 16-bit mask.
For varying int8/16/32 types, divides by small constants can be
implemented efficiently through multiplies and shifts with integer
types of twice the bit-width; this commit adds this optimization.
(Implementation is based on Halide.)
Fixes:
- Don't issue a warning when the shift is a by the same amount in all
vector lanes.
- Do issue a warning when it's a compile-time constant but the values
are different in different lanes.
Previously, we warned iff the shift amount wasn't a compile-time constant.
We can now do constant folding with all basic datatypes (the previous
implementation handled int32 well, but had limited, if any, coverage
for other datatypes.)
Reduced a bit of repeated code in the constant folding implementation
through template helper functions.
I limit the fix to uniformed index to avoid widening a varying index vector to 64 bits. This means that the 32 bit values in the varying indices must be positive and smaller than 2^31 at the runtime for a program to behave correctly.
In particular, this gives us desired behavior for NaNs (all compares
involving a NaN evaluate to true). This in turn allows writing the
canonical isnan() function as "v != v".
Added isnan() to the standard library as well.
It can sometimes be useful to know the general place we were in the program
when an assertion hit; when the position is available / applicable, this
macro is now used.
Issue #268.
We now have a set of template functions CastType<AtomicType>, etc., that in
turn use a new typeId field in each Type instance, allowing them to be inlined
and to be quite efficient.
This improves front-end performance for a particular large program by 28%.
We now try harder to keep the names of instructions related to the
initial names of variables they're derived from and so forth. This
is useful for making both LLVM IR as well as generated C++ code
easier to correlate back to the original ispc source code.
Issue #244.
Issue an error, rather than crashing, if the user has declared a
struct type but not defined it and subsequently tries to:
- dynamically allocate an instance of the struct type
- do pointer math with a pointer to the struct type
- compute the size of the struct type
Now a declaration like 'struct Foo;' can be used to establish the
name of a struct type, without providing a definition. One can
pass pointers to such types around the system, but can't do much
else with them (as in C/C++).
Issue #125.