Using a vector select, rather than a store and masked load, for varying
vector selects seems to give worse code. This may be related to
http://llvm.org/bugs/show_bug.cgi?id=16941.
Commit 53414f12e6 introduced a bug where lEmitVaryingSelect() would
try to truncate a vector of i1s to a vector of i1s, which in turn
made LLVM's IR analyzer unhappy.
This change adds a new 'sse4-8' target, where programCount is 16 and
the mask element size is 8 bits (i.e., the most appropriate sizing of
the mask for SIMD computation with 8-bit datatypes).
There were a number of places throughout the system that assumed that the
execution mask would only have either 32-bit or 1-bit elements. This
commit makes it possible to have a target with an 8- or 16-bit mask.
For varying int8/16/32 types, divides by small constants can be
implemented efficiently through multiplies and shifts with integer
types of twice the bit-width; this commit adds this optimization.
(Implementation is based on Halide.)
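
A minimal sketch of the multiply-and-shift idea for unsigned 8-bit
values, written as scalar C++ rather than as the compiler's actual
code. The function names and the constants 171 and 205 are
illustrative choices, not taken from this commit; the multiply is
done in a 16-bit type, i.e. twice the bit-width of the operand.

```cpp
#include <cstdint>

// x / 3 for unsigned 8-bit x. 171/512 slightly overestimates 1/3, but the
// error is too small to change the truncated result for any x <= 255.
// The product is at most 255 * 171 = 43605, so it fits in 16 bits.
inline std::uint8_t div3_u8(std::uint8_t x) {
    std::uint16_t wide = static_cast<std::uint16_t>(x);
    return static_cast<std::uint8_t>((wide * 171u) >> 9);
}

// x / 5 for unsigned 8-bit x, using 205/1024 as the approximation of 1/5.
// The product is at most 255 * 205 = 52275, which also fits in 16 bits.
inline std::uint8_t div5_u8(std::uint8_t x) {
    std::uint16_t wide = static_cast<std::uint16_t>(x);
    return static_cast<std::uint8_t>((wide * 205u) >> 10);
}
```

Signed types and other divisors need their own multiplier, shift,
and possibly a correction step, which this sketch does not cover.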
Fixes:
- Don't issue a warning when the shift is by the same amount in all
vector lanes.
- Do issue a warning when it's a compile-time constant but the values
are different in different lanes.
Previously, we warned iff the shift amount wasn't a compile-time constant.
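
The compile-time-constant case can be sketched as a per-lane
comparison. This is an illustration only: constantShiftNeedsWarning
is a hypothetical name, and the vector of lane values stands in for
whatever constant representation the compiler really uses. Shift
amounts that are not compile-time constants are outside this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: given the per-lane values of a compile-time-constant
// shift amount, return true if the lanes differ (warning wanted) and false
// if every lane shifts by the same amount (no warning).
inline bool constantShiftNeedsWarning(const std::vector<std::int64_t>& laneAmounts) {
    for (std::size_t i = 1; i < laneAmounts.size(); ++i) {
        if (laneAmounts[i] != laneAmounts[0]) {
            return true;
        }
    }
    return false;
}
```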
We can now do constant folding with all basic datatypes. (The previous
implementation handled int32 well, but had limited, if any, coverage
for other datatypes.)
Reduced a bit of repeated code in the constant folding implementation
through template helper functions.
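
As a rough illustration of how a template helper can remove per-type
repetition, the sketch below folds a binary operator lane by lane
over two constant vectors of any element type. foldBinary and its
signature are hypothetical and are not the helpers added by this
commit.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: apply op to matching lanes of two constant vectors.
// One template covers int8, int16, int32, int64, float, double, and so on,
// instead of one hand-written loop per type. If the inputs differ in length,
// only the common prefix is folded.
template <typename T, typename Op>
std::vector<T> foldBinary(const std::vector<T>& a, const std::vector<T>& b, Op op) {
    const std::size_t n = std::min(a.size(), b.size());
    std::vector<T> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<T>(op(a[i], b[i]));
    }
    return out;
}
```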
I limit the fix to uniform indices to avoid widening a varying index
vector to 64 bits. This means that the 32-bit values in varying
indices must be non-negative and smaller than 2^31 at runtime for a
program to behave correctly.
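
One way to see where the 2^31 bound comes from, assuming the 32-bit
index bits are interpreted as a signed value when they are turned
into an address offset (an assumption for illustration; the commit
does not spell out the mechanism). widenAsSigned is a hypothetical
name.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret 32 index bits as a signed 32-bit value, then sign-extend to
// 64 bits. Values below 2^31 survive unchanged; values at or above 2^31
// come out negative and would address memory before the base pointer.
inline std::int64_t widenAsSigned(std::uint32_t bits) {
    std::int32_t asSigned;
    std::memcpy(&asSigned, &bits, sizeof asSigned);
    return static_cast<std::int64_t>(asSigned);
}
```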
In particular, this gives us desired behavior for NaNs (all compares
involving a NaN evaluate to true). This in turn allows writing the
canonical isnan() function as "v != v".
Added isnan() to the standard library as well.