This should be a bool, not a one-wide vector of bools. The equivalent
fix was previously made in generic-16.h, but not made here. (Note that
many tests are still failing with these targets, but at least they
compile properly now.)
For KNC (gather/scatter), it's not helpful to factor base+offsets gathers
and scatters into base_ptr + {1/2/4/8} * varying_offsets + const_offsets.
Now, if a HW instruction is available for gather/scatter, we just factor
into base + {1/2/4/8} * offsets (if possible). Not only is this simpler,
but it's also what we need to pass a value along to the scale by
2/4/8 available directly in those instructions.
Finishes issue #325.
No functional change; just preparation for having a path that doesn't
factor the offsets into constant and varying parts, which will be better
for AVX2 and KNC.
Add peephole optimization to eliminate some mask AND operations.
On KNC, the various vector comparison instructions can optionally
be masked; if a mask is provided, the result is effectively that
the value returned is the AND of the mask with the result of the
comparison.
This change adds an optimization pass to the C++ backend that looks
for vector ANDs where one operand is a comparison and rewrites
them--e.g. "and(equalfloat(a, b), c)" is changed to
"_equal_float_and_mask(a, b, c)", saving an instruction in the end.
Issue #319.
Merge commit '8ef6bc16364d4c08aa5972141748110160613087'
Conflicts:
examples/intrinsics/knc.h
examples/intrinsics/sse4.h
Fixes include adding "_float" and "_double" suffixes as appropriate as well
as providing a number of missing implementations.
This fixes a number of failures in the half* tests.
On KNC, the various vector comparison instructions can optionally
be masked; if a mask is provided, the result is effectively that
the value returned is the AND of the mask with the result of the
comparison.
This change adds an optimization pass to the C++ backend that looks
for vector ANDs where one operand is a comparison and rewrites
them--e.g. "__and(__equal_float(a, b), c)" is changed to
"__equal_float_and_mask(a, b, c)", saving an instruction in the end.
Issue #319.
e.g. "__equal()" -> "__equal_float()", etc.
No functional change; this is necessary groundwork for a forthcoming
peephole optimization that eliminates ANDs of masks in some cases.
If we have a vector of all zeros, a __setzero_* function call is emitted,
permitting calling specialized intrinsics for this. Undefined values
are reflected with an __undef_* call, which similarly allows passing that
information along.
This change also includes a cleanup to the signature of the __smear_*
functions; since they already have different names depending on the
scalar value type, we don't need to use the trick of passing an
undefined value of the return vector type as the first parameter as
an indirect way to overload by return value.
Issue #317.
This should help with performance of the generated code.
Updated the relevant header files (sse4.h, generic-16.h, generic-32.h, generic-64.h)
Updated generic-32.h and generic-64.h to the new memory API