LLVM 3.4 (due to more aggressive optimizations): vector of *the same*
constants should be generated as scalar value in cpp file, instead of
__extract_element(splat(value), 0).
I.e. <2,2,2,2> should appear in cpp as 2, but not
__extract_element(splat(2), 0);
Add peephole optimization to eliminate some mask AND operations.
On KNC, the various vector comparison instructions can optionally
be masked; if a mask is provided, the result is effectively that
the value returned is the AND of the mask with the result of the
comparison.
This change adds an optimization pass to the C++ backend that looks
for vector ANDs where one operand is a comparison and rewrites
them--e.g. "and(equalfloat(a, b), c)" is changed to
"_equal_float_and_mask(a, b, c)", saving an instruction in the end.
Issue #319.
Merge commit '8ef6bc16364d4c08aa5972141748110160613087'
Conflicts:
examples/intrinsics/knc.h
examples/intrinsics/sse4.h
Fixes include adding "_float" and "_double" suffixes as appropriate as well
as providing a number of missing implementations.
This fixes a number of failures in the half* tests.
On KNC, the various vector comparison instructions can optionally
be masked; if a mask is provided, the result is effectively that
the value returned is the AND of the mask with the result of the
comparison.
This change adds an optimization pass to the C++ backend that looks
for vector ANDs where one operand is a comparison and rewrites
them--e.g. "__and(__equal_float(a, b), c)" is changed to
"__equal_float_and_mask(a, b, c)", saving an instruction in the end.
Issue #319.
e.g. "__equal()" -> "__equal_float()", etc.
No functional change; this is necessary groundwork for a forthcoming
peephole optimization that eliminates ANDs of masks in some cases.
When we have a constant vector of primitive types, we now generate
a definition of a static const array of the individual values. This
in turn allows us to emit a simple aligned vector load to get the
constant vector value, rather than inefficiently inserting the values
into a vector.
Issue #318.