No functional change; just preparation for having a path that doesn't
factor the offsets into constant and varying parts, which will be better
for AVX2 and KNC.
Add peephole optimization to eliminate some mask AND operations.
On KNC, the various vector comparison instructions can optionally be
masked; if a mask is provided, the value returned is effectively the
AND of the mask with the result of the comparison.
This change adds an optimization pass to the C++ backend that looks
for vector ANDs where one operand is a comparison and rewrites
them--e.g. "__and(__equal_float(a, b), c)" is changed to
"__equal_float_and_mask(a, b, c)", saving an instruction in the end.
Issue #319.
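As an illustrative sketch of what the combined function buys (simplified to
raw __m512/__mmask16 types; the real examples/intrinsics/knc.h wraps these
in its own vector structs), it can boil down to a single masked compare:

    #include <immintrin.h>   // KNC/MIC vector intrinsics

    // (a == b) & m in one instruction: the mask argument of the masked
    // compare does the AND that previously needed a separate operation.
    static inline __mmask16 equal_float_and_mask_sketch(__m512 a, __m512 b,
                                                        __mmask16 m) {
        return _mm512_mask_cmpeq_ps_mask(m, a, b);
    }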
Merge commit '8ef6bc16364d4c08aa5972141748110160613087'
Conflicts:
examples/intrinsics/knc.h
examples/intrinsics/sse4.h
All of these pass the current mask to FunctionEmitContext::SetBlockEntryMask()
so that when a break/continue/return is encountered, it can test to see if all
lanes have followed that path and then return; this in turn ensures that we never
run statements with an all-off execution mask.
These functions were passing the function internal mask, not the full mask, and
thus could end up executing code with the mask all off if some lanes were
disabled by an outer function. (The new tests cover this case.)
Fixes include adding "_float" and "_double" suffixes where appropriate,
as well as providing a number of missing implementations.
This resolves several failures in the half* tests.
1. For some time now, we have provided the version without the 'svn'
suffix.
2. We should be testing "not LLVM 3.0" in these cases, since they apply
to LLVM 3.2 and beyond as well.
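A minimal sketch of the guard style point 2 argues for (the LLVM_3_x macro
names here are assumptions about ispc's build-time defines):

    #if !defined(LLVM_3_0)
        // Path for LLVM 3.1, 3.2, and later, rather than enumerating each
        // newer version (LLVM_3_1, LLVM_3_1svn, ...) individually.
    #endif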
We now follow the rule that the type of an integer constant is the
first of int32, uint32, int64, or uint64 that can hold the value.
(Unless 'u' or 'l' suffixes have been provided.)
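For illustration (these examples are not from the commit itself), the rule
assigns types to unsuffixed constants like this:

    100                   -> int32
    3000000000            -> uint32   (too big for int32)
    5000000000            -> int64    (too big for uint32)
    10000000000000000000  -> uint64   (too big for int64)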
Fixes issue #299.
e.g. "__equal()" -> "__equal_float()", etc.
No functional change; this is necessary groundwork for a forthcoming
peephole optimization that eliminates ANDs of masks in some cases.
Flag 32-bit vector types as only requiring 32-bit alignment (preemptive
bug fix for 32xi1 vectors).
Force module datalayouts to be the same before linking them to silence
an LLVM warning.
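A minimal sketch of the idea, assuming the LLVM 3.x C++ API (the actual
ispc code may differ):

    #include <llvm/Module.h>
    #include <llvm/Linker.h>

    // Make the source module's datalayout string match the destination's
    // before linking, so LLVM doesn't warn about a mismatch.
    static bool linkWithMatchingLayout(llvm::Module *dst, llvm::Module *src,
                                       std::string *err) {
        src->setDataLayout(dst->getDataLayout());
        return llvm::Linker::LinkModules(dst, src, llvm::Linker::DestroySource,
                                         err);
    }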
Finishes issue #309.
When "break", "continue", or "return" is used under varying control flow,
we now always check the execution mask to see if all of the program
instances are executing it. (Previously, this was only done with "cbreak",
"ccontinue", and "creturn", which are now deprecated.)
An important effect of this change is that it fixes a family of cases
where we could end up running with an "all off" execution mask, which isn't
supposed to happen, as it leads to all sorts of invalid behavior.
This change does cause the volume rendering example to run 9% slower, but
doesn't affect the other examples.
Issue #257.
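A minimal sketch of the "are all program instances executing?" test this
relies on (hypothetical helper, not ispc's actual internals):

    #include <stdint.h>

    // True when every lane bit of the execution mask is set. Under varying
    // control flow, break/continue/return now emit this kind of check so
    // that when all running lanes take the edge, control actually leaves
    // the block rather than continuing with an all-off mask.
    static inline bool allLanesOn(uint64_t mask, int vectorWidth) {
        return mask == ((1ull << vectorWidth) - 1);
    }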
This fixes a crash when 'cbreak' was used in a 'switch'. Also renamed
FunctionEmitContext::SetLoopMask() to SetBlockEntryMask(), and renamed
the loopMask member variable to match.
When we have a constant vector of primitive types, we now generate
a definition of a static const array of the individual values. This
in turn allows us to emit a simple aligned vector load to get the
constant vector value, rather than inefficiently inserting the values
into a vector.
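A hedged sketch of the generated C++ described above (the array name and
the __load signature are assumptions, and the snippet presumes one of the
examples/intrinsics/*.h headers is included):

    // The constant's element values become a static, aligned table...
    static const float __const_vec_data[16] __attribute__((aligned(64))) =
        { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
    // ...and the vector value comes from one aligned load of that table,
    // instead of sixteen per-element insert operations.
    __vec16_f v = __load<64>((const __vec16_f *)__const_vec_data);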
Issue #318.
If we have a vector of all zeros, a __setzero_* function call is emitted,
permitting specialized intrinsics to be called for this case. Undefined
values are reflected with an __undef_* call, which similarly allows
passing that information along.
This change also includes a cleanup of the __smear_* function signatures;
since they already have different names depending on the scalar value
type, we no longer need the trick of passing an undefined value of the
return vector type as the first parameter in order to overload by return
value.
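Illustrative declarations of the cleanup (hedged; the real declarations
live in examples/intrinsics/*.h and use those headers' vector types, with
the _float variants shown here as representatives):

    // Old style: dummy first argument, used only to "overload" by return type.
    __vec16_f __smear_float(__vec16_f retval_type_hint, float v);
    // New style: the name already encodes the element type.
    __vec16_f __smear_float(float v);
    // All-zero and undefined values now get their own entry points.
    __vec16_f __setzero_float();
    __vec16_f __undef_float();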
Issue #317.
Fixes to __load and __store.
Added __add, __mul, __equal, __not_equal, __extract_elements, __smear_i64, __cast_sext, __cast_zext,
and __scatter_base_offsets32_float.
__rcp_varying_float now has both a fast-math and a full-precision
implementation.
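A hedged sketch of the two __rcp_varying_float flavors, assuming this
refers to the KNC intrinsics header and simplifying to raw __m512 (knc.h
wraps this in its own vector type, and the exact refinement may differ):

    #include <immintrin.h>   // KNC/MIC intrinsics

    // Fast-math: just the hardware's ~23-bit reciprocal approximation.
    static inline __m512 rcpFast(__m512 v) {
        return _mm512_rcp23_ps(v);
    }

    // Full precision: one Newton-Raphson refinement step, r' = 2r - v*r*r.
    static inline __m512 rcpFull(__m512 v) {
        __m512 r = _mm512_rcp23_ps(v);
        return _mm512_sub_ps(_mm512_add_ps(r, r),
                             _mm512_mul_ps(v, _mm512_mul_ps(r, r)));
    }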
Some modules require an include of unistd.h (e.g. for getcwd and isatty
definitions).
These changes were required to build successfully on a Fedora 17 system,
using GCC 4.7.0 & glibc-headers 2.15.
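For example, a representative fix (not necessarily the exact hunk):

    #include <unistd.h>   // for getcwd() and isatty(); needed explicitly
                          // with GCC 4.7.0 / glibc-headers 2.15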