Previously, we'd bitcast e.g. a vector of floats to a vector of i32s and then
use the i32 variant of masked_load/masked_store/gather/scatter. Now, we have
separate float/double variants of each of those.
Change function suffixes to "_i32", etc., from "_32".
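For instance, for a store of float data under a varying mask (a sketch; the
builtin names in the comment are illustrative):

    uniform float results[64];

    void store_if(float value, bool flag) {
        // Lanes where 'flag' is false are masked off, so the store below is a
        // masked store of float data. Previously it would be bitcast to i32 and
        // routed through e.g. __masked_store_i32(); now a float-specific
        // variant can be used directly, with no bitcasting.
        if (flag)
            results[programIndex] = value;
    }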
Improve the load_and_broadcast macro in util.m4 to grab the vector width from
the WIDTH variable rather than taking it as a parameter.
If we have a simple varying 'if' statement where the only code in the body is
a single 'break', then emit special case code that just updates the execution
mask directly.
Surprisingly, this leads to better generated code (e.g., the Mandelbrot example
improves to 7.1x on AVX, versus 5.8x before). It's not clear why the general
code-generation path for 'break' doesn't produce equivalent code; this should
be investigated further. (Issue #277).
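For example, this is exactly the shape of the inner loop in the Mandelbrot
example (a sketch modeled on that workload):

    int escape_count(float x0, float y0, uniform int max_iters) {
        float x = 0, y = 0;
        int i;
        for (i = 0; i < max_iters; ++i) {
            // A varying 'if' whose entire body is a single 'break': the
            // compiler can now just clear the execution mask for the lanes
            // where the test is true, rather than using the general 'break'
            // code path.
            if (x * x + y * y > 4.f)
                break;
            float xtmp = x * x - y * y + x0;
            y = 2.f * x * y + y0;
            x = xtmp;
        }
        return i;
    }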
This adds a way to specify types that should be included in the automatically
generated header files:
    struct Foo { . . . };
    struct Bar { . . . };
    export { Foo, Bar };
Previously, we were trying to take a uniform seed and then shuffle it around to
initialize the state for each of the program instances. This was becoming
increasingly untenable and brittle.
Now, a varying seed is expected and used instead.
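A sketch of the expected usage, assuming the seed_rng()/RNGState/frandom()
interface from the standard library (exact signatures may differ):

    float random_value(uniform unsigned int base_seed) {
        RNGState state;
        // The seed is now varying: mixing in programIndex gives each program
        // instance its own distinct seed and therefore its own RNG sequence.
        seed_rng(&state, base_seed + programIndex);
        return frandom(&state);
    }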
There were a number of places where we were left-shifting 1 by a lane index;
these failed when the shift amount went past 32 bits, since the literal 1 is
only a 32-bit value. Fixed by shifting the 64-bit constant 1ull instead.
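For example (illustrative helper; 'lane' stands for whatever lane index is in
play):

    // Compute the mask bit for a given lane.
    unsigned int64 lane_bit(int lane) {
        // Broken: "1 << lane" -- the literal 1 is only 32 bits wide, so the
        // shift misbehaves once lane >= 32.
        // Fixed: shift the 64-bit constant instead.
        return 1ull << lane;
    }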
It can sometimes be useful to know roughly where we were in the program when an
assertion is hit; when a position is available/applicable, this macro is now
used.
Issue #268.
Rather than XOR'ing with a temporary 'all-on' vector, we call
__not. Also, we call out to __and_not1 and __and_not2, for an
AND where the first or second operand, respectively, has had
NOT applied to it.
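Restating their intended semantics as a sketch (hypothetical helper names; the
real builtins operate on the target's mask/vector type):

    static unsigned int64 not_ref(unsigned int64 a) {
        return ~a;
    }
    static unsigned int64 and_not1_ref(unsigned int64 a, unsigned int64 b) {
        return ~a & b;   // NOT applied to the first operand
    }
    static unsigned int64 and_not2_ref(unsigned int64 a, unsigned int64 b) {
        return a & ~b;   // NOT applied to the second operand
    }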
The intent of this was to indicate whether it was safe to run code with an
'all off' mask on the given target (and then sometimes be more flexible about,
e.g., running both the true and false blocks of 'if' statements, etc.)
The problem is that even if the architecture has full native mask support,
it's still not safe to run 'uniform' memory operations with the mask all
off. Even more tricky, we sometimes transform masked varying memory operations
to uniform ones during optimization (e.g. gather->load and broadcast).
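For example, consider a masked gather that the optimizer may later turn into a
uniform load plus a broadcast (a sketch):

    float lookup(uniform float * uniform table, int index, bool active) {
        float result = 0;
        if (active) {
            // A gather under the 'if' mask. If the optimizer can prove 'index'
            // is the same across the gang, it may rewrite this as a single
            // uniform load plus a broadcast -- but that load must still not run
            // when the mask is all off, since 'table'/'index' need not be valid
            // for an all-off gang.
            result = table[index];
        }
        return result;
    }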
This fixes a number of the tests/switch-* tests that were failing on the
generic targets due to this issue.
In ee1fe3aa9f, the LLVM_VERSION define was updated to never have the 'svn'
suffix and the build was updated to handle LLVM 3.2. This file still had a
check for LLVM_3_1svn that consequently never matched.
This fixes some issues with unnecessary loads and stores
in generated C++ code for the generic targets.