Previously, we were trying to take a uniform seed and then shuffle that
around to initialize the state for each of the program instances. This
was becoming increasingly untenable and brittle.
Now a varying seed is expected and used.
There were a number of places where we left-shifted 1 by a
lane index, which failed when the shift went beyond 32 bits. Fixed
by shifting the 64-bit constant value 1ull instead.
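
As a rough illustration of the shift fix described above, here is a minimal sketch. The helper name lane_bit is hypothetical and does not come from the sources; it only shows why the width of the constant matters.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only: returns a mask with just the
// bit for `lane` set. The literal 1ull makes the shift happen in 64 bits.
// Writing (1 << lane) would shift a 32-bit int, which is undefined
// behavior in C++ once lane reaches 32.
inline uint64_t lane_bit(int lane) {
    assert(lane >= 0 && lane < 64);
    return 1ull << lane;
}
```

With a 64-wide target, lane indices from 32 to 63 are exactly the cases the 32-bit form gets wrong.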
It can sometimes be useful to know the general place we were in the program
when an assertion hit; when the position is available / applicable, this
macro is now used.
Issue #268.
Rather than XOR'ing with a temporary 'all-on' vector, we call
__not. Also, we call out to __and_not1 and __and_not2, for an
AND where the first or second operand, respectively, has had
NOT applied to it.
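
The semantics of the three mask functions named above might be sketched as follows. The mask16 type (one bit per lane in a 16-bit integer) and the mask_example function are assumptions made for this illustration; a real target header would define its own mask representation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16-wide mask type with one bit per lane. This is an
// assumption for illustration, not the definition from any target header.
struct mask16 {
    uint16_t bits;
};

// NOT of every lane, without XOR'ing against an 'all-on' temporary.
inline mask16 __not(mask16 a) {
    return mask16{ static_cast<uint16_t>(~a.bits) };
}

// AND where NOT has been applied to the first operand: (~a) & b.
inline mask16 __and_not1(mask16 a, mask16 b) {
    return mask16{ static_cast<uint16_t>(~a.bits & b.bits) };
}

// AND where NOT has been applied to the second operand: a & (~b).
inline mask16 __and_not2(mask16 a, mask16 b) {
    return mask16{ static_cast<uint16_t>(a.bits & ~b.bits) };
}

// Usage sketch with two overlapping lane sets.
inline void mask_example() {
    mask16 a{0x00FF}, b{0x0FF0};
    assert(__and_not1(a, b).bits == 0x0F00);  // lanes in b but not in a
    assert(__and_not2(a, b).bits == 0x000F);  // lanes in a but not in b
}
```

Having dedicated entry points lets a target map each one to a single native and-not instruction where one exists.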
The intent of this was to indicate whether it was safe to run code
with an 'all off' mask on the given target (and then sometimes be
more flexible about e.g. running both true and false blocks of if
statements, etc.)
The problem is that even if the architecture has full native mask support,
it's still not safe to run 'uniform' memory operations with the mask all
off. Even more tricky, we sometimes transform masked varying memory operations
to uniform ones during optimization (e.g. gather->load and broadcast).
This fixes a number of the tests/switch-* tests that were failing on the
generic targets due to this issue.
In ee1fe3aa9f, the LLVM_VERSION define was updated to never
have the 'svn' suffix and the build was updated to handle LLVM
3.2. This file had a check for LLVM_3_1svn that no longer
matched.
This fixes some issues with unnecessary loads and stores
in generated C++ code for the generic targets.
Now, the __smear* functions in generated C++ code have an unused first
parameter of the desired return type; this allows us to have headers
that include variants of __smear for multiple target widths. (This
approach is necessary since we can't overload by return type in C++.)
Issue #256.
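
The unused-first-parameter approach for __smear can be sketched like this. The vector types vec4_i32 and vec8_i32, the specific name __smear_i32, and the smear_example function are assumptions for illustration only; the point is that the dummy argument's type selects the overload, which a return type alone cannot do in C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 4-wide and 8-wide vector types, for illustration only.
struct vec4_i32 { int32_t v[4]; };
struct vec8_i32 { int32_t v[8]; };

// The first parameter is never read. Only its type matters: it tells the
// compiler which overload (and so which return type) the caller wants.
inline vec4_i32 __smear_i32(vec4_i32, int32_t x) {
    vec4_i32 r;
    for (int i = 0; i < 4; ++i) r.v[i] = x;
    return r;
}

inline vec8_i32 __smear_i32(vec8_i32, int32_t x) {
    vec8_i32 r;
    for (int i = 0; i < 8; ++i) r.v[i] = x;
    return r;
}

// Usage sketch: a value-initialized dummy argument picks the width.
inline void smear_example() {
    vec4_i32 narrow = __smear_i32(vec4_i32(), 3);
    vec8_i32 wide = __smear_i32(vec8_i32(), 7);
    assert(narrow.v[3] == 3);
    assert(wide.v[7] == 7);
}
```

Because both overloads can coexist, a single header can serve code generated for more than one target width.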
Performance and code quality of the performance suite are unchanged;
compilation times are improved by another 20% or so for simple
programs (e.g. rt.ispc). One very complex program compiles
about 2.4x faster now.