Rather than XOR'ing with a temporary 'all-on' vector, we call
__not. Also, we call out to __and_not1 and __and_not2, for an
AND where the first or second operand, respectively, has had
NOT applied to it.
The intent of this was to indicate whether it was safe to run code
with an 'all of' mask on the given target (and then sometimes be
more flexible about e.g. running both true and false blocks of if
statements, etc.)
The problem is that even if the architecture has full native mask support,
it's still not safe to run 'uniform' memory operations with the mask all
off. Even more tricky, we sometimes transform masked varying memory operations
to uniform ones during optimization (e.g. gather->load and broadcast).
This fixes a number of the tests/switch-* tests that were failing on the
generic targets due to this issue.
In ee1fe3aa9f, the LLVM_VERSION define was updated to never
have the 'svn' suffix and the build was updated to handle LLVM
3.2. This file had a check for LLVM_3_1svn that was no longer
hitting.
This fixes some issues with unnecessary loads and stores
in generated C++ code for the generic targets.
Now, the __smear* functions in generated C++ code have an unused first
parameter of the desired return type; this allows us to have headers
that include variants of __smear for multiple target widths. (This
approach is necessary since we can't overload by return type in C++.)
Issue #256.
Performance and code quality of performance suite is unchanged,
compilation times are improved by another 20% or so for simple
programs (e.g. rt.ispc). One very complex programs compiles
about 2.4x faster now.
Previously, GetElementType() would end up causing dynamic allocation to
happen to compute the final element type (turning types with unbound
variability into the same type with the struct's variability) each it was
called, which was wasteful and slow. Now we cache the result.
Another 20% perf on compiling that problematic program.
We don't need to explicitly create the non-const Types to do type
comparison when ignoring const-ness in the check.
We can also save some unnecessary dynamic memory allocation by
keeping strings returned from GetStructName() as references to strings.
This gives another 10% on front-end perf on that big program.
We now have a set of template functions CastType<AtomicType>, etc., that in
turn use a new typeId field in each Type instance, allowing them to be inlined
and to be quite efficient.
This improves front-end performance for a particular large program by 28%.
In one very large program, we were spending quite a bit of time repeatedly
getting const variants of StructTypes. This speeds up the front-end by
about 40% for that test case.
(This is something of a band-aid, pending uniquing types.)
Before, if the function was declared before being defined, then the symbol's
SourcePos would be left set to the position of the declaration. This ended
up getting the debugging symbols mixed up in this case, which was undesirable.
It's now possible to successfully print out the value of programIndex,
programCount, etc., in the debugger. The issue was that they were
defined as having InternalLinkage, which meant that DCE removed them
at the end of compilation. Now they're declared to have WeakODRLinkage,
which ensures that one copy survives (but there aren't multiply-defined
symbols when compiling multiple files.)