This pass handles the "all on" and "all off" mask cases appropriately.
Also renamed load_masked stuff in built-ins to masked_load for consistency with
masked_store.
Now we require that the struct name match for two struct types to be the same.
Added a test to check this.
(Also removed a stale test, movmsk-opt.ispc)
When used, these targets end up with calls to undefined functions for all
of the various special vector stuff ispc needs to compile ispc programs
(masked store, gather, min/max, sqrt, etc.).
These targets are not yet useful for anything, but are a step toward
having an option to C++ code with calls out to intrinsics.
Reorganized the directory structure a bit and put the LLVM bitcode used
to define target-specific stuff (as well as some generic built-ins stuff)
into a builtins/ directory.
Note that for building on Windows, it's now necessary to set a LLVM_VERSION
environment variable (with values like LLVM_2_9, LLVM_3_0, LLVM_3_1svn, etc.)
When loading from an address that's computed by adding two registers
together, x86 can scale one of them by 2, 4, or 8, for free as part
of the addressing calculation. This change makes the code generated
for gather and scatter use this.
For the cases where gather/scatter is based on a base pointer and
an integer offset vector, the GSImprovementsPass looks to see if the
integer offsets are being computed as 2/4/8 times some other value.
If so, it extracts the 2x/4x/8x part and leaves the rest there as
the the offsets. The {gather,scatter}_base_offsets_* functions take
an i32 scale factor, which is passed to them, and then they carefully
generate IR so that it hits LLVM's pattern matching for these scales.
This is particular win on AVX, since it saves us two 4-wide integer
multiplies.
Noise runs 14% faster with this.
Issue #132.
Specifically, stmts and exprs are no longer responsible for first recursively
optimizing their children before doing their own optimization (this turned
out to be error-prone, with children sometimes being forgotten.) They now
are just responsible for their own optimization, when appropriate.
In general, it should just return the original node pointer, but for type checking
and optimization passes, it can return a new value for the node (that will be
assigned where the old one was in the tree.)
Along the way, fixed some bugs in WalkAST() where the postorder callback wouldn't
end up being called for a few expr types (sizeof, dereference, address of,
reference).
For starters, use it for the check to see if code is safe to run with the
mask all off.
This also fixes a bug where we would sometimes incorrectly say that
a whole block of code was unsafe to run with an all off mask because we came
to a NULL AST node during traversal.
Accept 'u' and 'l' suffixes to force the constants to be corresponding types.
Just carry around a single 64-bit int value in yylval rather than having both
32- and 64-bit variants.
Previously, we used an IfStmt to wrap complex functions with the equivalent
of a "cif" to check to see if the mask was all on, all off, or mixed at the
start of executing non-trivial functions. This had the unintended side
effect of suggesting to other parts of the compiler that the entire function
was under varying control flow (which in turn led to some small code
quality issues.)
Now, we emit the equivalent code directly.
We should always use the full mask when storing to a reference, since we
don't in general know what it refers to (and thence the appropriate mask
to use for its target).
The conceptual error was the assumption that not being under varying
control flow implied that the mask was all on; this is not the case
if some of the instances have executed a return earlier in the function's
execution. The error in practice would be that the mask would be
assumed to be all-on for things like memory writes, so there would
be unintended side-effects for the instances that had returned.
(Somehow this wasn't being done before.)
Errors are now issued if too few arguments are used when calling through
a function pointer, too many arguments are used, or if any of them can't be
type converted to the parameter type.
Specifically, we weren't storing the results passed back from when we called
those methods of the start and end exprs. This manifested itself as overloaded
functions there not resolving properly.
Fixes issue #131; because they weren't being marked as internal before, when
compiling to multiple targets these would lead to multiply-defined symbols.