Generalize the lScalarizeVector() utility routine (used in determining
when we can change gathers/scatters into vector loads/stores, respectively)
to handle vector shuffles and vector loads. This fixes issue #79, which
provided a case where a gather was being performed even though a vector
load was possible.
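A rough sketch of the condition being recognized (this is not the actual
lScalarizeVector() code; WIDTH, offsets_are_consecutive(), and
gather_or_load() are illustrative names only): when the per-lane offsets
are known to be consecutive from a common base, the gather is equivalent
to a single contiguous vector load.

    #include <stdbool.h>

    #define WIDTH 8

    static bool offsets_are_consecutive(const int offsets[WIDTH]) {
        for (int i = 1; i < WIDTH; ++i)
            if (offsets[i] != offsets[0] + i)
                return false;
        return true;
    }

    static void gather_or_load(float result[WIDTH], const float *base,
                               const int offsets[WIDTH]) {
        if (offsets_are_consecutive(offsets)) {
            /* consecutive offsets: one contiguous vector load suffices */
            for (int i = 0; i < WIDTH; ++i)
                result[i] = base[offsets[0] + i];
        } else {
            /* general case: a true gather, one memory access per lane */
            for (int i = 0; i < WIDTH; ++i)
                result[i] = base[offsets[i]];
        }
    }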
Given the change in 0c20483853, this is no longer necessary, since
we know that one instance will always be running if we're executing a
given block of code.
Using blend to do masked stores is unsafe if all of the lanes are off:
it may read from or write to invalid memory. For now, this workaround
transforms all 'if' statements into coherent 'if's, ensuring that an
instruction only runs if at least one program instance wants to be running
it.
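A sketch of why the blend approach is dangerous (the helper below is a
hypothetical stand-in, not the actual builtin): it reads and writes a full
vector at the target address regardless of the mask, so if every lane is
off and the pointer is invalid, those accesses can fault.

    #include <stdbool.h>

    #define WIDTH 8

    static void masked_store_blend(float *ptr, const float val[WIDTH],
                                   const bool mask[WIDTH]) {
        float old[WIDTH];
        for (int i = 0; i < WIDTH; ++i)   /* unconditional read of the full vector */
            old[i] = ptr[i];
        for (int i = 0; i < WIDTH; ++i)   /* blend new values in where the mask is on */
            old[i] = mask[i] ? val[i] : old[i];
        for (int i = 0; i < WIDTH; ++i)   /* unconditional write-back */
            ptr[i] = old[i];
    }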
One nice thing about this change is that a number of implementations of
various builtins can be simplified, since they no longer need to confirm
that at least one program instance is running.
It might be nice to re-enable regular if statements in a future checkin,
but we'd want to make sure they don't have any masked loads or blended
masked stores in their statement lists. There isn't a performance
impact for any of the examples with this change, so it's unclear if
this is important.
Note that this only impacts 'if' statements with a varying condition.
This applies a floating-point scale factor to the image resolution;
it's useful for experiments with many-core systems where the
base image resolution may not give enough work for good load-balancing
with tasks.
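A trivial sketch of how such a scale factor might be applied (the variable
names are illustrative, not the actual example code):

    #include <stdio.h>

    int main(void) {
        const int baseWidth = 800, baseHeight = 600;
        const float scale = 2.0f;   /* e.g. parsed from a command-line option */
        int width = (int)(baseWidth * scale);
        int height = (int)(baseHeight * scale);
        printf("rendering at %d x %d\n", width, height);
        return 0;
    }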
All of the masked store calls were preventing values from being kept in
registers, which in turn led to a lot of unnecessary stack traffic.
This approach seems to give better code in the end.
Fix RNG state initialization for 16-wide targets
Fix a number of bugs in reduce_add builtin implementations for AVX.
Fix some tests that had incorrect expected results for the 16-wide
case.
Fix an fp constant that was undesirably causing computation to be done in
double precision.
Makes C scalar versions of the options pricing models, rt, and aobench 3-5% faster.
Makes scalar version of noise about 15% faster.
Others are unchanged.
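For illustration (not the actual example code), the pattern involved in C:
an unsuffixed floating-point constant has type double, so the arithmetic is
promoted to double precision and then converted back, whereas the 'f'
suffix keeps the whole computation in single precision.

    float slow_scale(float x) { return x * 0.5;  }  /* 0.5 is a double: multiply done in double precision */
    float fast_scale(float x) { return x * 0.5f; }  /* 0.5f is a float: multiply stays in single precision */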
For associative atomic ops (add, and, or, xor), we can take advantage of
their associativity to do just a single hardware atomic instruction,
rather than one for each of the running program instances (as the previous
implementation did).
The basic approach is to locally compute a reduction across the active
program instances with the given op and to then issue a single HW atomic
with that reduced value as the operand. We then take the old value that
was stored in the location that is returned from the HW atomic op and
use that to compute the values to return to each of the program instances
(conceptually representing the cumulative effect of each of the preceding
program instances having performed their atomic operation).
Issue #56.
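A minimal sketch of this approach for a gang-wide atomic add, using GCC's
__atomic_fetch_add as the single hardware atomic; the gang width, the
array-based value/mask representation, and the gang_atomic_add() name are
illustrative only, not the actual builtins.

    #include <stdbool.h>
    #include <stdint.h>

    #define WIDTH 8

    static void gang_atomic_add(int32_t *ptr, const int32_t val[WIDTH],
                                const bool mask[WIDTH], int32_t ret[WIDTH]) {
        /* 1. Locally reduce the operands of the active program instances. */
        int32_t sum = 0;
        for (int i = 0; i < WIDTH; ++i)
            if (mask[i])
                sum += val[i];

        /* 2. Issue a single hardware atomic with the reduced value. */
        int32_t old = __atomic_fetch_add(ptr, sum, __ATOMIC_SEQ_CST);

        /* 3. Reconstruct per-instance return values: each active instance
         *    sees the old value plus the operands of the active instances
         *    that conceptually preceded it. */
        int32_t prefix = 0;
        for (int i = 0; i < WIDTH; ++i) {
            if (mask[i]) {
                ret[i] = old + prefix;
                prefix += val[i];
            }
        }
    }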
When replacing 'all on' masked store with regular store, set alignment
to be the vector element alignment, not the alignment for a whole vector.
(i.e. 4 or 8 byte alignment, not 32 or 64).
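The reasoning, as a sketch (the function name and the memcpy stand-in are
illustrative only): the destination is typically just an element-aligned
address into a scalar array, so only the element alignment can be assumed.

    #include <string.h>

    #define WIDTH 8

    /* 'a' is only guaranteed 4-byte (float) alignment and 'i' need not be
     * a multiple of WIDTH, so the replacement store may not assume 32-byte
     * (whole-vector) alignment; memcpy stands in here for an
     * element-aligned vector store. */
    static void store_vector(float *a, int i, const float v[WIDTH]) {
        memcpy(&a[i], v, WIDTH * sizeof(float));
    }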
Old run_tests.sh still lives (for now).
Changes include:
- Tests are run in parallel across all of the available CPU cores
- Option to create a statically-linked executable for each test
(rather than using the LLVM JIT). This is particularly useful
for AVX, which doesn't have good JIT support yet.
- Static executables also make it possible to test x86, not
just x86-64, codegen.
- Fixed a number of tests in failing_tests, which were actually failing
because the expected function signature of the tests had changed.
Emit calls to masked_store, not masked_store_blend, when handling
masked stores emitted by the frontend.
Fix bug in binary8to16 macro in builtins.m4
Fix bug in 16-wide version of __reduce_add_float
Remove blend function implementations for masked_store_blend for
AVX; just forward those on to the corresponding real masked store
functions.
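A sketch of the forwarding (the names below are hypothetical, not the
actual AVX builtin signatures): the blend variant simply calls the real
masked store, which only touches memory for active lanes.

    #include <stdbool.h>

    #define WIDTH 8

    static void masked_store_float(float *ptr, const float val[WIDTH],
                                   const bool mask[WIDTH]) {
        for (int i = 0; i < WIDTH; ++i)
            if (mask[i])                    /* only active lanes touch memory */
                ptr[i] = val[i];
    }

    static void masked_store_blend_float(float *ptr, const float val[WIDTH],
                                         const bool mask[WIDTH]) {
        masked_store_float(ptr, val, mask); /* just forward to the real masked store */
    }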
Compute a "local" min/max across the active program instances and
then do a single atomic memory op.
Added a few tests to exercise global min/max atomics (which were
previously untested!)
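The same idea sketched for a gang-wide atomic min (names are illustrative,
and the compare-exchange loop below is only a stand-in for the scalar
atomic-min operation): reduce locally across the active program instances,
then perform one atomic update of memory for the whole gang.

    #include <stdbool.h>
    #include <stdint.h>

    #define WIDTH 8

    static int32_t scalar_atomic_min(int32_t *ptr, int32_t value) {
        int32_t old = __atomic_load_n(ptr, __ATOMIC_SEQ_CST);
        while (old > value &&
               !__atomic_compare_exchange_n(ptr, &old, value, true,
                                            __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
            ;  /* on failure, 'old' is refreshed with the current contents */
        return old;
    }

    static void gang_atomic_min(int32_t *ptr, const int32_t val[WIDTH],
                                const bool mask[WIDTH]) {
        int32_t local = INT32_MAX;
        bool any = false;
        for (int i = 0; i < WIDTH; ++i) {   /* local min over the active instances */
            if (mask[i]) {
                any = true;
                if (val[i] < local)
                    local = val[i];
            }
        }
        if (any)
            scalar_atomic_min(ptr, local);  /* one atomic memory update for the gang */
    }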