- Don't suggest matches when given an empty string or a single, non-alpha
character.
- Also fixed the parser to be a bit less confusing when it encounters an
unexpected EOF.
Go back to running both sides of 'if' statements with masking and without
branching if we can determine that the code is relatively simple (as per
the simple cost model), and is safe to run even if the mask is 'all off'.
This gives a bit of a performance improvement for some of the examples
(most notably, the ray tracer), and is the code that one wants generated
in this case anyhow.
This is currently only used to decide whether it's worth doing an
"are all lanes running" check at the start of functions--for small
functions, it's not worth the overhead.
The cost is estimated relatively early in compilation (e.g. before
we know if an array access is a scatter/gather or not, before
constant folding, etc.), so there are many known shortcomings.
For the case where we have a regular (i.e. non-'cif') 'if' statement,
the generated code just simply checks to see if any program instance
is running before running the corresponding statements. This is a
lighter-weight check than IfStmt::emitMaskMixed() was performing.
Previously, we did a vector equal compare and then a movmsk, the
result of which we checked to see if it was on for all lanes.
Because masks are vectors of i32s, under AVX, the vector equal
compare required two 4-wide SSE compares and some shuffling.
Now, we do a movmsk of both masks first and then a scalar
equality comparison of those two values, which seems to generate
overall better code.
Contrary to claims in 0c2048385, that checkin didn't include the changes
to not run if/else blocks if none of the program instances wanted to be
running them. This checkin fixes that and thus actually fixes issue #74.
Generalize the lScalarizeVector() utility routine (used in determining
when we can change gathers/scatters into vector loads/stores, respectively)
to handle vector shuffles and vector loads. This fixes issue #79, which
provided a case where a gather was being performed even though a vector
load was possible.
Given the change in 0c20483853, this is no longer necessary, since
we know that one instance will always be running if we're executing a
given block of code.
Using blend to do masked stores is unsafe if all of the lanes are off:
it may read from or write to invalid memory. For now, this workaround
transforms all 'if' statements into coherent 'if's, ensuring that an
instruction only runs if at least on program instance wants to be running
it.
One nice thing about this change is that a number of implementations of
various builtins can be simplified, since they no longer need to confirm
that at least one program instance is running.
It might be nice to re-enable regular if statements in a future checkin,
but we'd want to make sure they don't have any masked loads or blended
masked stores in their statement lists. There isn't a performance
impact for any of the examples with this change, so it's unclear if
this is important.
Note that this only impacts 'if' statements with a varying condition.
This applies a floating-point scale factor to the image resolution;
it's useful for experiments with many-core systems where the
base image resolution may not give enough work for good load-balancing
with tasks.
All of the masked store calls were inhibiting putting values into
registers, which in turn led to a lot of unnecessary stack traffic.
This approach seems to give better code in the end.
Fix RNG state initialization for 16-wide targets
Fix a number of bugs in reduce_add builtin implementations for AVX.
Fix some tests that had incorrect expected results for the 16-wide
case.
fp constant undesirably causing computation to be done in double precision.
Makes C scalar versions of the options pricing models, rt, and aobench 3-5% faster.
Makes scalar version of noise about 15% faster.
Others are unchanged.