For the case where we have a regular (i.e. non-'cif') 'if' statement,
the generated code simply checks whether any program instance
is running before running the corresponding statements. This is a
lighter-weight check than IfStmt::emitMaskMixed() was performing.
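The check described above can be modelled in plain C++. This is only a sketch of the control flow, not the code ispc emits: the gang size kLanes, the Mask type, and the helpers anyLaneOn() and runIfAnyLaneOn() are hypothetical names introduced for illustration.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;            // hypothetical gang size
using Mask = std::array<bool, kLanes>;       // one flag per program instance

// True if at least one program instance (lane) is active.
inline bool anyLaneOn(const Mask &mask) {
    for (bool on : mask)
        if (on) return true;
    return false;
}

// Models a varying 'if': the body runs (with the combined mask) only when
// some lane is both executing and has a true condition.
// Returns whether the body was run.
template <typename Body>
bool runIfAnyLaneOn(const Mask &execMask, const Mask &cond, Body body) {
    Mask active{};
    for (std::size_t i = 0; i < kLanes; ++i)
        active[i] = execMask[i] && cond[i];
    if (!anyLaneOn(active))
        return false;                        // no lane wants the body: skip it
    body(active);
    return true;
}
```

Calling runIfAnyLaneOn() with a condition that is false in every executing lane returns false without invoking the body at all.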
Contrary to claims in 0c2048385, that checkin didn't include the changes
that skip if/else blocks when no program instance wants to run them.
This checkin fixes that and thus actually fixes issue #74.
Using blend to do masked stores is unsafe if all of the lanes are off:
it may read from or write to invalid memory. For now, this workaround
transforms all 'if' statements into coherent 'if's, ensuring that an
instruction only runs if at least one program instance wants to be running
it.
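To illustrate why a blend-based masked store is a problem when every lane is off, here is a small C++ model. It is a sketch under stated assumptions, not ispc's implementation: TrackedMemory, blendStore(), anyOn(), and coherentBlendStore() are hypothetical names, and the load/store counters stand in for real memory accesses that could fault.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWidth = 4;             // hypothetical vector width
using IntVec  = std::array<int, kWidth>;
using BoolVec = std::array<bool, kWidth>;

// Stand-in for a vector-sized region of memory that counts its accesses.
struct TrackedMemory {
    IntVec data{};
    int loads = 0;
    int stores = 0;
};

// Masked store done with a blend: load the full vector, select per lane,
// store the full vector back. Memory is touched even if no lane is on.
inline void blendStore(TrackedMemory &mem, const IntVec &value,
                       const BoolVec &mask) {
    IntVec old = mem.data;
    ++mem.loads;
    IntVec blended{};
    for (std::size_t i = 0; i < kWidth; ++i)
        blended[i] = mask[i] ? value[i] : old[i];
    mem.data = blended;
    ++mem.stores;
}

// True if at least one lane of the mask is on.
inline bool anyOn(const BoolVec &mask) {
    for (bool on : mask)
        if (on) return true;
    return false;
}

// Coherent variant: skip the store entirely when all lanes are off,
// so the (possibly invalid) address is never accessed.
inline void coherentBlendStore(TrackedMemory &mem, const IntVec &value,
                               const BoolVec &mask) {
    if (!anyOn(mask))
        return;
    blendStore(mem, value, mask);
}
```

With an all-off mask, blendStore() still performs one load and one store, while coherentBlendStore() performs none. That is the property the workaround relies on.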
One nice thing about this change is that a number of implementations of
various builtins can be simplified, since they no longer need to confirm
that at least one program instance is running.
It might be nice to re-enable regular if statements in a future checkin,
but we'd want to make sure they don't have any masked loads or blended
masked stores in their statement lists. There isn't a performance
impact for any of the examples with this change, so it's unclear if
this is important.
Note that this only impacts 'if' statements with a varying condition.
- Renamed stdlib-sse.ll to builtins-sse.ll (etc.) to better indicate that the
code in those files has a role beyond implementing the standard library.
- Moved the declarations of the various __pseudo_* functions from LLVM API
calls in builtins.cpp to plain declarations in LLVM assembly language in
builtins.m4. (This takes much less code and makes it clearer what's going on.)
scalar values (that ispc used to smear across the array/struct
elements). Now, initializers in variable declarations must be
{ }-delimited lists, with one element per struct member or array
element, respectively.
There were a few problems with the previous implementation of the
functionality to initialize from scalars. First, the expression
would be evaluated once per value initialized, so any side-effects in it
would be repeated for every element. Next, for large multidimensional
arrays, the generated code would be a long series of move instructions
rather than loops, which in turn made LLVM take a long time.
While both of these problems are fixable, it's a non-trivial
amount of re-plumbing for a questionable feature anyway.
Fixes issue #50.
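The side-effect problem mentioned above (an initializer expression evaluated once per value initialized) can be sketched in C++. This models the behavior only; it is not ispc code, and nextValue(), smearPerElement(), and smearOnce() are hypothetical names introduced for illustration.

```cpp
#include <array>

// An initializer expression with a side effect: each call bumps a counter.
static int g_calls = 0;
inline int nextValue() { return ++g_calls; }

// Models the old behavior: the expression is re-evaluated for every
// element, so its side effect happens once per element.
template <typename Expr>
std::array<int, 4> smearPerElement(Expr expr) {
    std::array<int, 4> a{};
    for (int &e : a)
        e = expr();
    return a;
}

// What a user would more likely expect from a scalar initializer:
// evaluate once, then copy the result into every element.
template <typename Expr>
std::array<int, 4> smearOnce(Expr expr) {
    const int v = expr();
    std::array<int, 4> a{};
    a.fill(v);
    return a;
}
```

Starting from g_calls == 0, smearPerElement(nextValue) calls the expression four times and yields four different values, whereas smearOnce(nextValue) calls it once and yields four copies of the same value.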