When "break", "continue", or "return" is used under varying control flow,
we now always check the execution mask to see if all of the program
instances are executing it. (Previously, this was only done with "cbreak",
"ccontinue", and "creturn", which are now deprecated.)
An important effect of this change is that it fixes a family of cases
where we could end up running with an "all off" execution mask, which isn't
supposed to happen, as it leads to all sorts of invalid behavior.
This change does cause the volume rendering example to run 9% slower, but
doesn't affect the other examples.
Issue #257.
This fixes a crash when 'cbreak' was used in a 'switch' statement. Renamed
FunctionEmitContext::SetLoopMask() to SetBlockEntryMask(), and similarly
renamed the loopMask member variable.
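As an illustration (a sketch with made-up names, not code from the change),
a plain "break" under a varying condition now gets the mask check
automatically:

    int firstAbove(uniform float vals[], uniform int count, float threshold) {
        int found = -1;
        for (uniform int i = 0; i < count; ++i) {
            // 'threshold' is varying, so this branch is varying control flow;
            // the plain "break" below now checks the execution mask itself
            // rather than requiring the deprecated "cbreak" form.
            if (vals[i] > threshold) {
                found = i;
                break;
            }
        }
        return found;
    }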
Fixes to __load and __store.
Added __add, __mul, __equal, __not_equal, __extract_elements, __smear_i64,
__cast_sext, __cast_zext, and __scatter_base_offsets32_float.
__rcp_varying_float now has a fast-math and full-precision implementation.
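The sketch below shows the usual shape of such a pair (it is not the actual
header code, and rcp_approx() is a hypothetical stand-in for the target's
approximate-reciprocal primitive): the fast-math path returns the raw
estimate, and the full-precision path adds one Newton-Raphson refinement
step.

    // Hypothetical placeholder for an approximate-reciprocal instruction;
    // a real implementation would map to a hardware estimate.
    static inline float rcp_approx(float v) {
        return 1.0f / v;
    }

    static inline float rcp_fastmath(float v) {
        return rcp_approx(v);            // estimate only
    }

    static inline float rcp_fullprec(float v) {
        float r = rcp_approx(v);
        // One Newton-Raphson step, r' = r * (2 - v*r), roughly doubles the
        // number of correct bits in the estimate.
        return r * (2.0f - v * r);
    }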
Some modules require an include of unistd.h (e.g. for getcwd and isatty
definitions).
These changes were required to build successfully on a Fedora 17 system,
using GCC 4.7.0 & glibc-headers 2.15.
If --opt=fast-math is used then the generated code contains:
#define ISPC_FAST_MATH 1
Otherwise it contains:
#undef ISPC_FAST_MATH
This allows the generic headers to honor the user's fast-math request, which
should help the performance of the generated code.
Updated the relevant header files (sse4.h, generic-16.h, generic-32.h,
generic-64.h).
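A generic header can then branch on the macro along these lines (the macro
and function names here are illustrative, not the actual header contents):

    #ifdef ISPC_FAST_MATH
    // The user asked for fast math: use the approximate sequences.
    #define RCP_IMPL(v) rcp_fastmath(v)
    #else
    // Default: use the refined, full-precision sequences.
    #define RCP_IMPL(v) rcp_fullprec(v)
    #endif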
Updated generic-32.h and generic-64.h to the new memory API.
(The relevant names are now qualified with their namespace explicitly,
rather than implicitly via a using declaration.) This will allow some
further changes to ISPC's C backend without collision with ISPC's
namespace. This change aims to have no effect on the code generated by the
compiler; it should be a no-op apart from improving maintainability.
We need to do this since it's illegal to have nested foreach statements, but
nested foreach_unique, or foreach_unique inside foreach, etc., are all fine.
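For example, the following (illustrative) nesting is accepted; only foreach
directly inside foreach is rejected:

    void lookupAll(uniform float table[], uniform int indices[],
                   uniform float result[], uniform int count) {
        foreach (i = 0 ... count) {
            int idx = indices[i];
            float v = 0;
            // foreach_unique inside foreach is fine; 'u' takes each distinct
            // value of 'idx' in turn and is uniform inside the body.
            foreach_unique (u in idx) {
                v = table[u];        // scalar load rather than a gather
            }
            result[i] = v;
        }
    }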
It's now legal to write:
struct Foo { Foo *next; };
Previously, a forward declaration "struct Foo;" was required. This fixes
issue #287.
This change also fixes a bug where multiple forward declarations
"struct Foo; struct Foo;" would incorrectly issue an error on the
second one.
The string to be printed is accumulated into a local buffer before being sent
to puts(). This ensures that if multiple threads are running and printing at
the same time, the output from a single print statement won't be interleaved
with other output (output from different print statements may still be
interleaved, just as in C).
Issue #293.
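For instance, in a (hypothetical) task like the one below, everything a
single print() call emits stays together; lines from other concurrently
running tasks may appear before or after it, but never inside it:

    task void worker() {
        float value = taskIndex * 100 + programIndex;
        // The output of this one print() call is emitted atomically with
        // respect to print() calls made by other tasks.
        print("task %: value = %\n", taskIndex, value);
    }

    void runAll() {
        launch[8] worker();   // e.g. eight tasks printing concurrently
        sync;
    }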
In particular, this gives us the desired behavior for NaNs: a != comparison
involving a NaN evaluates to true. This in turn allows writing the
canonical isnan() function as "v != v".
Added isnan() to the standard library as well.
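As a stand-alone sketch of the idiom (the new library isnan() can of course
be called directly):

    static inline bool myIsNaN(float v) {
        return v != v;   // NaN is the only value that compares unequal to itself
    }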
We now use InternalLinkage for the 'programIndex' symbol (and similar)
if we're not compiling with debugging symbols. This prevents those
symbol names/definitions from polluting the global namespace in the common
case. This basically addresses Issue #274.
We should never be running with an all-off mask and thus should never
enter a function with an all-off mask, so that check has been removed.
Removing it makes no performance difference, however.
Issue #282.
The "base+offsets" variants of gather decompose the integer offsets into
compile-time constant and compile-time unknown elements. (The coalescing
optimization, then, depends on this decomposition being done well--having
as much as possible in the constant component.) We now make multiple
efforts to improve this decomposition as we run optimization passes; in
some cases we're able to move more over to the constant side than was
first possible.
This in particular fixes issue #276, a case where coalescing was expected
but didn't actually happen.
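As a made-up illustration of the kind of access this matters for: in the
function below, the byte offset of each access from 'grid' splits into a
varying part (row * 16 * sizeof(float)) plus a compile-time constant (12 or
16 bytes); keeping those constants in the constant component is what allows
the coalescing pass to combine the two gathers.

    float pair(uniform float grid[][16], int row) {
        // Two gathers whose offsets differ only in the constant component.
        return grid[row][3] + grid[row][4];
    }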
Rather than having separate passes to do conversion, when possible, of:
- General gathers/scatters of a vector of pointers to gathers/scatters of
  a base pointer plus integer offsets
- Gathers/scatters to masked loads/stores or load+broadcast
- Masked loads/stores to regular loads/stores
all of these are now done in a single ImproveMemoryOps pass. This change in
particular addresses some phase-ordering issues that showed up with
multidimensional array accesses: after determining that an outer dimension
had the same index value in all program instances, we previously weren't
able to take advantage of the uniformity of the resulting pointer.
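A hypothetical example of the pattern: if the optimizer can prove that every
running program instance has the same value for 'row' in the function below,
then &img[row] is effectively a uniform base pointer and the access can be
lowered to something much cheaper than a general gather; doing all of the
memory-op rewrites in one pass lets that discovery feed directly into how
the gather is lowered.

    float loadElement(uniform float img[][128], int row, int col) {
        return img[row][col];
    }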