This should help with performance of the generated code.
Updated the relevant header files (sse4.h, generic-16.h, generic-32.h, generic-64.h).
Updated generic-32.h and generic-64.h to the new memory API.
(Rather than implicitly with a using declaration.) This will
allow for some further changes to ISPC's C backend without
colliding with ISPC's namespace. This change aims to have no effect
on the code generated by the compiler; it should be a complete no-op
except for its side effects on maintainability.
We need to do this since it's illegal to have nested foreach statements, but
nested foreach_unique, or foreach_unique inside foreach, etc., are all fine.
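For example, something like the following (hypothetical function and
variable names) is accepted, whereas replacing the inner foreach_unique
with a second foreach would be an error:

    export void scale_by_group(uniform float vals[], uniform float scales[],
                               uniform int group[], uniform int count) {
        foreach (i = 0 ... count) {
            int g = group[i];
            // A foreach_unique nested inside a foreach is allowed; a second
            // nested foreach here would be rejected.
            foreach_unique (u in g) {
                vals[i] *= scales[u];
            }
        }
    }
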
It's now legal to write:
struct Foo { Foo *next; };
Previously, a predeclaration "struct Foo;" was required. This fixes
issue #287.
This change also fixes a bug where multiple forward declarations
"struct Foo; struct Foo;" would incorrectly issue an error on the
second one.
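As a small combined illustration (not code from the test suite), both of
the following now compile cleanly:

    // Self-referential struct member, with no predeclaration needed:
    struct Node { float value; Node *next; };

    // Repeated forward declarations no longer trigger a spurious error:
    struct Foo;
    struct Foo;
    struct Foo { int x; };
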
The string to be printed is accumulated into a local buffer before being sent to
puts(). This ensures that if multiple threads are running and printing at the
same time, their output won't be interleaved within an individual print
statement (it may still be interleaved across different print statements, just
like in C).
Issue #293.
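As an illustration of the guarantee (hypothetical task and function names),
output from concurrently running tasks can now interleave only at the
granularity of whole print() calls:

    task void report(uniform int id) {
        // The characters produced by this single print() call are assembled
        // in a buffer and handed to puts() in one piece, so they won't be
        // interleaved with another task's output mid-line.
        print("task % running, programIndex = %\n", id, programIndex);
    }

    export void run_reports(uniform int nTasks) {
        for (uniform int i = 0; i < nTasks; ++i)
            launch report(i);
        // launched tasks are implicitly synced before this function returns
    }
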
In particular, this gives us the desired behavior for NaNs (all compares
involving a NaN evaluate to true). This in turn allows writing the
canonical isnan() function as "v != v".
Added isnan() to the standard library as well.
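With unordered compares in place, the standard library version can be
written essentially as follows (a sketch of the varying-float case only):

    static inline bool isnan(float v) {
        // For any ordinary value, v != v is false; for a NaN the compare is
        // unordered and therefore evaluates to true.
        return v != v;
    }
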
We now use InternalLinkage for the 'programIndex' symbol (and similar)
if we're not compiling with debugging symbols. This prevents those
symbol names/definitions from polluting the global namespace for
the common case.
Basically addresses Issue #274.
We should never be running with an all-off mask and thus should never
enter a function with an all-off mask. There is no performance change
from removing this, however.
Issue #282.
The "base+offsets" variants of gather decompose the integer offsets into
compile-time constant and compile-time unknown elements. (The coalescing
optimization, then, depends on this decomposition being done well--having
as much as possible in the constant component.) We now make multiple
efforts to improve this decomposition as we run optimization passes; in
some cases we're able to move more over to the constant side than was
first possible.
This in particular fixes issue #276, a case where coalescing was expected
but didn't actually happen.
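For example, in an AOS-to-SOA style access pattern like the sketch below
(illustrative code, not the reproducer from the issue), the three offsets
share the same unknown varying part and differ only by small compile-time
constants; keeping those constants in the constant component of the
decomposition is what lets the coalescing pass combine the resulting
gathers:

    export void deinterleave3(uniform float src[], uniform float x[],
                              uniform float y[], uniform float z[],
                              uniform int count) {
        foreach (i = 0 ... count) {
            // Offsets 3*i+0, 3*i+1, 3*i+2: the varying part is shared and
            // the differences are small compile-time constants.
            x[i] = src[3 * i + 0];
            y[i] = src[3 * i + 1];
            z[i] = src[3 * i + 2];
        }
    }
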
Rather than having separate passes that convert, when possible:
- general gather/scatter with a vector of pointers into gather/scatter
  with a base pointer and integer offsets
- gather/scatter into masked load/store or load+broadcast
- masked load/store into regular load/store
all of these are now done in a single ImproveMemoryOps pass. This change
was in particular meant to address some phase-ordering issues that showed
up with multidimensional array access: after determining that an outer
dimension had the same index value in all program instances, we previously
weren't able to take advantage of the uniformity of the resulting pointer.
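A simplified sketch of the kind of access involved (illustrative code only;
in the actual case the compiler only discovers during optimization that the
outer index is the same in every program instance):

    export void copy_row(uniform float grid[], uniform int width,
                         uniform int count, uniform float out[]) {
        foreach (i = 0 ... count) {
            int row = count / 2;  // varying variable, but the same value in
                                  // every program instance
            // Once that uniformity is recognized, row * width + i is a
            // uniform base plus consecutive per-instance offsets; with
            // gather->load and masked->regular-load conversion now in one
            // ImproveMemoryOps pass, this access can become a plain vector
            // load in a single round of optimization.
            out[i] = grid[row * width + i];
        }
    }
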
Now that we never run with the mask all off, we no longer need to keep
that logic in a built-in function just so that we could check the mask.
In the one place where it was used (turning gathers that all access the
same location into a load and broadcast), we now just emit the code for
that directly.
When the outermost dimension(s) were partially active, but the innermost
dimension was all on, we'd inadvertently use an incorrect "all on"
execution mask.
Fixes issues #177 and #200.
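An example of the kind of loop that was affected (illustrative only): when
the outer extent doesn't line up with how the gang of program instances is
tiled across the two dimensions, the last outer iterations are only
partially active even though the inner dimension is fully covered:

    export void clear_image(uniform float img[], uniform int width,
                            uniform int height) {
        foreach (y = 0 ... height, x = 0 ... width) {
            // These stores must honor the partially-on execution mask from
            // the outer dimension; the bug caused them to be issued as if
            // all program instances were active.
            img[y * width + x] = 0.0f;
        }
    }
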
Previously, we'd bitcast e.g. a vector of floats to a vector of i32s and then
use the i32 variant of masked_load/masked_store/gather/scatter. Now, we have
separate float/double variants of each of those.
Changed the function suffixes from "_32", etc., to "_i32", etc.
Improved the load_and_broadcast macro in util.m4 to take the vector width
from the WIDTH variable rather than taking it as a parameter.