Switches with both uniform and varying "switch" expressions are
supported. Switch statements with varying expressions and very
large numbers of labels may not perform well; some issues to be
filed shortly will track opportunities for improving these.
ispc now supports goto, but only under uniform control flow--i.e.
it must be possible for the compiler to statically determine that
all program instances will follow the goto. An error is issued at
compile time if a goto is used when this is not the case.
Now we require that the struct name match for two struct types to be the same.
Added a test to check this.
(Also removed a stale test, movmsk-opt.ispc)
These make it easier to iterate over arbitrary amounts of data
elements; specifically, they automatically handle the "ragged
extra bits" that come up when the number of elements to be
processed isn't evenly divided by programCount.
TODO: documentation
The packed_{load,store}_active now functions take a pointer to a
location at which to start loading/storing, rather than an array
base and a uniform index.
Variants of the prefetch functions that take varying pointers
are now available.
There are now variants of the various atomic functions that take
varying pointers (issue #112).
Allow <, <=, >, >= comparisons of pointers
Allow explicit type-casting of pointers to and from integers
Fix bug in handling expressions of the form "int + ptr" ("ptr + int"
was fine).
Fix a bug in TypeCastExpr where varying -> uniform typecasts
would be allowed (leading to a crash later)
Allow atomic types to be initialized with single-element expression lists:
int x = { 5 };
Issue an error if a storage class is provided with a function parameter.
Issue an error if two members of a struct have the same name.
Issue an error on trying to assign to a struct with a const member, even if
the struct itself isn't const.
Issue an error if a function is redefined.
Issue an error if a function overload is declared that differs only in return
type from a previously-declared function.
Issue an error if "inline" or "task" qualifiers are used outside of function
declarations.
Allow trailing ',' at the end of enumerator lists.
Multiple tests for all of the above.
Pointers can be either uniform or varying, and behave correspondingly.
e.g.: "uniform float * varying" is a varying pointer to uniform float
data in memory, and "float * uniform" is a uniform pointer to varying
data in memory. Like other types, pointers are varying by default.
Pointer-based expressions, & and *, sizeof, ->, pointer arithmetic,
and the array/pointer duality all bahave as in C. Array arguments
to functions are converted to pointers, also like C.
There is a built-in NULL for a null pointer value; conversion from
compile-time constant 0 values to NULL still needs to be implemented.
Other changes:
- Syntax for references has been updated to be C++ style; a useful
warning is now issued if the "reference" keyword is used.
- It is now illegal to pass a varying lvalue as a reference parameter
to a function; references are essentially uniform pointers.
This case had previously been handled via special case call by value
return code. That path has been removed, now that varying pointers
are available to handle this use case (and much more).
- Some stdlib routines have been updated to take pointers as
arguments where appropriate (e.g. prefetch and the atomics).
A number of others still need attention.
- All of the examples have been updated
- Many new tests
TODO: documentation
Added support for resolving dimensions of multi-dimensional unsized arrays
from their initializer exprerssions (previously, only the first dimension
would be resolved.)
Added checks to make sure that no unsized array dimensions remain after
doing this (except for the first dimensision of array parameters to
functions.)
Substantial improvements and generalizations to the parsing and
declaration handling code to properly parse declarations involving
pointers. (No change to user-visible functionality, but this
lays groundwork for supporting a more general pointer model.)
Previously, it was only in the GatherScatterFlattenOpt optimization pass that
we added the per-lane offsets when we were indexing into varying data.
(Specifically, the case of float foo[]; int index; foo[index], where foo
is an array of varying elements rather than uniform elements.) Now, this
is done in the front-end as we're first emitting code.
In addition to the basic ugliness of doing this in an optimization pass,
it was also error-prone to do it there, since we no longer have access
to all of the type information that's around in the front-end.
No functionality or performance change.
In particular, this fixes issue #81, where a global variable access was leading to
ConstantExpressions showing up in this code, which it wasn't previously expecting.
Specifically, now we can work through phi nodes in the IR to detect cases
where an index value is actually the same across lanes or is linear across
the lanes. For example, this is a loop that used to require gathers but
is now turned into vector loads:
for (int i = programIndex; i < 16; i += programCount)
sum += a[i];
Fixes issue #107.
Within each function that launches tasks, we now can easily track which
tasks that function launched, so that the sync at the end of the function
can just sync on the tasks launched by that function (not all tasks
launched by all functions.)
Implementing this led to a rework of the task system API that ispc generates
code to call; the example task systems in examples/tasksys.cpp have been
updated to conform to this API. (The updated API is also documented in
the ispc user's guide.)
As part of this, "launch[n]" syntax was added to launch a number of tasks
in a single launch statement, rather than requiring a loop over 'n' to
launch n tasks.
This commit thus fixes issue #84 (enhancement to launch multiple tasks from
a single launch statement) as well as issue #105 (recursive task launches
were broken).
Generalize the lScalarizeVector() utility routine (used in determining
when we can change gathers/scatters into vector loads/stores, respectively)
to handle vector shuffles and vector loads. This fixes issue #79, which
provided a case where a gather was being performed even though a vector
load was possible.
Fix RNG state initialization for 16-wide targets
Fix a number of bugs in reduce_add builtin implementations for AVX.
Fix some tests that had incorrect expected results for the 16-wide
case.
For associative atomic ops (add, and, or, xor), we can take advantage of
their associativity to do just a single hardware atomic instruction,
rather than one for each of the running program instances (as the previous
implementation did.)
The basic approach is to locally compute a reduction across the active
program instances with the given op and to then issue a single HW atomic
with that reduced value as the operand. We then take the old value that
was stored in the location that is returned from the HW atomic op and
use that to compute the values to return to each of the program instances
(conceptually representing the cumulative effect of each of the preceding
program instances having performed their atomic operation.)
Issue #56.
Compute a "local" min/max across the active program instances and
then do a single atomic memory op.
Added a few tests to exercise global min/max atomics (which were
previously untested!)
These get slightly wrong results for zero and the denorms and also
don't handle the Inf/NaN stuff correctly, but are much more efficient
than the full versions of these routines.