Previously, when we had a switch statement with a uniform switch condition
but a 'break' statement that was under varying control flow inside the
switch, we'd promote the switch condition to be varying so that the
break would work correctly.
Now, we leave the condition as uniform and are thus able to use the
more-efficient LLVM switch instruction in this case.
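For illustration, a minimal sketch of the case in question (names are
hypothetical):

    export void f(uniform int mode, uniform float vals[]) {
        switch (mode) {                  // uniform condition: can now use
        case 0:                          // LLVM's switch instruction
            if (vals[programIndex] < 0)  // varying condition, so this
                break;                   // break is under varying control flow
            vals[programIndex] = 0;
            break;
        default:
            vals[programIndex] = 1;
        }
    }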
Issue #156.
Specialize the code for the innermost loop to not do any masking
computations for the innermost dimension for the iterations where
we are certainly working on a full vector's worth of data.
This fix improves performance/code quality of "foreach" such that
it's essentially the same as the equivalent "for" loop.
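A sketch of a loop that benefits (array name hypothetical; 1024 is a
multiple of programCount for all current targets):

    void scale(uniform float a[]) {
        foreach (i = 0 ... 1024) {
            // every iteration works on a full vector's worth of data,
            // so no mask computations are needed in the innermost
            // dimension
            a[i] *= 2;
        }
    }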
Fixes issue #151.
Switches with both uniform and varying "switch" expressions are
supported. Switch statements with varying expressions and very
large numbers of labels may not perform well; some issues to be
filed shortly will track opportunities for improving these.
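For example, a varying "switch" expression is now legal (sketch):

    float f(float x, int code) {
        switch (code) {   // varying expression: program instances may
        case 0:           // take different cases, handled via masking
            x = 0;
            break;
        default:
            x = -x;
            break;
        }
        return x;
    }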
ispc now supports goto, but only under uniform control flow--i.e.
it must be possible for the compiler to statically determine that
all program instances will follow the goto. An error is issued at
compile time if a goto is used when this is not the case.
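A sketch of a legal use (names hypothetical):

    void count(uniform int n) {
        uniform int i = 0;
    again:
        ++i;
        if (i < n)        // uniform condition: the compiler can prove that
            goto again;   // all program instances branch together
    }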
Specifically, we want to be able to late-bind on whether the mask is
i32s or i1s, so if there's any chance of ambiguity, we emit code that
does the "GEP from a NULL base pointer" trick to compute the value later
in compilation.
When used, these targets end up with calls to undefined functions for all
of the various special vector stuff ispc needs to compile ispc programs
(masked store, gather, min/max, sqrt, etc.).
These targets are not yet useful for anything, but are a step toward
having an option to emit C++ code with calls out to intrinsics.
Reorganized the directory structure a bit and put the LLVM bitcode used
to define target-specific stuff (as well as some generic built-ins stuff)
into a builtins/ directory.
Note that for building on Windows, it's now necessary to set an LLVM_VERSION
environment variable (with a value like LLVM_2_9, LLVM_3_0, LLVM_3_1svn, etc.).
Previously, we used an IfStmt to wrap complex functions with the equivalent
of a "cif" to check to see if the mask was all on, all off, or mixed at the
start of executing non-trivial functions. This had the unintended side
effect of suggesting to other parts of the compiler that the entire function
was under varying control flow (which in turn led to some small code
quality issues.)
Now, we emit the equivalent code directly.
The conceptual error was the assumption that not being under varying
control flow implied that the mask was all on; this is not the case
if some of the instances have executed a return earlier in the function's
execution. The error in practice would be that the mask would be
assumed to be all-on for things like memory writes, so there would
be unintended side-effects for the instances that had returned.
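A sketch of the problematic pattern (names hypothetical):

    static uniform float g[programCount];

    float f(float x) {
        if (x < 0)
            return 0;        // some instances may return here...
        g[programIndex] = x; // ...so although this write isn't under
                             // varying control flow, the mask may not be
                             // all on; it must not store for instances
                             // that already returned
        return x;
    }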
Previously, they all went into one big pile that was never cleaned up;
this was the wrong thing to do in a world where one might have a
function declaration inside another function, say.
These make it easier to iterate over arbitrary numbers of data
elements; specifically, they automatically handle the "ragged
extra bits" that come up when the number of elements to be
processed isn't evenly divided by programCount.
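A minimal sketch, assuming these refer to the "foreach" constructs
described elsewhere in these notes:

    void inc(uniform float a[], uniform int count) {
        // count need not be a multiple of programCount; the "ragged"
        // final iteration is automatically masked
        foreach (i = 0 ... count) {
            a[i] += 1;
        }
    }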
TODO: documentation
The launch group handle is now reset to NULL after sync is called;
this ensures that if tasks are launched in the same function after
a sync, the ISPCAlloc() call for the next launch will be
passed a NULL handle (as it should be).
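A sketch of the case this addresses (task names hypothetical):

    task void t0() { }
    task void t1() { }

    void f() {
        launch t0();
        sync;          // launch group handle is reset to NULL here
        launch t1();   // so the ISPCAlloc() call for this launch is
        sync;          // passed a NULL handle, as it should be
    }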
Allow <, <=, >, >= comparisons of pointers
Allow explicit type-casting of pointers to and from integers
Fix a bug in handling expressions of the form "int + ptr" ("ptr + int"
was fine).
Fix a bug in TypeCastExpr where varying -> uniform typecasts
would be allowed (leading to a crash later)
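For example, all of the following are now handled (sketch):

    void f() {
        uniform float a[16];
        uniform float * uniform p1 = a + 4;      // "ptr + int": worked before
        uniform float * uniform p2 = 4 + a;      // "int + ptr": now fixed
        uniform bool ordered = (p1 <= p2);       // ordered comparisons
        uniform int64 bits = (uniform int64)p1;  // explicit ptr <-> int cast
    }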
Pointers can be either uniform or varying, and behave correspondingly.
e.g.: "uniform float * varying" is a varying pointer to uniform float
data in memory, and "float * uniform" is a uniform pointer to varying
data in memory. Like other types, pointers are varying by default.
Pointer-based expressions, & and *, sizeof, ->, pointer arithmetic,
and the array/pointer duality all behave as in C. Array arguments
to functions are converted to pointers, also like C.
There is a built-in NULL for a null pointer value; conversion from
compile-time constant 0 values to NULL still needs to be implemented.
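A sketch of the new pointer types:

    void f() {
        uniform float u[8];
        float v[8];
        uniform float * uniform pu = &u[0];  // uniform pointer to uniform data
        float * uniform pv = &v[0];          // uniform pointer to varying data
        uniform float * varying pl = &u[programIndex];  // varying pointer:
                                             // per-instance addresses
        *pl = 0;                             // & and * behave as in C
    }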
Other changes:
- Syntax for references has been updated to be C++ style; a useful
warning is now issued if the "reference" keyword is used.
- It is now illegal to pass a varying lvalue as a reference parameter
to a function; references are essentially uniform pointers.
This case had previously been handled via special-case call-by-value-return
code. That path has been removed, now that varying pointers
are available to handle this use case (and much more).
- Some stdlib routines have been updated to take pointers as
arguments where appropriate (e.g. prefetch and the atomics).
A number of others still need attention.
- All of the examples have been updated
- Many new tests
TODO: documentation
Previously, to compute the size of objects and the offsets of struct
elements within structs, we were using the trick of using getelementptr
with a NULL base pointer and then casting the result to an int32/64.
However, since we actually know the target we're compiling for at
compile time, we can use corresponding methods from TargetData to
get these values directly.
This mostly cleans up code, but may make some of the gather/scatter
lowering to loads/stores optimizations work better in the presence
of structures.
Both uniform and varying function pointers are supported; when a function
is called through a varying function pointer, each unique function pointer
value across the running program instances is called once for the set of
active program instances that want to call it.
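A sketch (function names hypothetical):

    static float dbl(float x) { return 2 * x; }
    static float sqr(float x) { return x * x; }

    float apply(float x) {
        float (*fp)(float);   // pointers are varying by default
        if ((programIndex & 1) != 0)
            fp = dbl;
        else
            fp = sqr;
        return fp(x);  // one call per unique pointer value, with the
                       // subset of active instances holding that value
    }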
Previously, it was only in the GatherScatterFlattenOpt optimization pass that
we added the per-lane offsets when we were indexing into varying data.
(Specifically, the case of float foo[]; int index; foo[index], where foo
is an array of varying elements rather than uniform elements.) Now, this
is done in the front-end as we're first emitting code.
In addition to the basic ugliness of doing this in an optimization pass,
it was also error-prone to do it there, since we no longer have access
to all of the type information that's around in the front-end.
No functionality or performance change.
The Expr::TypeConv() method has been replaced with both a
CanConvertTypes() routine that indicates whether one type
can be converted to another and a TypeConvertExpr()
routine that provides the same functionality as
Expr::TypeConv() used to.
Specifically, we had been using the full mask for all gathers, rather than
using the internal mask when we were loading from locally-declared arrays.
Thus, given code like:
uniform float x[programCount] = { ... };
float xx = x[programIndex];
Previously, when this code was in a function where the mask wasn't known
to be all on, we weren't generating a plain vector load to initialize xx,
even though we should have been. Now we do.
We now maintain the distinction between the value of the mask passed into a
function and the "internal" mask within the function that only accounts for
varying control flow within the function.
The full mask (the AND of the function mask and the internal mask) must be used
for assignments to static and global variables, and reference function parameters.
Further, it is the appropriate mask to use for making decisions about varying
control flow. However, we can use the internal mask for assignments to variables
declared in the current function (including the return value and non-reference
parameters to the function). Doing so allows us to catch a few more cases where
the internal mask is all on, even if the mask coming into the function wasn't all
on, and thence use moves rather than blends for those assignments. (Which in
turn can allow additional optimizations to happen.)
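A sketch of the distinction (names hypothetical):

    static float g;   // file-scope variable

    float f(float x) {
        float local = 2 * x;  // local: the internal mask suffices, so
                              // this can be a move rather than a blend
                              // even if f() was called with a mixed mask
        g = local;            // global: the full mask (function mask
                              // ANDed with internal mask) must be used
        return local;
    }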
Fixes issue #23.
Added AST and Function classes.
Now, we parse the whole file and build up the AST for all of the
functions in the Module before we emit IR for the functions (vs. before,
when we generated IR along the way as we parsed the source file.)
Within each function that launches tasks, we now can easily track which
tasks that function launched, so that the sync at the end of the function
can just sync on the tasks launched by that function (not all tasks
launched by all functions.)
Implementing this led to a rework of the task system API that ispc generates
code to call; the example task systems in examples/tasksys.cpp have been
updated to conform to this API. (The updated API is also documented in
the ispc user's guide.)
As part of this, "launch[n]" syntax was added to launch a number of tasks
in a single launch statement, rather than requiring a loop over 'n' to
launch n tasks.
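A sketch of the new syntax (assuming taskIndex/taskCount are available
inside the task):

    task void work(uniform float a[]) {
        a[taskIndex] = taskIndex;   // taskIndex is in [0, taskCount)
    }

    void f(uniform float a[], uniform int n) {
        launch[n] work(a);  // launch n tasks with a single statement
        sync;               // waits only on tasks launched by f()
    }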
This commit thus fixes issue #84 (enhancement to launch multiple tasks from
a single launch statement) as well as issue #105 (recursive task launches
were broken).
Previously, we did a vector equal compare and then a movmsk, the
result of which we checked to see if it was on for all lanes.
Because masks are vectors of i32s, under AVX, the vector equal
compare required two 4-wide SSE compares and some shuffling.
Now, we do a movmsk of both masks first and then a scalar
equality comparison of those two values, which seems to generate
overall better code.
Fixes bug #55. A number of tests were crashing on Windows due to the task
launch code using alloca to allocate space for the tasks' parameters. On
Windows, the stack isn't generally big enough for this to be a good idea.
Also added an alignment parameter to ISPCMalloc() to pass the alignment
requirement along.
When creating function Symbols for functions that were defined in LLVM
bitcode for the standard library, if any of the function parameters are
integer types, create two ispc-side Symbols: one where the integer types
are all signed and the other where they are all unsigned. This allows us
to provide, for example, both
store_to_int16(reference int a[], uniform int offset, int val)
as well as
store_to_int16(reference unsigned int a[], uniform int offset, unsigned int val).
Added some additional tests to exercise the new variants of these.
Also fixed some cases where the __{load,store}_int{8,16} builtins would
read from/write to memory even if the mask was all off (which could
cause crashes in some cases.)