We now do a single atomic hardware swap and then effectively perform
swaps between the running program instances, such that the result
is the same as if each of them had executed its own hardware swap
in some particular order.
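As a rough sketch in present-day ispc (for exposition only; the real
implementation is in the LLVM bitcode built-ins, atomic_swap_sketch is a
hypothetical name, and lanemask(), foreach_active, and a uniform
atomic_swap_global() variant are assumed to behave as documented):

    static inline int32 atomic_swap_sketch(uniform int32 * uniform p,
                                           int32 value) {
        // assumes at least one program instance is running
        uniform unsigned int64 live = lanemask();
        // the last active instance's value is what ends up in memory
        uniform int last = 63 - (int32)count_leading_zeros(live);
        // the single hardware swap
        uniform int32 old = atomic_swap_global(p, extract(value, last));
        // thread the swapped-out values through the active instances:
        // the first gets the old memory contents, and each later one
        // gets the value the previous active instance would have stored
        int32 result = 0;
        uniform int32 carried = old;
        foreach_active (i) {
            result = carried;
            carried = extract(value, i);
        }
        return result;
    }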
Also cleaned up __atomic_swap_uniform_* built-in implementations
to not take the mask, which they weren't using anyway.
Finishes Issue #56.
When used, these targets end up with calls to undefined functions for all
of the various special vector stuff ispc needs to compile ispc programs
(masked store, gather, min/max, sqrt, etc.).
These targets are not yet useful for anything, but are a step toward
having an option to emit C++ code with calls out to intrinsics.
Reorganized the directory structure a bit and put the LLVM bitcode used
to define target-specific stuff (as well as some generic built-ins stuff)
into a builtins/ directory.
Note that for building on Windows, it's now necessary to set an LLVM_VERSION
environment variable (with values like LLVM_2_9, LLVM_3_0, LLVM_3_1svn, etc.)
The packed_{load,store}_active functions now take a pointer to the
location at which to start loading/storing, rather than an array
base and a uniform index.
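For example, the usual stream-compaction idiom now looks something like
this (a hedged sketch; the function and array names are illustrative):

    uniform int compact_positive(uniform int a[], uniform int n,
                                 uniform int out[]) {
        uniform int count = 0;
        foreach (i = 0 ... n) {
            if (a[i] > 0)
                // stores the active program instances' values
                // contiguously starting at &out[count] and returns
                // the number of values written
                count += packed_store_active(&out[count], a[i]);
        }
        return count;
    }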
Variants of the prefetch functions that take varying pointers
are now available.
There are now variants of the various atomic functions that take
varying pointers (issue #112).
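A hedged example of what the varying-pointer variants allow (the
function name is illustrative; this assumes prefetch_l1() and
atomic_add_global() overloads that accept varying pointers):

    void touch(uniform float data[], uniform int counts[], int index) {
        uniform float * varying p = &data[index]; // per-instance address
        prefetch_l1(p);                           // varying-pointer prefetch
        uniform int * varying c = &counts[index];
        atomic_add_global(c, 1);                  // varying-pointer atomic
    }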
Pointers can be either uniform or varying, and behave correspondingly.
e.g.: "uniform float * varying" is a varying pointer to uniform float
data in memory, and "float * uniform" is a uniform pointer to varying
data in memory. Like other types, pointers are varying by default.
Pointer-based expressions (& and *, sizeof, ->, pointer arithmetic)
and the array/pointer duality all behave as in C. Array arguments
to functions are converted to pointers, also as in C.
There is a built-in NULL for a null pointer value; conversion from
compile-time constant 0 values to NULL still needs to be implemented.
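A few illustrative declarations (a sketch, not code from this checkin):

    void pointer_examples(uniform float a[16]) {
        uniform float * varying pv = &a[programIndex]; // varying ptr to uniform data
        float * uniform pu = NULL;       // uniform ptr to varying data
        float * pd = NULL;               // pointers are varying by default
        uniform float * uniform base = a;          // array decays to pointer
        uniform float * uniform second = base + 1; // pointer arithmetic, as in C
        if (pu != NULL)
            *pu = 0.;                    // * and & behave as in C
    }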
Other changes:
- Syntax for references has been updated to be C++ style; a useful
warning is now issued if the old "reference" keyword is used (see the
sketch below).
- It is now illegal to pass a varying lvalue as a reference parameter
to a function; references are essentially uniform pointers.
This case had previously been handled via special-case call-by-value-return
code. That path has been removed, now that varying pointers
are available to handle this use case (and much more).
- Some stdlib routines have been updated to take pointers as
arguments where appropriate (e.g. prefetch and the atomics).
A number of others still need attention.
- All of the examples have been updated.
- Many new tests.
TODO: documentation
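The sketch mentioned above, showing the updated reference syntax (hedged;
the function names are illustrative):

    void inc(uniform float &x) { // C++ style; was "reference float x"
        x += 1;                  // references are essentially uniform pointers
    }

    void caller(uniform float a[], int index) {
        uniform float u = 0;
        inc(u);            // fine: a uniform lvalue
        // inc(a[index]);  // now illegal: a[index] is a varying lvalue;
        // pass a varying pointer instead:
        // uniform float * varying p = &a[index];
    }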
The intent is that the code in stdlib.ispc that calls out to the built-ins
should match argument types exactly (using explicit casts as needed), just
for maximal clarity and safety.
Fix RNG state initialization for 16-wide targets
Fix a number of bugs in reduce_add builtin implementations for AVX.
Fix some tests that had incorrect expected results for the 16-wide
case.
For associative atomic ops (add, and, or, xor), we can take advantage of
their associativity to do just a single hardware atomic instruction,
rather than one for each of the running program instances (as the previous
implementation did).
The basic approach is to locally compute a reduction across the active
program instances with the given op and to then issue a single HW atomic
with that reduced value as the operand. We then take the old value that
was stored in the location that is returned from the HW atomic op and
use that to compute the values to return to each of the program instances
(conceptually representing the cumulative effect of each of the preceding
program instances having performed their atomic operation).
Issue #56.
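A minimal sketch of the approach for atomic add, written in terms of
ispc's reduce/scan routines for exposition (the real code is in the LLVM
bitcode built-ins; atomic_add_sketch is a hypothetical name, and this
assumes reduce_add() and exclusive_scan_add() operate over just the
active program instances):

    static inline int32 atomic_add_sketch(uniform int32 * uniform p,
                                          int32 delta) {
        // local reduction across the active program instances
        uniform int32 total = (int32)reduce_add(delta);
        // one hardware atomic with the reduced value as its operand
        uniform int32 old = atomic_add_global(p, total);
        // each instance gets the old memory value plus the deltas of
        // the preceding active instances, as if they had each done
        // their atomic op one at a time
        return old + exclusive_scan_add(delta);
    }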
Compute a "local" min/max across the active program instances and
then do a single atomic memory op.
Added a few tests to exercise global min/max atomics (which were
previously untested!)
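An analogous hedged sketch for atomic min (the per-instance return
values, i.e. the "as if sequential" old values, are omitted for brevity):

    static inline void atomic_min_sketch(uniform int32 * uniform p,
                                         int32 value) {
        // local min across the active program instances
        uniform int32 low = reduce_min(value);
        // single hardware atomic memory op
        atomic_min_global(p, low);
    }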
These get slightly wrong results for zero and denormals, and also don't
handle Inf/NaN correctly, but are much more efficient than the full
versions of these routines.
This way, we match C/C++ in that casting a bool to an int gives either the value
zero or the value one. There is a new stdlib function int sign_extend(bool)
that does sign extension for cases where that's desired.
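A hedged example of the new semantics (the function name is illustrative):

    int bool_examples(float x) {
        bool b = (x > 0.);
        int as_int = (int)b;          // now 0 or 1, matching C/C++
        int as_mask = sign_extend(b); // 0 or -1, when an all-bits mask is wanted
        return as_int & as_mask;
    }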
Add much more support for double and int64 types in the standard library, basically supporting everything for them that is supported for floats and int32s. (The notable exceptions are the approximate rcp() and rsqrt() functions, which don't really have sensible analogs for doubles, or at least no corresponding built-in instructions.)
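For instance (a hedged illustration of the broadened coverage; the
function name is made up):

    double double_examples(double dv, int64 a, int64 b) {
        double r = sqrt(dv);                  // sqrt() for doubles
        int64 m = max(a, b);                  // min()/max() for int64
        uniform double total = reduce_add(r); // reductions for doubles
        return r + (double)m + total;
    }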
This checkin provides the standard set of atomic operations and a memory barrier in the ispc standard library. Both signed and unsigned 32- and 64-bit integer types are supported.
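A hedged usage example (the names are illustrative):

    static uniform unsigned int64 total = 0;

    void tally(unsigned int64 amount) {
        // unsigned 64-bit variant of the global atomic add
        atomic_add_global(&total, amount);
        // full memory barrier from the standard library
        memory_barrier();
    }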
When creating function Symbols for functions that were defined in LLVM bitcode for the standard library, if any of the function parameters are integer types, create two ispc-side Symbols: one where the integer types are all signed and one where they are all unsigned. This allows us to provide, for example, both store_to_int16(reference int a[], uniform int offset, int val) and store_to_int16(reference unsigned int a[], uniform int offset, unsigned int val).
Added some additional tests to exercise the new variants of these.
Also fixed some cases where the __{load,store}_int{8,16} builtins would read from/write to memory even if the mask was all off (which could cause crashes in some cases).