Add support for "local" atomics.
Also updated the aobench example to use them, which in turn allows using foreach() and hence a much cleaner implementation. Issue #58.
docs/ispc.rst
@@ -3389,24 +3389,53 @@ Systems Programming Support
Atomic Operations and Memory Fences
-----------------------------------

The standard range of atomic memory operations is provided by the
``ispc`` standard library, including variants to handle both uniform and
varying types as well as "local" and "global" atomics.

Local atomics provide atomic behavior across the program instances in a
gang, but not across multiple gangs or memory operations in different
hardware threads.  To see why they are needed, consider a histogram
calculation where each program instance in the gang computes which bucket a
value lies in and then increments a corresponding counter.  If the code is
written like this:

::

    uniform int count[N_BUCKETS] = ...;
    float value = ...;
    int bucket = clamp(value / N_BUCKETS, 0, N_BUCKETS);
    ++count[bucket]; // ERROR: undefined behavior if collisions

then the program's behavior is undefined: whenever multiple program
instances have values that map to the same value of ``bucket``, the
effect of the increment is undefined.  (See the discussion in the `Data
Races Within a Gang`_ section; in the case here, there isn't a sequence
point between one program instance updating ``count[bucket]`` and another
program instance reading its value.)

The ``atomic_add_local()`` function can be used in this case; as a local
atomic, it is atomic across the gang of program instances, so that the
expected result is computed.

::

    ...
    int bucket = clamp(value / N_BUCKETS, 0, N_BUCKETS);
    atomic_add_local(&count[bucket], 1);

It uses this variant of the 32-bit integer atomic add routine:

::

    int32 atomic_add_local(uniform int32 * uniform ptr, int32 delta)

The semantics of this routine are typical for an atomic add function: the
pointer points to a single location in memory (the same one for all
program instances), and for each executing program instance, the value
stored in the location that ``ptr`` points to has that program instance's
value ``delta`` added to it atomically, and the old value at that location
is returned from the function.

One thing to note is that the type of the value being added to is a
``uniform`` integer, while the increment amount and the return value are
@@ -3417,45 +3446,76 @@ atomics for the running program instances may be issued in arbitrary order;
it's not guaranteed that they will be issued in ``programIndex`` order, for
example.

Global atomics are more powerful than local atomics; they are atomic
across the program instances in the gang as well as across different
gangs and different hardware threads.  For example, for the global variant
of the atomic used above,

::

    int32 atomic_add_global(uniform int32 * uniform ptr, int32 delta)

if multiple processors simultaneously issue atomic adds to the same memory
location, the adds will be serialized by the hardware so that the correct
result is computed in the end.

Here are the declarations of the ``int32`` variants of these functions.
There are also ``int64`` equivalents as well as variants that take
``unsigned`` ``int32`` and ``int64`` values.

::

    int32 atomic_add_{local,global}(uniform int32 * uniform ptr, int32 value)
    int32 atomic_subtract_{local,global}(uniform int32 * uniform ptr, int32 value)
    int32 atomic_min_{local,global}(uniform int32 * uniform ptr, int32 value)
    int32 atomic_max_{local,global}(uniform int32 * uniform ptr, int32 value)
    int32 atomic_and_{local,global}(uniform int32 * uniform ptr, int32 value)
    int32 atomic_or_{local,global}(uniform int32 * uniform ptr, int32 value)
    int32 atomic_xor_{local,global}(uniform int32 * uniform ptr, int32 value)
    int32 atomic_swap_{local,global}(uniform int32 * uniform ptr, int32 value)

Support for ``float`` and ``double`` types is also available.  For local
atomics, all but the logical operations are available.  (There are
corresponding ``double`` variants of these, not listed here.)

::

    float atomic_add_local(uniform float * uniform ptr, float value)
    float atomic_subtract_local(uniform float * uniform ptr, float value)
    float atomic_min_local(uniform float * uniform ptr, float value)
    float atomic_max_local(uniform float * uniform ptr, float value)
    float atomic_swap_local(uniform float * uniform ptr, float value)

For global atomics, only atomic swap is available for these types:

::

    float atomic_swap_global(uniform float * uniform ptr, float value)
    double atomic_swap_global(uniform double * uniform ptr, double value)

There are also variants of these atomics that take ``uniform`` values for
the operand and return a ``uniform`` result.  These correspond to a single
atomic operation being performed for the entire gang of program instances,
rather than one per program instance.

::

    uniform int32 atomic_add_{local,global}(uniform int32 * uniform ptr,
                                            uniform int32 value)
    uniform int32 atomic_subtract_{local,global}(uniform int32 * uniform ptr,
                                                 uniform int32 value)
    uniform int32 atomic_min_{local,global}(uniform int32 * uniform ptr,
                                            uniform int32 value)
    uniform int32 atomic_max_{local,global}(uniform int32 * uniform ptr,
                                            uniform int32 value)
    uniform int32 atomic_and_{local,global}(uniform int32 * uniform ptr,
                                            uniform int32 value)
    uniform int32 atomic_or_{local,global}(uniform int32 * uniform ptr,
                                           uniform int32 value)
    uniform int32 atomic_xor_{local,global}(uniform int32 * uniform ptr,
                                            uniform int32 value)
    uniform int32 atomic_swap_{local,global}(uniform int32 * uniform ptr,
                                             uniform int32 newval)

Be careful that you use the atomic function that you mean to; consider the
following code:
@@ -3479,8 +3539,7 @@ will cause the desired atomic add function to be called.
::

    extern uniform int32 counter;
    int32 myCounter = atomic_add_global(&counter, (varying int32)1);

There is a third variant of each of these atomic functions that takes a
``varying`` pointer; this allows each program instance to issue an atomic
@@ -3490,30 +3549,27 @@ the same location in memory!)
::

    int32 atomic_add_{local,global}(uniform int32 * varying ptr, int32 value)
    int32 atomic_subtract_{local,global}(uniform int32 * varying ptr, int32 value)
    int32 atomic_min_{local,global}(uniform int32 * varying ptr, int32 value)
    int32 atomic_max_{local,global}(uniform int32 * varying ptr, int32 value)
    int32 atomic_and_{local,global}(uniform int32 * varying ptr, int32 value)
    int32 atomic_or_{local,global}(uniform int32 * varying ptr, int32 value)
    int32 atomic_xor_{local,global}(uniform int32 * varying ptr, int32 value)
    int32 atomic_swap_{local,global}(uniform int32 * varying ptr, int32 value)

There are also atomic "compare and exchange" functions.  Compare and
exchange atomically compares the value in "val" to "compare"--if they
match, it assigns "newval" to "val".  In either case, the old value of
"val" is returned.  (As with the other atomic operations, there are also
``unsigned`` and 64-bit variants of this function.  Furthermore, there are
``float`` and ``double`` variants as well.)

::

    int32 atomic_compare_exchange_{local,global}(uniform int32 * uniform ptr,
                                                 int32 compare, int32 newval)
    uniform int32 atomic_compare_exchange_{local,global}(uniform int32 * uniform ptr,
                                                 uniform int32 compare,
                                                 uniform int32 newval)

``ispc`` also has a standard library routine that inserts a memory barrier
@@ -212,104 +212,44 @@ static void ao_scanlines(uniform int y0, uniform int y1, uniform int w,
    RNGState rngstate;

    seed_rng(&rngstate, y0);
    float invSamples = 1.f / nsubsamples;

    foreach_tiled(y = y0 ... y1, x = 0 ... w,
                  u = 0 ... nsubsamples, v = 0 ... nsubsamples) {
        float du = (float)u * invSamples, dv = (float)v * invSamples;

        // Figure out x,y pixel in NDC
        float px = (x + du - (w / 2.0f)) / (w / 2.0f);
        float py = -(y + dv - (h / 2.0f)) / (h / 2.0f);
        float ret = 0.f;
        Ray ray;
        Isect isect;

        ray.org = 0.f;

        // Poor man's perspective projection
        ray.dir.x = px;
        ray.dir.y = py;
        ray.dir.z = -1.0;
        vnormalize(ray.dir);

        isect.t = 1.0e+17;
        isect.hit = 0;

        for (uniform int snum = 0; snum < 3; ++snum)
            ray_sphere_intersect(isect, ray, spheres[snum]);
        ray_plane_intersect(isect, ray, plane);

        // Note use of 'coherent' if statement; the set of rays we
        // trace will often all hit or all miss the scene
        cif (isect.hit) {
            ret = ambient_occlusion(isect, plane, spheres, rngstate);
            ret *= invSamples * invSamples;

            int offset = 3 * (y * w + x);
            atomic_add_local(&image[offset], ret);
            atomic_add_local(&image[offset+1], ret);
            atomic_add_local(&image[offset+2], ret);
        }
    }
}
stdlib.ispc
@@ -795,217 +795,6 @@ static inline uniform int64 clock() {
    return __clock();
}

///////////////////////////////////////////////////////////////////////////
// Floating-Point Math
@@ -1389,6 +1178,400 @@ static inline uniform int64 clamp(uniform int64 v, uniform int64 low,
|
|||||||
return min(max(v, low), high);
|
return min(max(v, low), high);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
///////////////////////////////////////////////////////////////////////////
|
||||||
|
// Global atomics and memory barriers
|
||||||
|
|
||||||
|
static inline void memory_barrier() {
|
||||||
|
__memory_barrier();
|
||||||
|
}
|
||||||
|
|
||||||
|
#define DEFINE_ATOMIC_OP(TA,TB,OPA,OPB,MASKTYPE) \
|
||||||
|
static inline TA atomic_##OPA##_global(uniform TA * uniform ptr, TA value) { \
    memory_barrier(); \
    TA ret = __atomic_##OPB##_##TB##_global(ptr, value, (MASKTYPE)__mask); \
    memory_barrier(); \
    return ret; \
} \
static inline uniform TA atomic_##OPA##_global(uniform TA * uniform ptr, \
                                               uniform TA value) { \
    memory_barrier(); \
    uniform TA ret = __atomic_##OPB##_uniform_##TB##_global(ptr, value); \
    memory_barrier(); \
    return ret; \
} \
static inline TA atomic_##OPA##_global(uniform TA * varying ptr, TA value) { \
    uniform TA * uniform ptrArray[programCount]; \
    ptrArray[programIndex] = ptr; \
    memory_barrier(); \
    TA ret; \
    uniform int mask = lanemask(); \
    for (uniform int i = 0; i < programCount; ++i) { \
        if ((mask & (1 << i)) == 0) \
            continue; \
        uniform TA * uniform p = ptrArray[i]; \
        uniform TA v = extract(value, i); \
        uniform TA r = __atomic_##OPB##_uniform_##TB##_global(p, v); \
        ret = insert(ret, i, r); \
    } \
    memory_barrier(); \
    return ret; \
}

#define DEFINE_ATOMIC_SWAP(TA,TB) \
static inline TA atomic_swap_global(uniform TA * uniform ptr, TA value) { \
    memory_barrier(); \
    uniform int i = 0; \
    TA ret[programCount]; \
    TA memVal; \
    uniform int lastSwap; \
    uniform int mask = lanemask(); \
    /* First, have the first running program instance (if any) perform \
       the swap with memory with its value of "value"; record the \
       value returned. */ \
    for (; i < programCount; ++i) { \
        if ((mask & (1 << i)) == 0) \
            continue; \
        memVal = __atomic_swap_uniform_##TB##_global(ptr, extract(value, i)); \
        lastSwap = i; \
        break; \
    } \
    /* Now, for all of the remaining running program instances, set the \
       return value of the last instance that did a swap with this \
       instance's value of "value"; this gives the same effect as if the \
       current instance had executed a hardware atomic swap right before \
       the last one that did a swap. */ \
    for (; i < programCount; ++i) { \
        if ((mask & (1 << i)) == 0) \
            continue; \
        ret[lastSwap] = extract(value, i); \
        lastSwap = i; \
    } \
    /* And the last instance that wanted to swap gets the value we \
       originally got back from memory... */ \
    ret[lastSwap] = memVal; \
    memory_barrier(); \
    return ret[programIndex]; \
} \
static inline uniform TA atomic_swap_global(uniform TA * uniform ptr, \
                                            uniform TA value) { \
    memory_barrier(); \
    uniform TA ret = __atomic_swap_uniform_##TB##_global(ptr, value); \
    memory_barrier(); \
    return ret; \
} \
static inline TA atomic_swap_global(uniform TA * varying ptr, TA value) { \
    uniform TA * uniform ptrArray[programCount]; \
    ptrArray[programIndex] = ptr; \
    memory_barrier(); \
    TA ret; \
    uniform int mask = lanemask(); \
    for (uniform int i = 0; i < programCount; ++i) { \
        if ((mask & (1 << i)) == 0) \
            continue; \
        uniform TA * uniform p = ptrArray[i]; \
        uniform TA v = extract(value, i); \
        uniform TA r = __atomic_swap_uniform_##TB##_global(p, v); \
        ret = insert(ret, i, r); \
    } \
    memory_barrier(); \
    return ret; \
}

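The chained-swap logic above is subtle. As a scalar illustration (plain C; `GANG`, `swap_global_model`, and the explicit `mask` argument are all stand-ins invented for this sketch, not part of ispc), a varying `atomic_swap_global()` on a single uniform location issues just one real swap and chains the remaining active lanes together, which is equivalent to the active lanes having swapped in reverse lane order:

```c
#include <assert.h>
#include <stdint.h>

#define GANG 4  /* illustrative stand-in for ispc's programCount */

/* Scalar model of the DEFINE_ATOMIC_SWAP varying-value/uniform-pointer
 * case: the first active lane does the one real swap with memory; each
 * earlier swapper's return value is then patched to be the next active
 * lane's operand, and the last swapper receives the original memory
 * contents.  Net effect: as if the active lanes swapped in reverse order. */
static void swap_global_model(int32_t *ptr, const int32_t value[GANG],
                              uint32_t mask, int32_t ret[GANG]) {
    int i = 0, lastSwap = -1;
    int32_t memVal = 0;
    for (; i < GANG; ++i) {
        if ((mask & (1u << i)) == 0)
            continue;
        memVal = *ptr;               /* the single real swap */
        *ptr = value[i];
        lastSwap = i;
        break;
    }
    if (lastSwap < 0)
        return;                      /* no active lanes: nothing to do */
    for (++i; i < GANG; ++i) {
        if ((mask & (1u << i)) == 0)
            continue;
        ret[lastSwap] = value[i];    /* chain: previous swapper sees this value */
        lastSwap = i;
    }
    ret[lastSwap] = memVal;          /* last swapper gets the original contents */
}
```

With memory holding 100 and all four lanes swapping in {10, 20, 30, 40}, memory ends up holding 10 and the lanes get back {20, 30, 40, 100}, matching the as-if-reversed ordering described in the comments.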
#define DEFINE_ATOMIC_MINMAX_OP(TA,TB,OPA,OPB) \
static inline TA atomic_##OPA##_global(uniform TA * uniform ptr, TA value) { \
    uniform TA oneval = reduce_##OPA(value); \
    TA ret; \
    if (lanemask() != 0) { \
        memory_barrier(); \
        ret = __atomic_##OPB##_uniform_##TB##_global(ptr, oneval); \
        memory_barrier(); \
    } \
    return ret; \
} \
static inline uniform TA atomic_##OPA##_global(uniform TA * uniform ptr, \
                                               uniform TA value) { \
    memory_barrier(); \
    uniform TA ret = __atomic_##OPB##_uniform_##TB##_global(ptr, value); \
    memory_barrier(); \
    return ret; \
} \
static inline TA atomic_##OPA##_global(uniform TA * varying ptr, \
                                       TA value) { \
    uniform TA * uniform ptrArray[programCount]; \
    ptrArray[programIndex] = ptr; \
    memory_barrier(); \
    TA ret; \
    uniform int mask = lanemask(); \
    for (uniform int i = 0; i < programCount; ++i) { \
        if ((mask & (1 << i)) == 0) \
            continue; \
        uniform TA * uniform p = ptrArray[i]; \
        uniform TA v = extract(value, i); \
        uniform TA r = __atomic_##OPB##_uniform_##TB##_global(p, v); \
        ret = insert(ret, i, r); \
    } \
    memory_barrier(); \
    return ret; \
}
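DEFINE_ATOMIC_MINMAX_OP exploits the fact that min and max commute with reduction: reducing the varying operand across the gang first leaves a single hardware atomic to issue, and every lane then observes the same old memory value. A rough scalar model in C (`GANG` and `min_global_model` are names invented for this sketch, not ispc APIs):

```c
#include <assert.h>
#include <stdint.h>

#define GANG 4  /* illustrative stand-in for programCount */

/* Model of atomic_min_global() with a varying value and a uniform pointer:
 * reduce the gang's operands to a single minimum, then perform one
 * read-modify-write; every lane gets back the same pre-update value. */
static int32_t min_global_model(int32_t *ptr, const int32_t value[GANG]) {
    int32_t m = value[0];
    for (int i = 1; i < GANG; ++i)   /* the reduce_min() step */
        if (value[i] < m)
            m = value[i];
    int32_t old = *ptr;              /* one atomic min instead of GANG of them */
    if (m < old)
        *ptr = m;
    return old;
}
```

Starting from memory holding 5 with gang values {7, 3, 9, 8}, every lane gets back 5 and memory ends up holding 3.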

DEFINE_ATOMIC_OP(int32,int32,add,add,IntMaskType)
DEFINE_ATOMIC_OP(int32,int32,subtract,sub,IntMaskType)
DEFINE_ATOMIC_MINMAX_OP(int32,int32,min,min)
DEFINE_ATOMIC_MINMAX_OP(int32,int32,max,max)
DEFINE_ATOMIC_OP(int32,int32,and,and,IntMaskType)
DEFINE_ATOMIC_OP(int32,int32,or,or,IntMaskType)
DEFINE_ATOMIC_OP(int32,int32,xor,xor,IntMaskType)
DEFINE_ATOMIC_SWAP(int32,int32)

// For everything but atomic min and max, we can use the same
// implementations for unsigned as for signed.
DEFINE_ATOMIC_OP(unsigned int32,int32,add,add,UIntMaskType)
DEFINE_ATOMIC_OP(unsigned int32,int32,subtract,sub,UIntMaskType)
DEFINE_ATOMIC_MINMAX_OP(unsigned int32,uint32,min,umin)
DEFINE_ATOMIC_MINMAX_OP(unsigned int32,uint32,max,umax)
DEFINE_ATOMIC_OP(unsigned int32,int32,and,and,UIntMaskType)
DEFINE_ATOMIC_OP(unsigned int32,int32,or,or,UIntMaskType)
DEFINE_ATOMIC_OP(unsigned int32,int32,xor,xor,UIntMaskType)
DEFINE_ATOMIC_SWAP(unsigned int32,int32)

DEFINE_ATOMIC_SWAP(float,float)

DEFINE_ATOMIC_OP(int64,int64,add,add,IntMaskType)
DEFINE_ATOMIC_OP(int64,int64,subtract,sub,IntMaskType)
DEFINE_ATOMIC_MINMAX_OP(int64,int64,min,min)
DEFINE_ATOMIC_MINMAX_OP(int64,int64,max,max)
DEFINE_ATOMIC_OP(int64,int64,and,and,IntMaskType)
DEFINE_ATOMIC_OP(int64,int64,or,or,IntMaskType)
DEFINE_ATOMIC_OP(int64,int64,xor,xor,IntMaskType)
DEFINE_ATOMIC_SWAP(int64,int64)

// For everything but atomic min and max, we can use the same
// implementations for unsigned as for signed.
DEFINE_ATOMIC_OP(unsigned int64,int64,add,add,UIntMaskType)
DEFINE_ATOMIC_OP(unsigned int64,int64,subtract,sub,UIntMaskType)
DEFINE_ATOMIC_MINMAX_OP(unsigned int64,uint64,min,umin)
DEFINE_ATOMIC_MINMAX_OP(unsigned int64,uint64,max,umax)
DEFINE_ATOMIC_OP(unsigned int64,int64,and,and,UIntMaskType)
DEFINE_ATOMIC_OP(unsigned int64,int64,or,or,UIntMaskType)
DEFINE_ATOMIC_OP(unsigned int64,int64,xor,xor,UIntMaskType)
DEFINE_ATOMIC_SWAP(unsigned int64,int64)

DEFINE_ATOMIC_SWAP(double,double)

#undef DEFINE_ATOMIC_OP
#undef DEFINE_ATOMIC_MINMAX_OP
#undef DEFINE_ATOMIC_SWAP

#define ATOMIC_DECL_CMPXCHG(TA, TB, MASKTYPE) \
static inline TA atomic_compare_exchange_global( \
        uniform TA * uniform ptr, TA oldval, TA newval) { \
    memory_barrier(); \
    TA ret = __atomic_compare_exchange_##TB##_global(ptr, oldval, newval, \
                                                     (MASKTYPE)__mask); \
    memory_barrier(); \
    return ret; \
} \
static inline uniform TA atomic_compare_exchange_global( \
        uniform TA * uniform ptr, uniform TA oldval, uniform TA newval) { \
    memory_barrier(); \
    uniform TA ret = \
        __atomic_compare_exchange_uniform_##TB##_global(ptr, oldval, newval); \
    memory_barrier(); \
    return ret; \
}

ATOMIC_DECL_CMPXCHG(int32, int32, IntMaskType)
ATOMIC_DECL_CMPXCHG(unsigned int32, int32, UIntMaskType)
ATOMIC_DECL_CMPXCHG(float, float, IntMaskType)
ATOMIC_DECL_CMPXCHG(int64, int64, IntMaskType)
ATOMIC_DECL_CMPXCHG(unsigned int64, int64, UIntMaskType)
ATOMIC_DECL_CMPXCHG(double, double, IntMaskType)

#undef ATOMIC_DECL_CMPXCHG

///////////////////////////////////////////////////////////////////////////
// local atomics

#define LOCAL_ATOMIC(TYPE,NAME,OPFUNC) \
static inline uniform TYPE atomic_##NAME##_local(uniform TYPE * uniform ptr, \
                                                 uniform TYPE value) { \
    uniform TYPE ret = *ptr; \
    *ptr = OPFUNC(*ptr, value); \
    return ret; \
} \
static inline TYPE atomic_##NAME##_local(uniform TYPE * uniform ptr, TYPE value) { \
    TYPE ret; \
    uniform int mask = lanemask(); \
    for (uniform int i = 0; i < programCount; ++i) { \
        if ((mask & (1 << i)) == 0) \
            continue; \
        ret = insert(ret, i, *ptr); \
        *ptr = OPFUNC(*ptr, extract(value, i)); \
    } \
    return ret; \
} \
static inline TYPE atomic_##NAME##_local(uniform TYPE * varying p, TYPE value) { \
    TYPE ret; \
    uniform TYPE * uniform ptrs[programCount]; \
    ptrs[programIndex] = p; \
    uniform int mask = lanemask(); \
    for (uniform int i = 0; i < programCount; ++i) { \
        if ((mask & (1 << i)) == 0) \
            continue; \
        ret = insert(ret, i, *ptrs[i]); \
        *ptrs[i] = OPFUNC(*ptrs[i], extract(value, i)); \
    } \
    return ret; \
}
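Local atomics like these are exactly what the histogram-style case needs: collisions between program instances within one gang are serialized by visiting the active lanes in order with plain loads and stores, with no hardware atomics or memory fences involved. A scalar C model of the varying-value LOCAL_ATOMIC loop (`GANG`, `add_local_model`, and the explicit `mask` parameter are invented for this illustration):

```c
#include <assert.h>
#include <stdint.h>

#define GANG 4  /* illustrative stand-in for programCount */

/* Model of atomic_add_local(uniform int32 *, varying int32): visit the
 * active lanes in order; each one returns the contents just before its own
 * update.  This is atomic across the gang only, not across hardware
 * threads. */
static void add_local_model(int32_t *ptr, const int32_t value[GANG],
                            uint32_t mask, int32_t ret[GANG]) {
    for (int i = 0; i < GANG; ++i) {
        if ((mask & (1u << i)) == 0)
            continue;            /* inactive lane: no side effect, no result */
        ret[i] = *ptr;           /* insert(ret, i, *ptr) */
        *ptr += value[i];        /* *ptr = OPFUNC(*ptr, extract(value, i)) */
    }
}
```

When every lane adds 1 to a counter that starts at zero, the lanes see the old values {0, 1, 2, 3} and the counter ends at 4, rather than the undefined result an unsynchronized `++count[bucket]` would give.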

static inline uniform int32 __add(uniform int32 a, uniform int32 b) { return a+b; }
static inline uniform int32 __sub(uniform int32 a, uniform int32 b) { return a-b; }
static inline uniform int32 __and(uniform int32 a, uniform int32 b) { return a & b; }
static inline uniform int32 __or(uniform int32 a, uniform int32 b) { return a | b; }
static inline uniform int32 __xor(uniform int32 a, uniform int32 b) { return a ^ b; }
static inline uniform int32 __swap(uniform int32 a, uniform int32 b) { return b; }

static inline uniform unsigned int32 __add(uniform unsigned int32 a,
                                           uniform unsigned int32 b) { return a+b; }
static inline uniform unsigned int32 __sub(uniform unsigned int32 a,
                                           uniform unsigned int32 b) { return a-b; }
static inline uniform unsigned int32 __and(uniform unsigned int32 a,
                                           uniform unsigned int32 b) { return a & b; }
static inline uniform unsigned int32 __or(uniform unsigned int32 a,
                                          uniform unsigned int32 b) { return a | b; }
static inline uniform unsigned int32 __xor(uniform unsigned int32 a,
                                           uniform unsigned int32 b) { return a ^ b; }
static inline uniform unsigned int32 __swap(uniform unsigned int32 a,
                                            uniform unsigned int32 b) { return b; }

static inline uniform float __add(uniform float a, uniform float b) { return a+b; }
static inline uniform float __sub(uniform float a, uniform float b) { return a-b; }
static inline uniform float __swap(uniform float a, uniform float b) { return b; }

static inline uniform int64 __add(uniform int64 a, uniform int64 b) { return a+b; }
static inline uniform int64 __sub(uniform int64 a, uniform int64 b) { return a-b; }
static inline uniform int64 __and(uniform int64 a, uniform int64 b) { return a & b; }
static inline uniform int64 __or(uniform int64 a, uniform int64 b) { return a | b; }
static inline uniform int64 __xor(uniform int64 a, uniform int64 b) { return a ^ b; }
static inline uniform int64 __swap(uniform int64 a, uniform int64 b) { return b; }

static inline uniform unsigned int64 __add(uniform unsigned int64 a,
                                           uniform unsigned int64 b) { return a+b; }
static inline uniform unsigned int64 __sub(uniform unsigned int64 a,
                                           uniform unsigned int64 b) { return a-b; }
static inline uniform unsigned int64 __and(uniform unsigned int64 a,
                                           uniform unsigned int64 b) { return a & b; }
static inline uniform unsigned int64 __or(uniform unsigned int64 a,
                                          uniform unsigned int64 b) { return a | b; }
static inline uniform unsigned int64 __xor(uniform unsigned int64 a,
                                           uniform unsigned int64 b) { return a ^ b; }
static inline uniform unsigned int64 __swap(uniform unsigned int64 a,
                                            uniform unsigned int64 b) { return b; }

static inline uniform double __add(uniform double a, uniform double b) { return a+b; }
static inline uniform double __sub(uniform double a, uniform double b) { return a-b; }
static inline uniform double __swap(uniform double a, uniform double b) { return b; }

LOCAL_ATOMIC(int32, add, __add)
LOCAL_ATOMIC(int32, subtract, __sub)
LOCAL_ATOMIC(int32, and, __and)
LOCAL_ATOMIC(int32, or, __or)
LOCAL_ATOMIC(int32, xor, __xor)
LOCAL_ATOMIC(int32, min, min)
LOCAL_ATOMIC(int32, max, max)
LOCAL_ATOMIC(int32, swap, __swap)

LOCAL_ATOMIC(unsigned int32, add, __add)
LOCAL_ATOMIC(unsigned int32, subtract, __sub)
LOCAL_ATOMIC(unsigned int32, and, __and)
LOCAL_ATOMIC(unsigned int32, or, __or)
LOCAL_ATOMIC(unsigned int32, xor, __xor)
LOCAL_ATOMIC(unsigned int32, min, min)
LOCAL_ATOMIC(unsigned int32, max, max)
LOCAL_ATOMIC(unsigned int32, swap, __swap)

LOCAL_ATOMIC(float, add, __add)
LOCAL_ATOMIC(float, subtract, __sub)
LOCAL_ATOMIC(float, min, min)
LOCAL_ATOMIC(float, max, max)
LOCAL_ATOMIC(float, swap, __swap)

LOCAL_ATOMIC(int64, add, __add)
LOCAL_ATOMIC(int64, subtract, __sub)
LOCAL_ATOMIC(int64, and, __and)
LOCAL_ATOMIC(int64, or, __or)
LOCAL_ATOMIC(int64, xor, __xor)
LOCAL_ATOMIC(int64, min, min)
LOCAL_ATOMIC(int64, max, max)
LOCAL_ATOMIC(int64, swap, __swap)

LOCAL_ATOMIC(unsigned int64, add, __add)
LOCAL_ATOMIC(unsigned int64, subtract, __sub)
LOCAL_ATOMIC(unsigned int64, and, __and)
LOCAL_ATOMIC(unsigned int64, or, __or)
LOCAL_ATOMIC(unsigned int64, xor, __xor)
LOCAL_ATOMIC(unsigned int64, min, min)
LOCAL_ATOMIC(unsigned int64, max, max)
LOCAL_ATOMIC(unsigned int64, swap, __swap)

LOCAL_ATOMIC(double, add, __add)
LOCAL_ATOMIC(double, subtract, __sub)
LOCAL_ATOMIC(double, min, min)
LOCAL_ATOMIC(double, max, max)
LOCAL_ATOMIC(double, swap, __swap)

// compare exchange
#define LOCAL_CMPXCHG(TYPE) \
static inline uniform TYPE atomic_compare_exchange_local(uniform TYPE * uniform ptr, \
                                                         uniform TYPE cmp, \
                                                         uniform TYPE update) { \
    uniform TYPE old = *ptr; \
    if (old == cmp) \
        *ptr = update; \
    return old; \
} \
static inline TYPE atomic_compare_exchange_local(uniform TYPE * uniform ptr, \
                                                 TYPE cmp, TYPE update) { \
    TYPE ret; \
    uniform int mask = lanemask(); \
    for (uniform int i = 0; i < programCount; ++i) { \
        if ((mask & (1 << i)) == 0) \
            continue; \
        uniform TYPE old = *ptr; \
        if (old == extract(cmp, i)) \
            *ptr = extract(update, i); \
        ret = insert(ret, i, old); \
    } \
    return ret; \
} \
static inline TYPE atomic_compare_exchange_local(uniform TYPE * varying p, \
                                                 TYPE cmp, TYPE update) { \
    uniform TYPE * uniform ptrs[programCount]; \
    ptrs[programIndex] = p; \
    TYPE ret; \
    uniform int mask = lanemask(); \
    for (uniform int i = 0; i < programCount; ++i) { \
        if ((mask & (1 << i)) == 0) \
            continue; \
        uniform TYPE old = *ptrs[i]; \
        if (old == extract(cmp, i)) \
            *ptrs[i] = extract(update, i); \
        ret = insert(ret, i, old); \
    } \
    return ret; \
}
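The varying compare-exchange follows the same lane-serial pattern, so each lane's comparison observes any stores already made by lower-numbered active lanes. A scalar C model (`GANG`, `cmpxchg_local_model`, and the `mask` parameter are invented for this sketch):

```c
#include <assert.h>
#include <stdint.h>

#define GANG 4  /* illustrative stand-in for programCount */

/* Model of atomic_compare_exchange_local() with varying cmp/update and a
 * uniform pointer: lanes run in order, so lane i's comparison sees the
 * updates of active lanes 0..i-1. */
static void cmpxchg_local_model(int32_t *ptr, const int32_t cmp[GANG],
                                const int32_t update[GANG], uint32_t mask,
                                int32_t ret[GANG]) {
    for (int i = 0; i < GANG; ++i) {
        if ((mask & (1u << i)) == 0)
            continue;
        int32_t old = *ptr;
        if (old == cmp[i])
            *ptr = update[i];
        ret[i] = old;            /* each lane reports the value it compared */
    }
}
```

For example, starting from 2 with per-lane `cmp` of {0, 1, 2, 3} and `update` of {1000, 2000, 3000, 4000}, only lane 2 matches and 3000 is left in memory; this mirrors the shape of the local-atomics-6 test below.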

LOCAL_CMPXCHG(int32)
LOCAL_CMPXCHG(unsigned int32)
LOCAL_CMPXCHG(float)
LOCAL_CMPXCHG(int64)
LOCAL_CMPXCHG(unsigned int64)
LOCAL_CMPXCHG(double)

#undef LOCAL_ATOMIC
#undef LOCAL_CMPXCHG

///////////////////////////////////////////////////////////////////////////
// Transcendentals (float precision)

15 tests/local-atomics-1.ispc Normal file
@@ -0,0 +1,15 @@

export uniform int width() { return programCount; }

uniform unsigned int32 s = 0;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    float delta = 1;
    float b = atomic_add_local(&s, delta);
    RET[programIndex] = reduce_add(b);
}

export void result(uniform float RET[]) {
    RET[programIndex] = reduce_add(programIndex);
}

17 tests/local-atomics-10.ispc Normal file
@@ -0,0 +1,17 @@

export uniform int width() { return programCount; }

uniform unsigned int32 s = 0;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    float b = 0;
    float delta = 1;
    if (programIndex < 2)
        b = atomic_add_local(&s, delta);
    RET[programIndex] = s;
}

export void result(uniform float RET[]) {
    RET[programIndex] = programCount == 1 ? 1 : 2;
}
20 tests/local-atomics-11.ispc Normal file
@@ -0,0 +1,20 @@

export uniform int width() { return programCount; }

uniform unsigned int32 s = 0;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    float b = 0;
    if (programIndex & 1)
        b = atomic_add_local(&s, programIndex);
    RET[programIndex] = s;
}

export void result(uniform float RET[]) {
    uniform int sum = 0;
    for (uniform int i = 0; i < programCount; ++i)
        if (i & 1)
            sum += i;
    RET[programIndex] = sum;
}

20 tests/local-atomics-12.ispc Normal file
@@ -0,0 +1,20 @@

export uniform int width() { return programCount; }

uniform unsigned int32 s = 0;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    float b = 0;
    if (programIndex & 1)
        b = atomic_or_local(&s, (1 << programIndex));
    RET[programIndex] = s;
}

export void result(uniform float RET[]) {
    uniform int sum = 0;
    for (uniform int i = 0; i < programCount; ++i)
        if (i & 1)
            sum += (1 << i);
    RET[programIndex] = sum;
}

16 tests/local-atomics-13.ispc Normal file
@@ -0,0 +1,16 @@

export uniform int width() { return programCount; }

uniform unsigned int32 s = 0;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    float b = 0;
    if (programIndex & 1)
        b = atomic_or_local(&s, (1 << programIndex));
    RET[programIndex] = popcnt(reduce_max((int32)b));
}

export void result(uniform float RET[]) {
    RET[programIndex] = programCount == 1 ? 0 : ((programCount/2) - 1);
}

20 tests/local-atomics-14.ispc Normal file
@@ -0,0 +1,20 @@

export uniform int width() { return programCount; }

uniform unsigned int64 s = 0xffffffffff000000;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    float b = 0;
    if (programIndex & 1)
        b = atomic_or_local(&s, (1 << programIndex));
    RET[programIndex] = (s>>20);
}

export void result(uniform float RET[]) {
    uniform int sum = 0;
    for (uniform int i = 0; i < programCount; ++i)
        if (i & 1)
            sum += (1 << i);
    RET[programIndex] = ((unsigned int64)(0xffffffffff000000 | sum)) >> 20;
}
15 tests/local-atomics-2.ispc Normal file
@@ -0,0 +1,15 @@

export uniform int width() { return programCount; }

uniform int64 s = 0;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    float delta = 1;
    float b = atomic_add_local(&s, delta);
    RET[programIndex] = reduce_add(b);
}

export void result(uniform float RET[]) {
    RET[programIndex] = reduce_add(programIndex);
}

15 tests/local-atomics-3.ispc Normal file
@@ -0,0 +1,15 @@

export uniform int width() { return programCount; }

uniform int32 s = 0xff;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    int32 bits = 0xfff0;
    float b = atomic_xor_local(&s, bits);
    RET[programIndex] = s;
}

export void result(uniform float RET[]) {
    RET[programIndex] = (programCount & 1) ? 0xff0f : 0xff;
}

14 tests/local-atomics-4.ispc Normal file
@@ -0,0 +1,14 @@

export uniform int width() { return programCount; }

uniform int32 s = 0;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    float b = atomic_or_local(&s, (1<<programIndex));
    RET[programIndex] = s;
}

export void result(uniform float RET[]) {
    RET[programIndex] = (1<<programCount)-1;
}

14 tests/local-atomics-5.ispc Normal file
@@ -0,0 +1,14 @@

export uniform int width() { return programCount; }

uniform int32 s = 0xbeef;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    float b = atomic_swap_local(&s, programIndex);
    RET[programIndex] = reduce_max(b);
}

export void result(uniform float RET[]) {
    RET[programIndex] = 0xbeef;
}

14 tests/local-atomics-6.ispc Normal file
@@ -0,0 +1,14 @@

export uniform int width() { return programCount; }

uniform int32 s = 2;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    float b = atomic_compare_exchange_local(&s, programIndex, a*1000);
    RET[programIndex] = s;
}

export void result(uniform float RET[]) {
    RET[programIndex] = (programCount == 1) ? 2 : 3000;
}
14 tests/local-atomics-7.ispc Normal file
@@ -0,0 +1,14 @@

export uniform int width() { return programCount; }

uniform int32 s = 0;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    int32 a = aFOO[programIndex];
    float b = atomic_min_local(&s, a);
    RET[programIndex] = reduce_min(b);
}

export void result(uniform float RET[]) {
    RET[programIndex] = reduce_min(programIndex);
}

16 tests/local-atomics-8.ispc Normal file
@@ -0,0 +1,16 @@

export uniform int width() { return programCount; }

uniform int32 s = 0;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    int32 a = aFOO[programIndex];
    int32 b = 0;
    if (programIndex & 1)
        b = atomic_max_local(&s, a);
    RET[programIndex] = s;
}

export void result(uniform float RET[]) {
    RET[programIndex] = (programCount == 1) ? 0 : programCount;
}

17 tests/local-atomics-9.ispc Normal file
@@ -0,0 +1,17 @@

export uniform int width() { return programCount; }

uniform unsigned int32 s = 0;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    float b = 0;
    int32 delta = 1;
    if (programIndex < 2)
        b = atomic_add_local(&s, delta);
    RET[programIndex] = reduce_add(b);
}

export void result(uniform float RET[]) {
    RET[programIndex] = (programCount == 1) ? 0 : 1;
}

17 tests/local-atomics-swap.ispc Normal file
@@ -0,0 +1,17 @@

export uniform int width() { return programCount; }

uniform int32 s = 1234;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    float b = 0;
    if (programIndex & 1) {
        b = atomic_swap_local(&s, programIndex);
    }
    RET[programIndex] = reduce_add(b) + s;
}

export void result(uniform float RET[]) {
    RET[programIndex] = 1234 + reduce_add(programIndex & 1 ? programIndex : 0);
}
14 tests/local-atomics-uniform-1.ispc Normal file
@@ -0,0 +1,14 @@

export uniform int width() { return programCount; }

uniform unsigned int32 s = 10;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    uniform unsigned int32 b = atomic_add_local(&s, 1);
    RET[programIndex] = s;
}

export void result(uniform float RET[]) {
    RET[programIndex] = 11;
}

14 tests/local-atomics-uniform-2.ispc Normal file
@@ -0,0 +1,14 @@

export uniform int width() { return programCount; }

uniform unsigned int32 s = 0b1010;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    uniform unsigned int32 b = atomic_or_local(&s, 1);
    RET[programIndex] = s;
}

export void result(uniform float RET[]) {
    RET[programIndex] = 0b1011;
}

14 tests/local-atomics-uniform-3.ispc Normal file
@@ -0,0 +1,14 @@

export uniform int width() { return programCount; }

uniform unsigned int32 s = 0b1010;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    uniform unsigned int32 b = atomic_or_local(&s, 1);
    RET[programIndex] = b;
}

export void result(uniform float RET[]) {
    RET[programIndex] = 0b1010;
}

14 tests/local-atomics-uniform-4.ispc Normal file
@@ -0,0 +1,14 @@

export uniform int width() { return programCount; }

uniform unsigned int32 s = 0xffff;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    uniform unsigned int32 b = atomic_min_local(&s, 1);
    RET[programIndex] = b;
}

export void result(uniform float RET[]) {
    RET[programIndex] = 0xffff;
}

14 tests/local-atomics-uniform-5.ispc Normal file
@@ -0,0 +1,14 @@

export uniform int width() { return programCount; }

uniform unsigned int32 s = 0xffff;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    uniform unsigned int32 b = atomic_min_local(&s, 1);
    RET[programIndex] = s;
}

export void result(uniform float RET[]) {
    RET[programIndex] = 1;
}

14 tests/local-atomics-uniform-6.ispc Normal file
@@ -0,0 +1,14 @@

export uniform int width() { return programCount; }

uniform float s = 100.;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    uniform float b = atomic_swap_local(&s, 1.);
    RET[programIndex] = s;
}

export void result(uniform float RET[]) {
    RET[programIndex] = 1.;
}

14 tests/local-atomics-uniform-7.ispc Normal file
@@ -0,0 +1,14 @@

export uniform int width() { return programCount; }

uniform float s = 100.;

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    uniform float b = atomic_swap_local(&s, 1.);
    RET[programIndex] = b;
}

export void result(uniform float RET[]) {
    RET[programIndex] = 100.;
}
14
tests/local-atomics-uniform-8.ispc
Normal file
14
tests/local-atomics-uniform-8.ispc
Normal file
@@ -0,0 +1,14 @@
|
|||||||
|
|
||||||
|
export uniform int width() { return programCount; }
|
||||||
|
|
||||||
|
uniform float s = 100.;
|
||||||
|
|
||||||
|
export void f_f(uniform float RET[], uniform float aFOO[]) {
|
||||||
|
float a = aFOO[programIndex];
|
||||||
|
uniform float b = atomic_compare_exchange_local(&s, 1., -100.);
|
||||||
|
RET[programIndex] = b;
|
||||||
|
}
|
||||||
|
|
||||||
|
export void result(uniform float RET[]) {
|
||||||
|
RET[programIndex] = 100.;
|
||||||
|
}
|
||||||
14
tests/local-atomics-uniform-9.ispc
Normal file
14
tests/local-atomics-uniform-9.ispc
Normal file
@@ -0,0 +1,14 @@
|
|||||||
|
|
||||||
|
export uniform int width() { return programCount; }
|
||||||
|
|
||||||
|
uniform int64 s = 100.;
|
||||||
|
|
||||||
|
export void f_f(uniform float RET[], uniform float aFOO[]) {
|
||||||
|
float a = aFOO[programIndex];
|
||||||
|
uniform int64 b = atomic_compare_exchange_local(&s, 100, -100);
|
||||||
|
RET[programIndex] = s;
|
||||||
|
}
|
||||||
|
|
||||||
|
export void result(uniform float RET[]) {
|
||||||
|
RET[programIndex] = -100.;
|
||||||
|
}
|
||||||
18 tests/local-atomics-varyingptr-1.ispc Normal file
@@ -0,0 +1,18 @@
export uniform int width() { return programCount; }

uniform unsigned int32 s[programCount];

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    float b = 0;
    float delta = 1;
    if (programIndex < 2)
        atomic_add_local(&s[programIndex], delta);
    RET[programIndex] = s[programIndex];
}

export void result(uniform float RET[]) {
    RET[programIndex] = 0;
    RET[0] = RET[1] = 1;
}
16 tests/local-atomics-varyingptr-2.ispc Normal file
@@ -0,0 +1,16 @@
export uniform int width() { return programCount; }

uniform unsigned int32 s[programCount];

export void f_f(uniform float RET[], uniform float aFOO[]) {
    float a = aFOO[programIndex];
    float b = 0;
    float delta = 1;
    atomic_add_local(&s[programCount-1-programIndex], programIndex);
    RET[programIndex] = s[programIndex];
}

export void result(uniform float RET[]) {
    RET[programIndex] = programCount-1-programIndex;
}
18 tests/local-atomics-varyingptr-3.ispc Normal file
@@ -0,0 +1,18 @@
export uniform int width() { return programCount; }

uniform unsigned int32 s[programCount];

export void f_f(uniform float RET[], uniform float aFOO[]) {
    for (uniform int i = 0; i < programCount; ++i)
        s[i] = 1234;
    float a = aFOO[programIndex];
    float b = 0;
    float delta = 1;
    a = atomic_max_local(&s[programIndex], programIndex);
    RET[programIndex] = a;
}

export void result(uniform float RET[]) {
    RET[programIndex] = 1234;
}
15 tests/local-atomics-varyingptr-4.ispc Normal file
@@ -0,0 +1,15 @@
export uniform int width() { return programCount; }

uniform int32 s[programCount];

export void f_f(uniform float RET[], uniform float aFOO[]) {
    for (uniform int i = 0; i < programCount; ++i)
        s[i] = -1234;
    atomic_max_local(&s[programIndex], programIndex);
    RET[programIndex] = s[programIndex];
}

export void result(uniform float RET[]) {
    RET[programIndex] = programIndex;
}