Implement global atomics and a memory barrier in the standard library.
This checkin provides the standard set of atomic operations and a memory barrier in the ispc standard library. Both signed and unsigned 32- and 64-bit integer types are supported.
@@ -1,9 +1,9 @@
-=== v1.0.3 === (not yet released)
+=== v1.0.3 === (4 July 2011)

 ispc now has a built-in pre-processor (from LLVM's clang compiler).
-(Thanks to Pete Couperus!) It is therefore no longer necessary to use
-cl.exe for preprocessing before on Windows; the MSVC proejct files for the
-examples have been updated accordingly.
+(Thanks to Pete Couperus for this patch!) It is therefore no longer
+necessary to use cl.exe for preprocessing on Windows; the MSVC project
+files for the examples have been updated accordingly.

 There is another variant of the shuffle() function in the standard
 library: "<type> shuffle(<type> v0, <type> v1, int permute)", where the
@@ -11,8 +11,15 @@ permutation vector indexes over the concatenation of the two vectors
 (e.g. the value 0 corresponds to the first element of v0, the value
 2*programCount-1 corresponds to the last element of v1, etc.)

+ispc now supports the usual range of atomic operations (add, subtract, min,
+max, and, or, and xor) as well as atomic swap and atomic compare and
+exchange. There is also a facility for inserting memory fences. See the
+"Atomic Operations and Memory Fences" section of the user's guide
+(http://ispc.github.com/ispc.html#atomic-operations-and-memory-fences) for
+more information.
+
 There are now both 'signed' and 'unsigned' variants of the standard library
-functions like packed_load_active() that that references to arrays of
+functions like packed_load_active() that take references to arrays of
 signed int32s and unsigned int32s respectively. (The
 {load_from,store_to}_{int8,int16}() functions have similarly been augmented
 to have both 'signed' and 'unsigned' variants.)
@@ -1,6 +1,6 @@
 #!/bin/bash

-rst2html ispc.txt > ispc.html
+rst2html.py ispc.txt > ispc.html

 #rst2latex --section-numbering --documentclass=article --documentoptions=DIV=9,10pt,letterpaper ispc.txt > ispc.tex
 #pdflatex ispc.tex
@@ -76,6 +76,7 @@ Contents:
 + `Output Functions`_
 + `Cross-Program Instance Operations`_
 + `Packed Load and Store Operations`_
++ `Atomic Operations and Memory Fences`_
 + `Low-Level Bits`_

 * `Interoperability with the Application`_
@@ -1811,6 +1812,69 @@ where the ``i`` th element of ``x`` has been replaced with the value ``v``
     int insert(int x, uniform int i, uniform int v)


+Atomic Operations and Memory Fences
+-----------------------------------
+
+The usual range of atomic memory operations is provided in ``ispc``. As an
+example, consider the 32-bit integer atomic add routine:
+
+::
+
+    int32 atomic_add_global(reference uniform int32 val, int32 delta)
+
+The semantics are the expected ones for an atomic add function: the value
+"val" has the value "delta" added to it atomically, and the old value of
+"val" is returned from the function. (Thus, if multiple processors
+simultaneously issue atomic adds to the same memory location, the adds will
+be serialized by the hardware so that the correct result is computed in the
+end.)
+
+One thing to note is that the value being added to here is a
+``uniform`` integer, while the increment amount and the return value are
+``varying``. In other words, the semantics are that each running program
+instance individually issues the atomic operation with its own ``delta``
+value and gets the previous value of ``val`` back in return.
+
+Here are the declarations of the ``int32`` variants of these functions.
+There are also ``int64`` equivalents as well as variants that take
+``unsigned`` ``int32`` and ``int64`` values.
+
+::
+
+    int32 atomic_add_global(reference uniform int32 val, int32 value)
+    int32 atomic_subtract_global(reference uniform int32 val, int32 value)
+    int32 atomic_min_global(reference uniform int32 val, int32 value)
+    int32 atomic_max_global(reference uniform int32 val, int32 value)
+    int32 atomic_and_global(reference uniform int32 val, int32 value)
+    int32 atomic_or_global(reference uniform int32 val, int32 value)
+    int32 atomic_xor_global(reference uniform int32 val, int32 value)
+    int32 atomic_swap_global(reference uniform int32 val, int32 newval)
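Reviewer's note: a rough C++ analogy for two of these operations follows, assuming they return the previous value in the same way the atomic add is documented to. ``fetch_or`` and ``exchange`` stand in for the "or" and "swap" routines. The helper names ``claim_bit`` and ``take_and_reset`` are hypothetical and exist only for illustration.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical example: set a flag bit in a shared word. Returns true only
// for the caller that actually flipped the bit from clear to set.
bool claim_bit(std::atomic<uint32_t> &flags, uint32_t bit) {
    uint32_t old = flags.fetch_or(bit);  // returns the value before the OR
    return (old & bit) == 0;
}

// Hypothetical example: replace the stored value with zero and hand the
// previous contents back to the caller in one atomic step.
int32_t take_and_reset(std::atomic<int32_t> &val) {
    return val.exchange(0);
}
```

Returning the old value is what makes these operations usable for coordination: the caller learns what state it replaced without a separate, racy read.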
+
+There is also an atomic "compare and exchange" function; it atomically
+compares the value in "val" to "compare"--if they match, it assigns
+"newval" to "val". In either case, the old value of "val" is returned.
+(As with the other atomic operations, there are also ``unsigned`` and
+64-bit variants of this function.)
+
+::
+
+    int32 atomic_compare_exchange_global(reference uniform int32 val,
+                                         int32 compare, int32 newval)
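Reviewer's note: the "return the old value in either case" contract can be sketched in C++, again as an analogy and not as ispc code. ``std::atomic::compare_exchange_strong`` reports success as a ``bool`` and writes the observed value back into its first argument, so a thin wrapper is needed to get the shape described in this hunk. The names ``compare_exchange_old`` and ``atomic_multiply`` are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical wrapper: compare-and-exchange that always returns the value
// held before the call, whether or not the exchange took place.
int32_t compare_exchange_old(std::atomic<int32_t> &val,
                             int32_t compare, int32_t newval) {
    int32_t expected = compare;
    // On success, expected is untouched and equals the old value.
    // On failure, expected is overwritten with the current value.
    val.compare_exchange_strong(expected, newval);
    return expected;
}

// Hypothetical use: build an operation that has no dedicated atomic (here,
// multiply) out of a retry loop around compare-and-exchange.
int32_t atomic_multiply(std::atomic<int32_t> &val, int32_t factor) {
    int32_t seen = val.load();
    for (;;) {
        int32_t old = compare_exchange_old(val, seen, seen * factor);
        if (old == seen) return old;  // exchange succeeded
        seen = old;                   // another writer got in first; retry
    }
}
```

The retry loop is the usual reason to want the old value back: a mismatch tells the caller exactly what to retry with.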
+
+``ispc`` also has a standard library routine that inserts a memory barrier
+into the code; it ensures that all memory reads and writes prior to the
+barrier complete before any reads or writes after the barrier are issued.
+See the `Linux kernel documentation on memory barriers`_ for an excellent
+writeup on the need for and the use of memory barriers in multi-threaded
+code.
+
+.. _Linux kernel documentation on memory barriers: http://www.kernel.org/doc/Documentation/memory-barriers.txt
+
+::
+
+    void memory_barrier();
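Reviewer's note: a minimal C++ sketch of why such a barrier matters, assuming ``memory_barrier()`` behaves like a full fence comparable to ``std::atomic_thread_fence(std::memory_order_seq_cst)``. That equivalence is an assumption, not something this commit states. The names ``payload``, ``ready``, ``producer``, ``consumer`` and ``run_handoff`` are hypothetical.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // plain, non-atomic data being handed off
std::atomic<bool> ready{false};  // flag announcing that payload is valid

void producer() {
    payload = 42;
    // Full fence: the payload write is ordered before the flag write.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Full fence: the flag read is ordered before the payload read.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return payload;
}

// Runs one hand-off and returns what the consumer observed.
int run_handoff() {
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Without the two fences, the relaxed flag accesses would give no guarantee that the consumer sees the payload write, even after it sees the flag set.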
+
+
 Low-Level Bits
 --------------