Implement global atomics and a memory barrier in the standard library.

This checkin provides the standard set of atomic operations and a memory barrier in the ispc standard library.  Both signed and unsigned 32- and 64-bit integer types are supported.
This commit is contained in:
Matt Pharr
2011-07-04 17:20:42 +01:00
parent 24f47b300d
commit 5bcc611409
13 changed files with 364 additions and 9 deletions

View File

@@ -1,9 +1,9 @@
=== v1.0.3 === (not yet released)
=== v1.0.3 === (4 July 2011)
ispc now has a bulit-in pre-processor (from LLVM's clang compiler).
(Thanks to Pete Couperus!) It is therefore no longer necessary to use
cl.exe for preprocessing before on Windows; the MSVC proejct files for the
examples have been updated accordingly.
(Thanks to Pete Couperus for this patch!) It is therefore no longer
necessary to use cl.exe for preprocessing on Windows; the MSVC proejct
files for the examples have been updated accordingly.
There is another variant of the shuffle() function int the standard
library: "<type> shuffle(<type> v0, <type> v1, int permute)", where the
@@ -11,8 +11,15 @@ permutation vector indexes over the concatenation of the two vectors
(e.g. the value 0 corresponds to the first element of v0, the value
2*programCount-1 corresponds to the last element of v1, etc.)
ispc now supports the usual range of atomic operations (add, subtract, min,
max, and, or, and xor) as well as atomic swap and atomic compare and
exchange. There is also a facility for inserting memory fences. See the
"Atomic Operations and Memory Fences" section of the user's guide
(http://ispc.github.com/ispc.html#atomic-operations-and-memory-fences) for
more information.
There are now both 'signed' and 'unsigned' variants of the standard library
functions like packed_load_active() that that references to arrays of
functions like packed_load_active() that take references to arrays of
signed int32s and unsigned int32s respectively. (The
{load_from,store_to}_{int8,int16}() functions have similarly been augmented
to have both 'signed' and 'unsigned' variants.)

View File

@@ -1,6 +1,6 @@
#!/bin/bash
rst2html ispc.txt > ispc.html
rst2html.py ispc.txt > ispc.html
#rst2latex --section-numbering --documentclass=article --documentoptions=DIV=9,10pt,letterpaper ispc.txt > ispc.tex
#pdflatex ispc.tex

View File

@@ -76,6 +76,7 @@ Contents:
+ `Output Functions`_
+ `Cross-Program Instance Operations`_
+ `Packed Load and Store Operations`_
+ `Atomic Operations and Memory Fences`_
+ `Low-Level Bits`_
* `Interoperability with the Application`_
@@ -1811,6 +1812,69 @@ where the ``i`` th element of ``x`` has been replaced with the value ``v``
int insert(int x, uniform int i, uniform int v)
Atomic Operations and Memory Fences
-----------------------------------
The usual range of atomic memory operations are provided in ``ispc``. As an
example, consider the 32-bit integer atomic add routine:
::
int32 atomic_add_global(reference uniform int32 val, int32 delta)
The semantics are the expected ones for an atomic add function: the value
"val" has the value "delta" added to it atomically, and the old value of
"val" is returned from the function. (Thus, if multiple processors
simultaneously issue atomic adds to the same memory location, the adds will
be serialized by the hardware so that the correct result is computed in the
end.)
One thing to note is that that the value being added to here is a
``uniform`` integer, while the increment amount and the return value are
``varying``. In other words, the semantics are that each running program
instance individually issues the atomic operation with its own ``delta``
value and gets the previous value of ``val`` back in return.
Here are the declarations of the ``int32`` variants of these functions.
There are also ``int64`` equivalents as well as variants that take
``unsigned`` ``int32`` and ``int64`` values.
::
int32 atomic_add_global(reference uniform int32 val, int32 value)
int32 atomic_subtract_global(reference uniform int32 val, int32 value)
int32 atomic_min_global(reference uniform int32 val, int32 value)
int32 atomic_max_global(reference uniform int32 val, int32 value)
int32 atomic_and_global(reference uniform int32 val, int32 value)
int32 atomic_or_global(reference uniform int32 val, int32 value)
int32 atomic_xor_global(reference uniform int32 val, int32 value)
int32 atomic_swap_global(reference uniform int32 val, int32 newval)
There is also an atomic "compare and exchange" function; it atomically
compares the value in "val" to "compare"--if they match, it assigns
"newval" to "val". In either case, the old value of "val" is returned.
(As with the other atomic operations, there are also ``unsigned`` and
64-bit variants of this function.)
::
int32 atomic_compare_exchange_global(reference uniform int32 val,
int32 compare, int32 newval)
``ispc`` also has a standard library routine that inserts a memory barrier
into the code; it ensures that all memory reads and writes prior to be
barrier complete before any reads or writes after the barrier are issued.
See the `Linux kernel documentation on memory barriers`_ for an excellent
writeup on the need for that the use of memory barriers in multi-threaded
code.
.. _Linux kernel documentation on memory barriers: http://www.kernel.org/doc/Documentation/memory-barriers.txt
::
void memory_barrier();
Low-Level Bits
--------------