Small documentation edits and updates.

Matt Pharr
2011-12-03 15:20:04 -08:00
parent d492ba08e6
commit 0fd7811344


@@ -973,6 +973,12 @@ which of them will write their value of ``value`` to ``array[index]``.
array[index] = value;
}

While the rule that program instances can safely depend on side effects
from other program instances in their gang eliminates a class of
synchronization requirements imposed by some other SPMD languages, it also
means that it is possible to write ``ispc`` programs that compute
different results when run with different gang sizes.

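The gang-size dependence can be seen in a small scalar C model. This is a sketch, not ``ispc`` code or the code it generates. It assumes a ``foreach``-style loop whose gangs of ``gang`` program instances process consecutive elements in order. Each gang runs ``a[i] = i;`` and then ``b[i] = a[i+1];`` in lockstep. The first statement completes for every instance in the gang before the second begins, so instance ``i`` sees the write by instance ``i+1`` only when both are in the same gang. The function ``gang_neighbor_reads``, the ``MAX_N`` bound, and the -1 "not yet written" marker are hypothetical.

```c
#include <assert.h>

#define MAX_N 64  /* hypothetical capacity for this model */

/* Scalar C model (not ispc) of gangs of `gang` program instances running
   two statements in lockstep over n elements, processed gang by gang in
   order:
       a[i] = i;       statement 1
       b[i] = a[i+1];  statement 2
   Statement 1 finishes for every instance in the gang before statement 2
   starts, so a write by instance i+1 is visible to instance i only when
   both are in the same gang. Returns how many b[i] read a stale -1. */
int gang_neighbor_reads(int n, int gang, int b[]) {
    assert(n > 0 && n <= MAX_N && gang > 0);
    int a[MAX_N + 1];
    for (int k = 0; k <= n; ++k)
        a[k] = -1;                           /* -1 means "not yet written" */

    int stale = 0;
    for (int base = 0; base < n; base += gang) {
        int end = base + gang < n ? base + gang : n;
        for (int i = base; i < end; ++i)     /* statement 1, whole gang */
            a[i] = i;
        for (int i = base; i < end; ++i) {   /* statement 2, whole gang */
            b[i] = a[i + 1];
            if (b[i] == -1)
                ++stale;
        }
    }
    return stale;
}
```

For 16 elements, the model reports four stale reads with a gang size of 4, two with a gang size of 8, and one with a gang size of 16. The contents of ``b`` therefore differ with the gang size.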
Tasking Model
-------------
@@ -1960,7 +1966,9 @@ value less than or equal to ``end``, specifying iteration over all integer
values from ``start`` up to and including ``end-1``. An arbitrary number
of iteration dimensions may be specified, with each one spanning a
different range of values. Within the ``foreach`` loop, the given
identifiers are available as ``const varying int32`` variables. The
execution mask is "all on" at the start of each ``foreach`` loop
iteration, but may be changed by control flow constructs within the loop.

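The following is a minimal C model of the execution mask. It assumes a gang size of 8 and an iteration count that is a multiple of it; remainder handling is not modeled. Bit ``k`` of each mask stands for program instance ``k``. The mask is reset to all on for every iteration and narrowed only while inside ``if (i < limit)``. ``GANG``, ``foreach_masks``, and ``limit`` are hypothetical names.

```c
#include <assert.h>

#define GANG 8   /* assumed gang size for this model */

/* Scalar C model of the execution mask in a foreach loop over 0 ... n,
   with n assumed to be a multiple of GANG. Bit k set means "program
   instance k is active". For each iteration, records the mask at the top
   of the loop body and the mask inside `if (i < limit)`. */
void foreach_masks(int n, int limit,
                   unsigned start_mask[], unsigned branch_mask[]) {
    assert(n > 0 && n % GANG == 0);
    for (int base = 0, iter = 0; base < n; base += GANG, ++iter) {
        unsigned mask = (1u << GANG) - 1u;   /* "all on" at iteration start */
        start_mask[iter] = mask;

        unsigned inside_if = 0;
        for (int lane = 0; lane < GANG; ++lane) {
            int i = base + lane;             /* this instance's value of i */
            if (((mask >> lane) & 1u) && i < limit)
                inside_if |= 1u << lane;     /* instance takes the branch */
        }
        branch_mask[iter] = inside_if;
    }
}
```

With ``n = 16`` and ``limit = 5``, both iterations start with mask ``0xFF``. The branch mask is ``0x1F`` in the first iteration and ``0x00`` in the second, so narrowing the mask in one iteration does not carry over to the next.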
It is illegal to have a ``break`` statement or a ``return`` statement
within a ``foreach`` loop; a compile-time error will be issued in this
@@ -1994,28 +2002,41 @@ the gang size is 8:
// perform computation on element i
}

One possible valid execution path of this loop would be for the program
counter to step through the statements of this loop just ``16/8==2``
times: the first time through, the ``varying int32`` variable ``i`` would
have the values (0,1,2,3,4,5,6,7) over the program instances, and the
second time through, it would have the values (8,9,10,11,12,13,14,15),
thus mapping the available program instances to all of the data by the
end of the loop's execution.

In general, however, you shouldn't make any assumptions about the order in
which elements of the iteration domain will be processed by a ``foreach``
loop. For example, the following code exhibits undefined behavior:

::

    uniform float a[10][100];
    foreach (i = 0 ... 10, j = 0 ... 100) {
        if (i == 0)
            a[i][j] = j;
        else
            // Error: can't assume that a[i-1][j] has been set yet
            a[i][j] = a[i-1][j];
    }

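The loop above is undefined because its result depends on processing order. A scalar C model can show this by running the same loop body under two row orders that an implementation would be free to choose: ascending and descending. Real ``foreach`` decompositions are not limited to these two orders; the model only shows that the results differ. ``run_dependent_loop``, ``ROWS``, ``COLS``, and the use of 0 as the "not yet set" value are assumptions for illustration.

```c
#include <assert.h>

#define ROWS 10   /* matches uniform float a[10][100] in the loop above */
#define COLS 100

/* Hypothetical scalar model of the loop above. `reverse` selects one of
   two processing orders: rows 0..ROWS-1 ascending (reverse == 0) or the
   same rows descending (reverse == 1). */
void run_dependent_loop(float a[ROWS][COLS], int reverse) {
    assert(reverse == 0 || reverse == 1);
    for (int i = 0; i < ROWS; ++i)
        for (int j = 0; j < COLS; ++j)
            a[i][j] = 0.0f;                  /* stand-in for "not yet set" */

    for (int step = 0; step < ROWS; ++step) {
        int i = reverse ? ROWS - 1 - step : step;
        for (int j = 0; j < COLS; ++j) {
            if (i == 0)
                a[i][j] = (float)j;
            else
                a[i][j] = a[i - 1][j];       /* row i-1 may not be set yet */
        }
    }
}
```

In ascending order every row ends up equal to ``j``. In descending order only row 0 does, and rows 1 through 9 copy the placeholder 0.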
The ``foreach`` statement generally subdivides the iteration domain by
selecting sets of contiguous elements in the innermost dimension of the
iteration domain. This decomposition usually leads to coherent memory
reads and writes, but may lead to worse control flow coherence than
other decompositions.

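The contiguous decomposition can be modeled in C as follows. This is a simplified sketch that assumes the inner extent ``cols`` is a multiple of the gang size and that chunks are numbered row by row. ``foreach_chunk_coords`` and its parameters are hypothetical, and the actual ``ispc`` decomposition may differ.

```c
#include <assert.h>

/* Simplified model of splitting a 2D foreach (i = 0 ... rows,
   j = 0 ... cols) into gang-sized chunks: each chunk takes `gang`
   contiguous values of the innermost index j within a single row i.
   Assumes cols is a multiple of gang; remainder handling is not modeled.
   Fills lane_i/lane_j with each program instance's coordinates in
   chunk k. */
void foreach_chunk_coords(int k, int cols, int gang,
                          int lane_i[], int lane_j[]) {
    assert(gang > 0 && cols % gang == 0);
    int chunks_per_row = cols / gang;
    int i = k / chunks_per_row;              /* one outer index per chunk */
    int j0 = (k % chunks_per_row) * gang;    /* first inner index */
    for (int lane = 0; lane < gang; ++lane) {
        lane_i[lane] = i;
        lane_j[lane] = j0 + lane;            /* contiguous in j */
    }
}
```

For ``cols = 16`` and a gang size of 8, chunk 3 covers row 1 and columns 8 through 15. Consecutive program instances touch consecutive values of ``j``, so an access such as ``a[i][j]`` on a row-major array reads adjacent memory locations.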
Therefore, ``foreach_tiled`` decomposes the iteration domain in a way that
tries to map locations in the domain to program instances compactly across
all of the dimensions. For example, on a target with an 8-wide gang size,
the following ``foreach_tiled`` statement might process the iteration
domain in chunks of 2 elements in ``j`` and 4 elements in ``i`` each time.
(The trade-offs between these two constructs are discussed in more detail
in the `ispc Performance Guide`_.)

.. _ispc Performance Guide: perf.html#improving-control-flow-coherence-with-foreach-tiled