Small documentation edits and updates.
@@ -973,6 +973,12 @@ which of them will write their value of ``value`` to ``array[index]``.
         array[index] = value;
     }
 
+While this rule that says that program instances can safely depend on
+side-effects from other program instances in their gang eliminates a
+class of synchronization requirements imposed by some other SPMD languages,
+it conversely means that it is possible to write ``ispc`` programs that
+compute different results when run with different gang sizes.
+
 
 Tasking Model
 -------------
@@ -1960,7 +1966,9 @@ value less than or equal to ``end``, specifying iteration over all integer
 values from ``start`` up to and including ``end-1``. An arbitrary number
 of iteration dimensions may be specified, with each one spanning a
 different range of values. Within the ``foreach`` loop, the given
-identifiers are available as ``const varying int32`` variables.
+identifiers are available as ``const varying int32`` variables. The
+execution mask starts out "all on" at the start of each ``foreach`` loop
+iteration, but may be changed by control flow constructs within the loop.
 
 It is illegal to have a ``break`` statement or a ``return`` statement
 within a ``foreach`` loop; a compile-time error will be issued in this
@@ -1994,28 +2002,42 @@ the gang size is 8:
         // perform computation on element i
     }
 
-The program counter will step through the statements of this loop just
-``16/8==2`` times; the first time through, the ``varying int32`` variable
-``i`` will have the values (0,1,2,3,4,5,6,7) over the program instances,
-and the second time through, ``i`` will have the values
-(8,9,10,11,12,13,14,15), thus mapping the available program instances to
-all of the data by the end of the loop's execution. The execution mask
-starts out "all on" at the start of each ``foreach`` loop iteration, but
-may be changed by control flow constructs within the loop.
+One possible valid execution path of this loop would be for the program
+counter to step through the statements of this loop just ``16/8==2``
+times; the first time through, with the ``varying int32`` variable ``i``
+having the values (0,1,2,3,4,5,6,7) over the program instances, and the
+second time through, having the values (8,9,10,11,12,13,14,15), thus
+mapping the available program instances to all of the data by the end of
+the loop's execution.
 
-The basic ``foreach`` statement subdivides the iteration domain by mapping
-a gang-size worth of values in the innermost dimension to the gang, only
-spanning a single value in each of the outer dimensions. This
-decomposition generally leads to coherent memory reads and writes, but may
-lead to worse control flow coherence than other decompositions.
+In general, however, you shouldn't make any assumptions about the order in
+which elements of the iteration domain will be processed by a ``foreach``
+loop. For example, the following code exhibits undefined behavior:
+
+::
+
+    uniform float a[10][100];
+    foreach (i = 0 ... 10, j = 0 ... 100) {
+        if (i == 0)
+            a[i][j] = j;
+        else
+            // Error: can't assume that a[i-1][j] has been set yet
+            a[i][j] = a[i-1][j];
+    }
+
+The ``foreach`` statement generally subdivides the iteration domain by
+selecting sets of contiguous elements in the inner-most dimension of the
+iteration domain. This decomposition approach generally leads to coherent
+memory reads and writes, but may lead to worse control flow coherence than
+other decompositions.
 
 Therefore, ``foreach_tiled`` decomposes the iteration domain in a way that
 tries to map locations in the domain to program instances in a way that is
 compact across all of the dimensions. For example, on a target with an
-8-wide gang size, the following ``foreach_tiled`` statement processes the
-iteration domain in chunks of 2 elements in ``j`` and 4 elements in ``i``
-each time. (The trade-offs between these two constructs are discussed in
-more detail in the `ispc Performance Guide`_.)
+8-wide gang size, the following ``foreach_tiled`` statement might process
+the iteration domain in chunks of 2 elements in ``j`` and 4 elements in
+``i`` each time. (The trade-offs between these two constructs are
+discussed in more detail in the `ispc Performance Guide`_.)
 
 .. _ispc Performance Guide: perf.html#improving-control-flow-coherence-with-foreach-tiled
 
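The ordering caveat added in the last hunk (the ``foreach`` example that reads ``a[i-1][j]``) can be modeled in plain C. This is only a sketch, not ispc: it does not reproduce how the compiler actually schedules ``foreach`` iterations. It simply runs the same loop body under two row orders to show that the result depends on the order chosen. The helper names (``body``, ``run``, ``element_after``, ``orders_agree``) are made up for illustration, and the array is zero-initialized here as an assumption so that the C model is deterministic; the ispc example leaves it uninitialized.

```c
#include <string.h>

#define ROWS 10
#define COLS 100

/* Loop body from the documentation example, applied to one (i, j) element. */
static void body(float a[ROWS][COLS], int i, int j) {
    if (i == 0)
        a[i][j] = (float)j;
    else
        a[i][j] = a[i - 1][j]; /* depends on row i-1 having been written */
}

/* Hypothetical helper: run the body over the whole iteration domain.
 * descending == 0 visits rows 0..9; descending != 0 visits rows 9..0. */
static void run(float a[ROWS][COLS], int descending) {
    for (int step = 0; step < ROWS; ++step) {
        int i = descending ? ROWS - 1 - step : step;
        for (int j = 0; j < COLS; ++j)
            body(a, i, j);
    }
}

/* Returns a[i][j] after running the loop in the chosen row order.
 * Assumption: the array starts zeroed (the ispc example does not do this). */
float element_after(int descending, int i, int j) {
    float a[ROWS][COLS];
    memset(a, 0, sizeof a);
    run(a, descending);
    return a[i][j];
}

/* Returns 1 if both row orders produce identical arrays, 0 otherwise. */
int orders_agree(void) {
    float x[ROWS][COLS], y[ROWS][COLS];
    memset(x, 0, sizeof x);
    memset(y, 0, sizeof y);
    run(x, 0);
    run(y, 1);
    return memcmp(x, y, sizeof x) == 0;
}
```

With ascending rows, every row ends up as a copy of row 0, so ``element_after(0, 9, 5)`` is ``5.0f``. With descending rows, rows 1 through 9 copy from a row that has not been written yet, so ``element_after(1, 9, 5)`` is ``0.0f`` and ``orders_agree()`` returns 0. Because ``foreach`` makes no promise about which order is used, code whose result changes between such orders is not portable.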