@@ -21,6 +21,7 @@ the most out of ``ispc`` in practice.
+ `Avoid 64-bit Addressing Calculations When Possible`_
+ `Avoid Computation With 8 and 16-bit Integer Types`_
+ `Implementing Reductions Efficiently`_
+ `Using "foreach_active" Effectively`_
+ `Using Low-level Vector Tricks`_
+ `The "Fast math" Option`_
+ `"inline" Aggressively`_
@@ -510,6 +511,43 @@ values--very efficient code in the end.
        return reduce_add(sum);
    }

Using "foreach_active" Effectively
----------------------------------

For high-performance code,

For example, consider this segment of code, from the introduction of
``foreach_active`` in the ispc User's Guide:

::

    uniform float array[...] = { ... };
    int index = ...;
    foreach_active (i) {
        ++array[index];
    }

Here, ``index`` may have the same value in multiple program instances, so
the ``foreach_active`` statement serializes the updates to
``array[index]``. This avoids undefined results when ``index`` values do
collide.

The compiler can generate better code in this case if it is made clear
that ``array[index]`` accesses only a single element of the array, so that
a general gather or scatter isn't required.  Specifically, use the
``extract()`` function from the standard library to extract the current
program instance's value of ``index`` into a ``uniform`` variable, and then
use that variable to index into ``array``, as below.

::

    foreach_active (instanceNum) {
        uniform int unifIndex = extract(index, instanceNum);
        ++array[unifIndex];
    }
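
As a rough sketch of how this pattern might sit inside a complete function,
the fragment below counts how often each bin index occurs.  The function
name ``count_bins`` and its parameters are hypothetical, chosen only for
illustration, and the code assumes that every value in ``bins`` is a valid
index into ``counts``.

::

    // Hypothetical example: counts[b] is incremented once for each
    // occurrence of b in bins[0..n-1].  Assumes that each bins[i] is a
    // valid index into counts[].
    export void count_bins(const uniform int bins[], uniform int n,
                           uniform int counts[]) {
        foreach (i = 0 ... n) {
            // Varying value: each program instance loads its own index,
            // and several instances may load the same one.
            int index = bins[i];

            // Run the body once per active program instance so that
            // colliding indices are updated one after another.
            foreach_active (instanceNum) {
                // Copy this instance's index into a uniform variable so
                // the update below touches a single array element rather
                // than going through a general gather/scatter.
                uniform int unifIndex = extract(index, instanceNum);
                ++counts[unifIndex];
            }
        }
    }

Whether the extra ``extract()`` call pays off in a given kernel depends on
the target and the surrounding code, so it is worth measuring both forms.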


Using Low-level Vector Tricks
-----------------------------
