Documentation updates for 1.3.0.

This commit is contained in:
Matt Pharr
2012-06-29 08:35:29 -07:00
parent 27d6c12972
commit b7bc76d3cc
2 changed files with 242 additions and 115 deletions


@@ -1,3 +1,63 @@
=== v1.3.0 === (29 June 2012)
This is a major new release of ispc, with support for more compilation
targets and a number of additions to the language. As usual, the quality
of generated code has also been improved in a number of cases and a number
of small bugs have been fixed.
New targets:
* This release provides "beta" support for compiling to Intel Xeon Phi (the
"Many Integrated Core" architecture). See
http://ispc.github.com/ispc.html#compiling-for-the-intel-xeon-phi-architecture
for more details on this support.
* This release also has an "avx1.1" target, which provides support for the
new instructions in the Intel Ivy Bridge microarchitecture.
New language features:
* The foreach_active statement allows iteration over the active program
instances in a gang. (See
http://ispc.github.com/ispc.html#iteration-over-active-program-instances-foreach-active)
* foreach_unique allows iterating over subsets of program instances in a
gang that share the same value of a variable. (See
http://ispc.github.com/ispc.html#iteration-over-unique-elements-foreach-unique)
* An "unmasked" function qualifier and statement in the language allow
re-activating execution of all program instances in a gang. (See
http://ispc.github.com/ispc.html#re-establishing-the-execution-mask)
Standard library updates:
* The seed_rng() function has been modified to take a "varying" seed value
when a varying RNGState is being initialized.
* An isnan() function has been added, to check for floating-point "not a
number" values.
* The float_to_srgb8() routine does high performance conversion of
floating-point color values to SRGB8 format.
Other changes:
* A number of bugfixes have been made for compiler crashes with malformed
programs.
* Floating-point comparisons are now "unordered", so that any comparison
where one of the operands is a "not a number" value returns false. (This
matches standard IEEE floating-point behavior.)
* The code generated for 'break' statements in "varying" loops has been
improved for some common cases.
* Compile time and compiler memory use have both been improved,
particularly for large input programs.
* A number of bugs have been fixed in the debugging information generated
by the compiler when the "-g" command-line flag is used.
=== v1.2.2 === (20 April 2012)
This release includes a number of small additions to functionality and a


@@ -46,6 +46,8 @@ Contents:
* `Recent Changes to ISPC`_
+ `Updating ISPC Programs For Changes In ISPC 1.1`_
+ `Updating ISPC Programs For Changes In ISPC 1.2`_
+ `Updating ISPC Programs For Changes In ISPC 1.3`_
* `Getting Started with ISPC`_
@@ -57,6 +59,7 @@ Contents:
+ `Basic Command-line Options`_
+ `Selecting The Compilation Target`_
+ `Generating Generic C++ Output`_
+ `Compiling For The Intel Xeon Phi Architecture`_
+ `Selecting 32 or 64 Bit Addressing`_
+ `The Preprocessor`_
+ `Debugging`_
@@ -225,6 +228,48 @@ These are the relevant changes to the language:
instances. See the Section `Parallel Iteration Statements: "foreach" and
"foreach_tiled"`_ for more information about these.
Updating ISPC Programs For Changes In ISPC 1.2
----------------------------------------------
The following changes were made to the language syntax and semantics for
the ``ispc`` 1.2 release:
* Syntax for the "launch" keyword has been cleaned up; it's now no longer
necessary to bracket the launched function call with angle brackets. (In
other words, now use ``launch foo();``, rather than ``launch < foo() >;``.)
* When using pointers, the pointed-to data type is now "uniform" by
default. Use the varying keyword to specify varying pointed-to types
when needed. (i.e. ``float *ptr`` is a varying pointer to uniform float
data, whereas previously it was a varying pointer to varying float
values.) Use ``varying float *`` to specify a varying pointer to varying
float data, and so forth.
* The details of "uniform" and "varying" and how they interact with struct
types have been cleaned up. Now, when a struct type is declared, if the
struct elements don't have explicit "uniform" or "varying" qualifiers,
they are said to have "unbound" variability. When a struct type is
instantiated, any unbound variability elements inherit the variability of
the parent struct type. See `Struct Types`_ for more details.
* ``ispc`` has a new language feature that makes it much easier to use the
efficient "(array of) structure of arrays" (AoSoA, or SoA) memory layout
of data. A new ``soa<n>`` qualifier can be applied to structure types to
specify an n-wide SoA version of the corresponding type. Array indexing
and pointer operations with arrays of SoA types automatically handle the
two-stage indexing calculation to access the data. See `Structure of
Array Types`_ for more details.
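As a rough illustration of the "two-stage indexing calculation" mentioned in the last item, the following C sketch spells out by hand what indexing into an array of 8-wide SoA structures involves. The ``PointSoA8`` type and the ``soa_get_x()`` and ``soa_set_x()`` helpers are hypothetical names used only for this example; they are not part of ``ispc``, which performs the equivalent calculation automatically for ``soa<n>`` types.

```c
#include <assert.h>
#include <stddef.h>

#define SOA_WIDTH 8

/* Hand-written C analogue of an 8-wide SoA version of a struct with
   x, y, and z float members: each member becomes an array of SOA_WIDTH
   values. (Hypothetical type, for illustration only.) */
struct PointSoA8 {
    float x[SOA_WIDTH];
    float y[SOA_WIDTH];
    float z[SOA_WIDTH];
};

/* Two-stage indexing: first select the SoA block that holds logical
   element i, then select the slot within that block. */
static float soa_get_x(const struct PointSoA8 *pts, size_t i) {
    assert(pts != NULL);
    size_t block = i / SOA_WIDTH;
    size_t slot = i % SOA_WIDTH;
    return pts[block].x[slot];
}

static void soa_set_x(struct PointSoA8 *pts, size_t i, float v) {
    assert(pts != NULL);
    pts[i / SOA_WIDTH].x[i % SOA_WIDTH] = v;
}
```

With this layout, the ``x`` values of eight consecutive logical elements are contiguous in memory, which is what makes the layout efficient for vector loads and stores.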
Updating ISPC Programs For Changes In ISPC 1.3
----------------------------------------------
This release adds a number of new iteration constructs, which in turn use
new reserved words: ``unmasked``, ``foreach_unique``, ``foreach_active``,
and ``in``. Any program that happens to have a variable or function with
one of these names must be modified to rename that symbol.
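To give a feel for what ``foreach_unique`` iterates over, here is a minimal C sketch that emulates it for a single gang. It is an emulation under stated assumptions, not how the compiler implements the statement: the gang size of 8, the bitmask representation of the active program instances, and the ``emulate_foreach_unique()`` name are all choices made for this example. In particular, the lowest-lane-first visiting order is an artifact of the sketch. ``foreach_active`` can be thought of in the same way, visiting each set bit of the mask on its own.

```c
#include <assert.h>

#define GANG_SIZE 8

/* Emulates the iteration pattern of foreach_unique for one gang.
   values[i] is the value held by program instance i; bit i of "active"
   is set if program instance i is currently executing.
   For each distinct value among the active instances, the value is
   stored in unique_vals[k] and the set of instances holding it in
   lane_masks[k]. Returns the number of distinct values found. */
static int emulate_foreach_unique(const int values[GANG_SIZE], unsigned active,
                                  int unique_vals[GANG_SIZE],
                                  unsigned lane_masks[GANG_SIZE]) {
    assert((active >> GANG_SIZE) == 0);
    unsigned remaining = active;
    int count = 0;
    while (remaining != 0) {
        /* Find the first instance that has not been handled yet. */
        int first = 0;
        while (((remaining >> first) & 1u) == 0)
            ++first;
        int v = values[first];

        /* Gather every remaining instance that shares this value. */
        unsigned same = 0;
        for (int lane = 0; lane < GANG_SIZE; ++lane)
            if (((remaining >> lane) & 1u) && values[lane] == v)
                same |= 1u << lane;

        /* The loop body would run once here, with "same" as its mask. */
        unique_vals[count] = v;
        lane_masks[count] = same;
        ++count;
        remaining &= ~same;
    }
    return count;
}
```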
Getting Started with ISPC
=========================
@@ -441,11 +486,12 @@ CPU.
ispc foo.ispc -o foo.obj --cpu=corei7-avx
Finally, ``--target`` selects between the SSE2, SSE4, and AVX instruction
sets. (As general context, SSE2 was first introduced in processors that
shipped in 2001, SSE4 was introduced in 2007, and processors with AVX
were introduced in 2010. Consult your CPU's manual for specifics on which
vector instruction set it supports.)
Finally, ``--target`` selects between the SSE2, SSE4, AVX, and AVX2
instruction sets. (As general context, SSE2 was first introduced in
processors that shipped in 2001, SSE4 was introduced in 2007, and
processors with AVX were introduced in 2010. AVX2 will be supported on
future CPUs based on Intel's "Haswell" architecture. Consult your CPU's
manual for specifics on which vector instruction set it supports.)
By default, the target instruction set is chosen based on the most capable
one supported by the system on which you're running ``ispc``. You can
@@ -513,6 +559,59 @@ C++ file; this can be used to easily include specific implementations of
the vector types and functions.
Compiling For The Intel Xeon Phi Architecture
---------------------------------------------
``ispc`` has beta-level support for compiling for the many-core Intel® Xeon
Phi architecture (formerly "Many Integrated Cores", or MIC). This support
is based on the "generic" C++ output, described in the previous section.
To compile for Xeon Phi, first generate intermediate C++ code:
::
ispc foo.ispc --emit-c++ --target=generic-16 -o foo.cpp \
--c++-include-file=knc.h
The ``ispc`` distribution now includes a header file,
``examples/intrinsics/knc.h``, which maps from the generic C++ output to
the corresponding intrinsic operations for Intel Xeon Phi. Thus, to
generate an object file, use the Intel C Compiler (``icc``) to compile the C++
code generated by ``ispc``, setting the ``#include`` search path so that it
can find the ``examples/intrinsics/knc.h`` header file in the ``ispc``
distribution.
With the current beta implementation, complex ``ispc`` programs are able to
run on Xeon Phi, though there are a number of known limitations:
* The ``examples/intrinsics/knc.h`` header file isn't complete yet; for
example, vector operations with ``int8`` and ``int16`` types aren't yet
implemented. Programs that operate on ``varying`` ``int32``, ``float``,
and ``double`` data-types (and ``uniform`` variables of any data type,
and arrays and structures of these types), should operate correctly.
* If you use the ``launch`` functionality to launch tasks across cores,
note that the pthreads task system implemented in
``examples/tasksys.cpp`` hasn't been tuned for Xeon Phi yet, and has
known issues with setting thread affinities optimally.
* The compiler currently emits unaligned memory accesses in many cases
where the memory address is actually aligned. This may unnecessarily
impact performance.
All of these issues are actively being addressed and will be
fixed in future releases.
If you do use the current version of ``ispc`` on Xeon Phi, please let us
know of any bugs or unexpected results. (Also, any interesting results!).
*Note that access to Xeon Phi and public discussion of Xeon Phi performance
is still governed by NDA*, so please send email to "matt dot pharr at intel
dot com" for any issues that shouldn't be filed in the `public ispc bug
tracker`_.
.. _public ispc bug tracker: https://github.com/ispc/ispc/issues
Selecting 32 or 64 Bit Addressing
---------------------------------
@@ -559,7 +658,7 @@ preprocessor runs:
- 1
- Major version of the ``ispc`` compiler/language
* - ISPC_MINOR_VERSION
- 1
- 3
- Minor version of the ``ispc`` compiler/language
* - PI
- 3.1415926535
@@ -568,17 +667,31 @@ preprocessor runs:
Debugging
---------
Support for debugging in ``ispc`` is in progress. On Linux\* and Mac
OS\*, the ``-g`` command-line flag can be supplied to the compiler,
which causes it to generate debugging symbols. Running ``ispc`` programs
in the debugger, setting breakpoints, printing out variables and the like
all generally works, though there is occasional unexpected behavior.
On Linux\* and Mac OS\*, the ``-g`` command-line flag can be supplied to
the compiler, which causes it to generate debugging symbols. Running
``ispc`` programs in the debugger, setting breakpoints, and printing out
variables all work just as they do when debugging C/C++ programs. Similarly, you can
directly step up and down the call stack between ``ispc`` code and C/C++
code.
Another option for debugging (the only current option on Windows\*) is to
use the ``print`` statement for ``printf()`` style debugging. (See `Output
Functions`_ for more information.) You can also use the ability to call
back to application code at particular points in the program, passing a set
of variable values to be logged or otherwise analyzed from there.
One limitation of the current debugging support is that the debugger
provides a window into an entire gang's worth of program instances, rather
than just a single program instance. (These concepts will be introduced
shortly, in `Basic Concepts: Program Instances and Gangs of Program
Instances`_.) Thus, when a ``varying`` variable is printed, the values for
each of the program instances are displayed. Along similar lines, the path
the debugger follows through program source code passes each statement that
any program instance wants to execute (see `Control Flow Within A Gang`_
for more details on control flow in ``ispc``.)
While debugging, a variable, ``__mask``, is available to provide the
current program execution mask at the current point in the program.
Another option for debugging (and the only current option on Windows\*) is
to use the ``print`` statement for ``printf()`` style debugging. (See
`Output Functions`_ for more information.) You can also use the ability to
call back to application code at particular points in the program, passing
a set of variable values to be logged or otherwise analyzed from there.
The ISPC Parallel Execution Model
@@ -643,7 +756,7 @@ current processor, leading to excellent utilization of hardware SIMD units
and high performance.
The number of program instances in a gang is relatively small; in practice,
it's no more than twice the native SIMD width of the hardware it is
it's no more than 2-4x the native SIMD width of the hardware it is
executing on. (Thus, four or eight program instances in a gang on a CPU
using the 4-wide SSE instruction set, and eight or sixteen on a CPU
using 8-wide AVX.)
@@ -671,19 +784,19 @@ program instances in the gang: some of the currently running program
instances want to execute the statements for the "true" case and some want
to execute the statements for the "false" case.
Complex control flow in ``ispc`` programs generally "just works", computing
the same results for each program instance in a gang as would have been
computed if the equivalent code ran serially in C to compute each program
instance's result individually. However, here we will more precisely
define the execution model for control flow in order to be able to
precisely define the language's behavior in specific situations.
Complex control flow in ``ispc`` programs generally works as expected,
computing the same results for each program instance in a gang as would
have been computed if the equivalent code ran serially in C to compute each
program instance's result individually. However, here we define the
execution model for control flow more precisely, so that the language's
behavior in specific situations is well specified.
We will specify the notion of a *program counter* and how it is updated to
step through the program, and an *execution mask* that indicates which
program instances want to execute the instruction at the current program
counter. The program counter is shared by all of the
program instances in the gang; it points to a single instruction to be
executed next. The execution mask is a per-program instance boolean value
executed next. The execution mask is a per-program-instance boolean value
that indicates whether or not side effects from the current instruction
should affect each program instance. Thus, for example, if a statement
were to be executed with an "all off" mask, there should be no observable
@@ -731,45 +844,22 @@ compiler output:
bool test = (x < y);
mask originalMask = get_current_mask();
set_mask(originalMask & test);
// true statements
if (any_mask_entries_are_enabled()) {
// true statements
}
set_mask(originalMask & ~test);
// false statements
if (any_mask_entries_are_enabled()) {
// false statements
}
set_mask(originalMask);
In other words, the program counter steps through the statements for both
the "true" case and the "false" case, with the execution mask set so that
no side-effects from the true statements affect the program instances that
want to run the false statements, and vice versa. the execution mask is
then restored to the value it had before the ``if`` statement.
However, the compiler is free to generate different code for an ``if``
test, such as:
::
float x = ..., y = ...;
bool test = (x < y);
mask originalMask = get_current_mask();
if (all_off(originalMask & test))
goto else_stmts;
set_mask(originalMask & test);
// true statements
else_stmts:
if (all_off(originalMask & ~test))
goto done;
set_mask(originalMask & ~test);
// false statements
done:
set_mask(originalMask);
Furthermore, the order in which the program counter steps through the
code for the "true" and "false" statements is undefined.
In most cases, there is no programmer-visible difference between these two
ways of compiling ``if``, though see the `Uniform Variables and Varying
Control Flow`_ section for a case where it causes undefined behavior in one
particular situation.
want to run the false statements, and vice versa. However, a block of
statements does not execute if the mask is "all off" upon entry to that
block. The execution mask is then restored to the value it had before the
``if`` statement.
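The masked execution of a varying ``if`` described above can be emulated in plain C. The sketch below is only a model of the rules just stated, for a hypothetical gang of four program instances and the statement ``if (x < y) r = 1; else r = 2;``. The bitmask representation, the ``IfTrace`` structure, and the ``emulate_varying_if()`` name are assumptions of this example rather than anything the compiler exposes.

```c
#include <assert.h>

#define GANG_SIZE 4

/* What the emulation observed while running the "if". */
struct IfTrace {
    int result[GANG_SIZE];  /* per-instance value of r (0 if never written) */
    int ran_true;           /* nonzero if the "true" block was entered */
    int ran_false;          /* nonzero if the "false" block was entered */
    unsigned mask_after;    /* execution mask after the "if" */
};

static struct IfTrace emulate_varying_if(const float x[GANG_SIZE],
                                         const float y[GANG_SIZE],
                                         unsigned original_mask) {
    assert((original_mask >> GANG_SIZE) == 0);
    struct IfTrace t = { {0, 0, 0, 0}, 0, 0, 0 };

    /* Evaluate the test for every program instance. */
    unsigned test = 0;
    for (int lane = 0; lane < GANG_SIZE; ++lane)
        if (x[lane] < y[lane])
            test |= 1u << lane;

    /* "True" statements: skipped entirely if the mask is all off. */
    unsigned mask = original_mask & test;
    if (mask != 0) {
        t.ran_true = 1;
        for (int lane = 0; lane < GANG_SIZE; ++lane)
            if ((mask >> lane) & 1u)
                t.result[lane] = 1;
    }

    /* "False" statements: likewise skipped if no instance wants them. */
    mask = original_mask & ~test;
    if (mask != 0) {
        t.ran_false = 1;
        for (int lane = 0; lane < GANG_SIZE; ++lane)
            if ((mask >> lane) & 1u)
                t.result[lane] = 2;
    }

    /* Restore the mask that was in effect before the "if". */
    mask = original_mask;
    t.mask_after = mask;
    return t;
}
```

Program instances that were already inactive on entry keep whatever value they had, since neither block's side effects apply to them.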
Control Flow Example: Loops
---------------------------
@@ -883,8 +973,8 @@ It is an error to try to assign a ``varying`` value to a ``uniform``
variable, though ``uniform`` values can be assigned to ``varying``
variables. Assignments to ``uniform`` variables are not affected by the
execution mask (there's no unambiguous way that they could be); rather,
they always apply if the program pointer executes a statement that is a
uniform assignment.
they always apply if the program counter passes through a statement
that is a ``uniform`` assignment.
Uniform Control Flow
@@ -956,11 +1046,10 @@ instances that are supposed to be executing the corresponding clause.
Under this model, we must define the effect of modifying ``uniform``
variables in the context of varying control flow.
In most cases, modifying ``uniform`` variables under varying control flow
leads to the ``uniform`` variable having an undefined value, except within
a block where the ``uniform`` value had a value assigned to it.
Consider the following example, which illustrates three cases.
In general, modifying ``uniform`` variables under varying control flow
leads to the ``uniform`` variable having a value that depends on whether
any of the program instances in the gang followed a particular execution
path. Consider the following example:
::
@@ -968,43 +1057,20 @@ Consider the following example, which illustrates three cases.
uniform int b = 0;
if (a == 0) {
++b;
// use b: undefined! May be 1 or 11.
// b is 1
}
else {
b = 10;
// can use b, has value 10
// b is 10
}
// b is undefined: may be 10 or 11
// whether b is 1 or 10 depends on whether any of the values
// of "a" in the executing gang were 0.
There are three principles of ``ispc``'s execution model that have been
previously introduced that together explain the results above. They are:
1. Modifications to ``uniform`` variables aren't affected by the
execution mask.
2. The "true" and "false" clauses of a varying ``if`` statement may be
executed in either order.
3. Varying ``if`` statements may in some cases execute the instructions
for one of their clauses with the execution mask "all off".
Thus, within the "true" clause, the value of ``b`` is undefined since the
"else" clause may or may not have executed before the clause for the true
case.
Within the "else" clause, the assignment ``b = 10`` applies, giving ``b`` a
well-defined value within the "else" clause and ``b`` can validly be used
in the remainder of the code in that block.
Finally, ``b`` is undefined after the end of the "else" clause, since it is
possible (but not necessarily the case) that one the clauses may have
executed with an "all off" mask. Thus, even if ``a`` had a non-zero value
for all program instances in the gang, it's possible that the "true" clause
executed with an "all off" mask and ``b`` was modified there.
If it is important that code never be executed with an "all off" execution
mask, then the ``cif`` statement (documented in the `"Coherent" Control Flow
Statements: "cif" and Friends`_ section) can be used in place of a regular
``if``, as it guarantees this property.
Here, if any of the values of ``a`` across the gang was non-zero, then
``b`` will have a value of 10 after the ``if`` statement has executed.
However, if all of the values of ``a`` in the currently-executing program
instances at the start of the ``if`` statement had a value of zero, then
``b`` would have a value of 1.
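The outcome for ``b`` can be checked with a small C emulation of the example. This is a sketch under assumptions: a gang of four program instances, a bitmask for the active instances, the "true" clause visited before the "else" clause, and a hypothetical function name, ``emulate_uniform_update()``. It models two rules from the text: a clause is skipped when no active instance wants to run it, and an assignment to a ``uniform`` variable applies regardless of the mask once the clause is entered.

```c
#include <assert.h>

#define GANG_SIZE 4

/* Emulates:
       uniform int b = 0;
       if (a == 0) { ++b; } else { b = 10; }
   for one gang, and returns the final value of b. */
static int emulate_uniform_update(const int a[GANG_SIZE], unsigned active_mask) {
    assert((active_mask >> GANG_SIZE) == 0);
    int b = 0;  /* uniform: a single copy shared by the whole gang */

    /* Which program instances have a == 0? */
    unsigned test = 0;
    for (int lane = 0; lane < GANG_SIZE; ++lane)
        if (a[lane] == 0)
            test |= 1u << lane;

    /* The "true" clause runs if any active instance wants it. */
    if ((active_mask & test) != 0)
        ++b;

    /* The "else" clause runs if any active instance wants it. */
    if ((active_mask & ~test) != 0)
        b = 10;

    return b;
}
```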
Data Races Within a Gang
@@ -1191,6 +1257,10 @@ C++:
* Parallel ``foreach`` and ``foreach_tiled`` iteration constructs (see
`Parallel Iteration Statements: "foreach" and "foreach_tiled"`_)
* The ``foreach_active`` and ``foreach_unique`` iteration constructs, which
provide ways of iterating over subsets of the program instances in the
gang (see `Iteration over active program instances: "foreach_active"`_
and `Iteration over unique elements: "foreach_unique"`_)
* Language support for task parallelism (see `Task Parallel Execution`_)
* "Coherent" control flow statements that indicate that control flow is
expected to be coherent across the running program instances (see
@@ -1233,10 +1303,11 @@ The following reserved words from C89 are also reserved in ``ispc``:
``ispc`` additionally reserves the following words:
``bool``, ``export``, ``cdo``, ``cfor``, ``cif``, ``cwhile``, ``false``,
``foreach``, ``foreach_tiled``, ``inline``, ``int8``, ``int16``, ``int32``,
``int64``, ``launch``, ``print``, ``reference``, ``soa``, ``sync``,
``task``, ``true``, ``uniform``, and ``varying``.
``bool``, ``delete``, ``export``, ``cdo``, ``cfor``, ``cif``, ``cwhile``,
``false``, ``foreach``, ``foreach_active``, ``foreach_tiled``,
``foreach_unique``, ``in``, ``inline``, ``int8``, ``int16``, ``int32``,
``int64``, ``launch``, ``new``, ``print``, ``soa``, ``sync``, ``task``,
``true``, ``uniform``, and ``varying``.
Lexical Structure
@@ -1246,8 +1317,8 @@ Tokens in ``ispc`` are delimited by white-space and comments. The
white-space characters are the usual set of spaces, tabs, and carriage
returns/line feeds. Comments can be delineated with ``//``, which starts a
comment that continues to the end of the line, or the start of a comment
can be delineated with ``/*`` and the end with ``*/``. Like C/C++,
comments can't be nested.
can be delineated with ``/*`` and its end with ``*/``.
Like C/C++, comments can't be nested.
Identifiers in ``ispc`` are sequences of characters that start with an
underscore or an upper-case or lower-case letter, and then followed by
@@ -1306,7 +1377,7 @@ optional plus or minus sign and then digits from 0 to 9. For example:
Floating-point constants can optionally have a "f" or "F" suffix (``ispc``
currently treats all floating-point constants as having 32-bit precision,
making this suffix unnecessary.)
so the suffix has no effect at present.)
String constants in ``ispc`` are denoted by an opening double quote ``"``
followed by any character other than a newline, up to a closing double
@@ -1349,11 +1420,12 @@ The following identifiers are reserved as language keywords: ``bool``,
``break``, ``case``, ``cdo``, ``cfor``, ``char``, ``cif``, ``cwhile``,
``const``, ``continue``, ``default``, ``do``, ``double``, ``else``,
``enum``, ``export``, ``extern``, ``false``, ``float``, ``for``,
``foreach``, ``foreach_tiled``, ``goto``, ``if``, ``inline``, ``int``,
``int8``, ``int16``, ``int32``, ``int64``, ``launch``, ``NULL``, ``print``,
``return``, ``signed``, ``sizeof``, ``soa``, ``static``, ``struct``,
``switch``, ``sync``, ``task``, ``true``, ``typedef``, ``uniform``,
``union``, ``unsigned``, ``varying``, ``void``, ``volatile``, ``while``.
``foreach``, ``foreach_active``, ``foreach_tiled``, ``foreach_unique``,
``goto``, ``if``, ``in``, ``inline``, ``int``, ``int8``, ``int16``,
``int32``, ``int64``, ``launch``, ``NULL``, ``print``, ``return``,
``signed``, ``sizeof``, ``soa``, ``static``, ``struct``, ``switch``,
``sync``, ``task``, ``true``, ``typedef``, ``uniform``, ``union``,
``unsigned``, ``varying``, ``void``, ``volatile``, ``while``.
``ispc`` defines the following operators and punctuation:
@@ -2668,17 +2740,12 @@ same as ``if``:
``cif`` provides a hint to the compiler that you expect that most of the
executing SPMD programs will all have the same result for the ``if``
condition. Furthermore, it guarantees that the code in the "true" and
"false" clauses of the ``if`` statement will never be executed with an "all
off" execution mask. (See the `Control Flow Within A Gang`_ section for
more details on why regular ``if`` statements may sometimes do this.)
condition.
Along similar lines, ``cfor``, ``cdo``, and ``cwhile`` check to see if all
program instances are running at the start of each loop iteration; if so,
they can run a specialized code path that has been optimized for the "all
on" execution mask case. It is already the case for the regular looping
constructs in ``ispc`` that a loop will never be executed with an "all off"
execution mask.
on" execution mask case.
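The "all on" check made by the coherent loop statements can be pictured with the following C emulation of ``cfor (int i = 0; i < count; ++i) sum += i;`` where ``count`` is a ``varying`` value. It is a sketch only: the gang size of four, the bitmask, and the ``emulate_cfor()`` name are assumptions of this example, and the code actually generated by the compiler is not shown in this document. The function returns how many iterations took the specialized path.

```c
#include <assert.h>
#include <stddef.h>

#define GANG_SIZE 4
#define ALL_ON ((1u << GANG_SIZE) - 1u)

/* Emulates a cfor loop with a per-instance trip count. sum[] receives
   each instance's result; the return value is the number of iterations
   that ran on the "all on" fast path. */
static int emulate_cfor(const int count[GANG_SIZE], int sum[GANG_SIZE]) {
    assert(count != NULL && sum != NULL);
    int fast_iterations = 0;
    unsigned mask = ALL_ON;

    for (int lane = 0; lane < GANG_SIZE; ++lane)
        sum[lane] = 0;

    for (int i = 0; ; ++i) {
        /* Instances whose loop test fails drop out of the mask. */
        for (int lane = 0; lane < GANG_SIZE; ++lane)
            if (i >= count[lane])
                mask &= ~(1u << lane);

        /* The loop body never runs with an "all off" mask. */
        if (mask == 0)
            break;

        if (mask == ALL_ON) {
            /* Specialized path: no per-instance mask checks needed. */
            ++fast_iterations;
            for (int lane = 0; lane < GANG_SIZE; ++lane)
                sum[lane] += i;
        } else {
            /* General path: apply side effects only where enabled. */
            for (int lane = 0; lane < GANG_SIZE; ++lane)
                if ((mask >> lane) & 1u)
                    sum[lane] += i;
        }
    }
    return fast_iterations;
}
```

When all instances have the same trip count, every iteration takes the specialized path; with differing trip counts, only the iterations before the first instance drops out do.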
Functions and Function Calls