Merge branch 'master' into nvptx

Evghenii
2014-03-19 10:53:07 +01:00
93 changed files with 1182 additions and 1536 deletions

@@ -361,7 +361,7 @@ the ``vout`` array before the next iteration of the ``foreach`` loop runs.
On Linux\* and Mac OS\*, the makefile in that directory compiles this program.
For Windows\*, open the ``examples/examples.sln`` file in Microsoft Visual
C++ 2010\* to build this (and the other) examples. In either case,
C++ 2012\* to build this (and the other) examples. In either case,
build it now! We'll walk through the details of the compilation steps in
the following section, `Using The ISPC Compiler`_.) In addition to
compiling the ``ispc`` program, in this case the ``ispc`` compiler also
@@ -662,14 +662,14 @@ To compile for Xeon Phi™, first generate intermediate C++ code:
The ``ispc`` distribution now includes a header file,
``examples/intrinsics/knc.h``, which maps from the generic C++ output
to the corresponding intrinsic operations supported by Intel Xeon Phi™.
Thus, to generate an object file, use the Intel C Compiler (``icc``) compile
Thus, to generate an object file, use the Intel C++ Compiler (``icpc``) to compile
the C++ code generated by ``ispc``, setting the ``#include`` search
path so that it can find the ``examples/intrinsics/knc.h`` header file
in the ``ispc`` distribution.
::
icc -mmic -Iexamples/intrinsics/ foo.cpp -o foo.o
icpc -mmic -Iexamples/intrinsics/ foo.cpp -o foo.o
With the current beta implementation, complex ``ispc`` programs are able to
run on Xeon Phi™, though there are a number of known limitations:
@@ -690,14 +690,14 @@ run on Xeon Phi™, though there are a number of known limitations:
where the memory address is actually aligned. This may unnecessarily
impact performance.
* When requesting that ICC generate code with strict floating point
precision compliance (using ICC option ``-fp-model strict``) or
accurate reporting of floating point exceptions (using ICC option
* When requesting that ICPC generate code with strict floating point
precision compliance (using ICPC option ``-fp-model strict``) or
accurate reporting of floating point exceptions (using ICPC option
``-fp-model except``) the compiler will generate code that uses the
x87 unit rather than Xeon Phi™'s vector unit. For similar reasons, the
options ``ansi`` and ``fmath-errno`` may result in calls to math
functions that are implemented in x87 rather than using vector instructions.
This will have a significant performance impact. See the ICC manual for
This will have a significant performance impact. See the ICPC manual for
details on these compiler options.
All of these issues are currently actively being addressed and will be
@@ -3434,7 +3434,7 @@ for this argument.
* ``fast``: more efficient but lower accuracy versions of the default ``ispc``
implementations.
* ``svml``: use Intel "Short Vector Math Library". Use
``icc`` to link your final executable so that the appropriate libraries
``icpc`` to link your final executable so that the appropriate libraries
are linked.
* ``system``: use the system's math library. On many systems, these
functions are more accurate than both of ``ispc``'s implementations.
@@ -3622,6 +3622,39 @@ normalized exponent as a power of two in the ``pw2`` parameter.
uniform int * uniform pw2)
Saturating Arithmetic
---------------------
Saturating (no overflow possible) addition, subtraction, multiplication and
division of all integer types are provided by the ``ispc`` standard library.
::
int8 saturating_add(uniform int8 a, uniform int8 b)
int8 saturating_add(varying int8 a, varying int8 b)
unsigned int8 saturating_add(uniform unsigned int8 a, uniform unsigned int8 b)
unsigned int8 saturating_add(varying unsigned int8 a, varying unsigned int8 b)
int8 saturating_sub(uniform int8 a, uniform int8 b)
int8 saturating_sub(varying int8 a, varying int8 b)
unsigned int8 saturating_sub(uniform unsigned int8 a, uniform unsigned int8 b)
unsigned int8 saturating_sub(varying unsigned int8 a, varying unsigned int8 b)
int8 saturating_mul(uniform int8 a, uniform int8 b)
int8 saturating_mul(varying int8 a, varying int8 b)
unsigned int8 saturating_mul(uniform unsigned int8 a, uniform unsigned int8 b)
unsigned int8 saturating_mul(varying unsigned int8 a, varying unsigned int8 b)
int8 saturating_div(uniform int8 a, uniform int8 b)
int8 saturating_div(varying int8 a, varying int8 b)
unsigned int8 saturating_div(uniform unsigned int8 a, uniform unsigned int8 b)
unsigned int8 saturating_div(varying unsigned int8 a, varying unsigned int8 b)
In addition to the ``int8`` variants of saturating arithmetic functions listed
above, there are versions that support ``int16``, ``int32`` and ``int64``
values as well.
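To illustrate what "no overflow possible" means for the 8-bit case, here is a
minimal scalar sketch in plain C. It is not the ``ispc`` implementation: the
helper names (``clamp_i8``, ``sat_add_i8`` and so on) are hypothetical, and the
sketch assumes that saturation means clamping the exact result to the range of
the result type.

```c
#include <stdint.h>

/* Hypothetical helper (not part of ispc): clamp a wide intermediate
   value to the representable range of int8. */
static int8_t clamp_i8(int32_t v) {
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Signed variants: compute exactly in 32 bits, then clamp. */
static int8_t sat_add_i8(int8_t a, int8_t b) { return clamp_i8((int32_t)a + b); }
static int8_t sat_sub_i8(int8_t a, int8_t b) { return clamp_i8((int32_t)a - b); }
static int8_t sat_mul_i8(int8_t a, int8_t b) { return clamp_i8((int32_t)a * b); }

/* Signed division can only overflow for INT8_MIN / -1.
   This sketch assumes the caller never passes b == 0. */
static int8_t sat_div_i8(int8_t a, int8_t b) { return clamp_i8((int32_t)a / b); }

/* Unsigned variants clamp to [0, UINT8_MAX]. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b) {
    uint32_t s = (uint32_t)a + b;
    return s > UINT8_MAX ? UINT8_MAX : (uint8_t)s;
}
static uint8_t sat_sub_u8(uint8_t a, uint8_t b) {
    return a > b ? (uint8_t)(a - b) : 0;
}
```

Under these assumptions, ``sat_add_i8(100, 100)`` yields 127 rather than
wrapping around to -56, and ``sat_sub_u8(10, 20)`` yields 0 rather than 246.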
Pseudo-Random Numbers
---------------------
@@ -4045,7 +4078,9 @@ overlap.
void memmove(void * varying dst, void * varying src, int32 count)
Note that there are variants of these functions that take both ``uniform``
and ``varying`` pointers.
and ``varying`` pointers. Also note that ``sizeof(float)`` and
``sizeof(uniform float)`` return different values, so programmers should
take care when calculating ``count``.
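The difference arises because an unqualified ``float`` in ``ispc`` is
``varying`` by default. The following plain C sketch models the effect on a
byte count. It rests on two labeled assumptions: a gang width of 8 program
instances (``GANG_WIDTH``, chosen only for illustration), and a ``varying``
value occupying one element per program instance. The type and function names
are hypothetical stand-ins, not ``ispc`` API.

```c
#include <stddef.h>

/* Assumption: 8 program instances per gang; the real value depends
   on the compilation target. */
enum { GANG_WIDTH = 8 };

/* Hypothetical stand-ins for the two ispc types. */
typedef float uniform_float;                               /* one value per gang     */
typedef struct { float lane[GANG_WIDTH]; } varying_float;  /* one value per instance */

/* Byte count for copying n uniform floats. */
static size_t bytes_for_uniform(size_t n) { return n * sizeof(uniform_float); }

/* Byte count for copying n varying floats under the assumptions above. */
static size_t bytes_for_varying(size_t n) { return n * sizeof(varying_float); }
```

In this model, a ``count`` computed for four ``uniform`` floats is 16 bytes,
while the same computation with the ``varying`` type gives 128 bytes, so using
the wrong ``sizeof`` would copy eight times too much or too little.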
To initialize values in memory, the ``memset`` routine can be used. (It
also behaves like the function of the same name in the C Standard Library.)
@@ -4955,7 +4990,7 @@ countries.
* Other names and brands may be claimed as the property of others.
Copyright(C) 2011-2013, Intel Corporation. All rights reserved.
Copyright(C) 2011-2014, Intel Corporation. All rights reserved.
Optimization Notice