Merge branch 'master' into nvptx
@@ -361,7 +361,7 @@ the ``vout`` array before the next iteration of the ``foreach`` loop runs.
 
 On Linux\* and Mac OS\*, the makefile in that directory compiles this program.
 For Windows\*, open the ``examples/examples.sln`` file in Microsoft Visual
-C++ 2010\* to build this (and the other) examples. In either case,
+C++ 2012\* to build this (and the other) examples. In either case,
 build it now! We'll walk through the details of the compilation steps in
 the following section, `Using The ISPC Compiler`_.) In addition to
 compiling the ``ispc`` program, in this case the ``ispc`` compiler also
@@ -662,14 +662,14 @@ To compile for Xeon Phi™, first generate intermediate C++ code:
 The ``ispc`` distribution now includes a header file,
 ``examples/intrinsics/knc.h``, which maps from the generic C++ output
 to the corresponding intrinsic operations supported by Intel Xeon Phi™.
-Thus, to generate an object file, use the Intel C Compiler (``icc``) to compile
+Thus, to generate an object file, use the Intel C++ Compiler (``icpc``) to compile
 the C++ code generated by ``ispc``, setting the ``#include`` search
 path so that it can find the ``examples/intrinsics/knc.h`` header file
 in the ``ispc`` distribution.
 
 ::
 
-    icc -mmic -Iexamples/intrinsics/ foo.cpp -o foo.o
+    icpc -mmic -Iexamples/intrinsics/ foo.cpp -o foo.o
 
 With the current beta implementation, complex ``ispc`` programs are able to
 run on Xeon Phi™, though there are a number of known limitations:
@@ -690,14 +690,14 @@ run on Xeon Phi™, though there are a number of known limitations:
   where the memory address is actually aligned. This may unnecessarily
   impact performance.
 
-* When requesting that ICC generate code with strict floating point
-  precision compliance (using ICC option ``-fp-model strict``) or
-  accurate reporting of floating point exceptions (using ICC option
+* When requesting that ICPC generate code with strict floating point
+  precision compliance (using ICPC option ``-fp-model strict``) or
+  accurate reporting of floating point exceptions (using ICPC option
   ``-fp-model except``) the compiler will generate code that uses the
   x87 unit rather than Xeon Phi™'s vector unit. For similar reasons, the
   options ``-ansi`` and ``-fmath-errno`` may result in calls to math
   functions that are implemented in x87 rather than using vector instructions.
-  This will have a significant performance impact. See the ICC manual for
+  This will have a significant performance impact. See the ICPC manual for
   details on these compiler options.
 
 All of these issues are currently actively being addressed and will be
@@ -3434,7 +3434,7 @@ for this argument.
 * ``fast``: more efficient but lower accuracy versions of the default ``ispc``
   implementations.
 * ``svml``: use Intel "Short Vector Math Library". Use
-  ``icc`` to link your final executable so that the appropriate libraries
+  ``icpc`` to link your final executable so that the appropriate libraries
   are linked.
 * ``system``: use the system's math library. On many systems, these
   functions are more accurate than both of ``ispc``'s implementations.
@@ -3622,6 +3622,39 @@ normalized exponent as a power of two in the ``pw2`` parameter.
     uniform int * uniform pw2)
 
 
+Saturating Arithmetic
+---------------------
+Saturating (no overflow possible) addition, subtraction, multiplication and
+division of all integer types are provided by the ``ispc`` standard library.
+
+::
+
+    int8 saturating_add(uniform int8 a, uniform int8 b)
+    int8 saturating_add(varying int8 a, varying int8 b)
+    unsigned int8 saturating_add(uniform unsigned int8 a, uniform unsigned int8 b)
+    unsigned int8 saturating_add(varying unsigned int8 a, varying unsigned int8 b)
+
+    int8 saturating_sub(uniform int8 a, uniform int8 b)
+    int8 saturating_sub(varying int8 a, varying int8 b)
+    unsigned int8 saturating_sub(uniform unsigned int8 a, uniform unsigned int8 b)
+    unsigned int8 saturating_sub(varying unsigned int8 a, varying unsigned int8 b)
+
+    int8 saturating_mul(uniform int8 a, uniform int8 b)
+    int8 saturating_mul(varying int8 a, varying int8 b)
+    unsigned int8 saturating_mul(uniform unsigned int8 a, uniform unsigned int8 b)
+    unsigned int8 saturating_mul(varying unsigned int8 a, varying unsigned int8 b)
+
+    int8 saturating_div(uniform int8 a, uniform int8 b)
+    int8 saturating_div(varying int8 a, varying int8 b)
+    unsigned int8 saturating_div(uniform unsigned int8 a, uniform unsigned int8 b)
+    unsigned int8 saturating_div(varying unsigned int8 a, varying unsigned int8 b)
+
+
+In addition to the ``int8`` variants of saturating arithmetic functions listed
+above, there are versions that support ``int16``, ``int32`` and ``int64``
+values as well.
+
+
 Pseudo-Random Numbers
 ---------------------
 
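Reviewer note on the hunk above: the new "Saturating Arithmetic" section says results clamp to the type's range instead of wrapping. The following is a minimal scalar sketch of that idea in plain C, not the ``ispc`` implementation. The names ``sat_add_i8``, ``sat_add_u8``, ``sat_sub_u8``, ``sat_div_i8`` and ``check_saturating_examples`` are hypothetical, and clamping ``INT8_MIN / -1`` to ``INT8_MAX`` is an assumption about what a saturating division would do.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 8-bit saturating add: compute in a wider type, then clamp. */
static inline int8_t sat_add_i8(int8_t a, int8_t b) {
    int sum = (int)a + (int)b;
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* Unsigned 8-bit saturating add: clamp at UINT8_MAX instead of wrapping. */
static inline uint8_t sat_add_u8(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + (unsigned)b;
    return sum > UINT8_MAX ? (uint8_t)UINT8_MAX : (uint8_t)sum;
}

/* Unsigned 8-bit saturating subtract: clamp at zero instead of wrapping. */
static inline uint8_t sat_sub_u8(uint8_t a, uint8_t b) {
    return a > b ? (uint8_t)(a - b) : (uint8_t)0;
}

/* Signed 8-bit saturating divide. INT8_MIN / -1 is the only quotient that
 * does not fit in int8_t; clamping it to INT8_MAX is an assumption here.
 * Division by zero is left to the caller to avoid. */
static inline int8_t sat_div_i8(int8_t a, int8_t b) {
    if (a == INT8_MIN && b == -1) return INT8_MAX;
    return (int8_t)(a / b);
}

/* Small usage example: each call would overflow with ordinary arithmetic. */
static inline void check_saturating_examples(void) {
    assert(sat_add_i8(100, 100) == INT8_MAX);
    assert(sat_add_i8(-100, -100) == INT8_MIN);
    assert(sat_add_u8(200, 100) == UINT8_MAX);
    assert(sat_sub_u8(10, 20) == 0);
    assert(sat_div_i8(INT8_MIN, -1) == INT8_MAX);
}
```

Widening to ``int`` before clamping keeps the sketch free of undefined behaviour; for ``int64`` there is no wider standard type, so an implementation would need explicit overflow checks instead.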
@@ -4045,7 +4078,9 @@ overlap.
     void memmove(void * varying dst, void * varying src, int32 count)
 
 Note that there are variants of these functions that take both ``uniform``
-and ``varying`` pointers.
+and ``varying`` pointers. Also note that ``sizeof(float)`` and
+``sizeof(uniform float)`` return different values, so programmers should
+take care when calculating ``count``.
 
 To initialize values in memory, the ``memset`` routine can be used. (It
 also behaves like the function of the same name in the C Standard Library.)
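Reviewer note on the hunk above: in ``ispc`` an unqualified ``float`` is ``varying``, so it holds one value per program instance, which is why the two ``sizeof`` expressions differ. The following plain C model is only an illustration of how a byte ``count`` changes between the two cases. ``GANG_WIDTH``, ``varying_float``, ``bytes_for_uniform_floats``, ``bytes_for_varying_floats`` and ``check_count_examples`` are hypothetical names, and the width of 8 is an assumption (the real width depends on the compilation target).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed gang width for illustration only. */
enum { GANG_WIDTH = 8 };

/* Stand-in for an ispc `varying float`: one float per program instance. */
typedef struct { float lane[GANG_WIDTH]; } varying_float;

/* Byte count for n values that are shared across the gang (uniform). */
static inline size_t bytes_for_uniform_floats(size_t n) {
    return n * sizeof(float);
}

/* Byte count for n values that each have one copy per program instance. */
static inline size_t bytes_for_varying_floats(size_t n) {
    return n * sizeof(varying_float);
}

/* Usage example: the same element count gives byte counts that differ by
 * the gang width, so mixing the two up copies too little or too much. */
static inline void check_count_examples(void) {
    assert(bytes_for_uniform_floats(4) == 4 * sizeof(float));
    assert(bytes_for_varying_floats(4) ==
           GANG_WIDTH * bytes_for_uniform_floats(4));
}
```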
@@ -4955,7 +4990,7 @@ countries.
 
 * Other names and brands may be claimed as the property of others.
 
-Copyright(C) 2011-2013, Intel Corporation. All rights reserved.
+Copyright(C) 2011-2014, Intel Corporation. All rights reserved.
 
 
 Optimization Notice