Merge branch 'master' into nvptx

2014-03-19 10:53:07 +01:00
parent 335d36211c 792f04881c
commit 4641a15287
93 changed files with 1182 additions and 1536 deletions
--- a/docs/ispc.rst
+++ b/docs/ispc.rst
@@ -361,7 +361,7 @@ the ``vout`` array before the next iteration of the ``foreach`` loop runs.

 On Linux\* and Mac OS\*, the makefile in that directory compiles this program.
 For Windows\*, open the ``examples/examples.sln`` file in Microsoft Visual
-C++ 2010\* to build this (and the other) examples.  In either case,
+C++ 2012\* to build this (and the other) examples.  In either case,
 build it now!  We'll walk through the details of the compilation steps in
 the following section, `Using The ISPC Compiler`_.)  In addition to
 compiling the ``ispc`` program, in this case the ``ispc`` compiler also
@@ -662,14 +662,14 @@ To compile for Xeon Phi™, first generate intermediate C++ code:
 The ``ispc`` distribution now includes a header file,
 ``examples/intrinsics/knc.h``, which maps from the generic C++ output
 to the corresponding intrinsic operations supported by Intel Xeon Phi™.
-Thus, to generate an object file, use the Intel C Compiler (``icc``) compile
+Thus, to generate an object file, use the Intel C++ Compiler (``icpc``) to compile
 the C++ code generated by ``ispc``, setting the ``#include`` search
 path so that it can find the ``examples/intrinsics/knc.h`` header file
 in the ``ispc`` distribution.

 ::

-  icc -mmic -Iexamples/intrinsics/ foo.cpp -o foo.o 
+  icpc -mmic -Iexamples/intrinsics/ -c foo.cpp -o foo.o

 With the current beta implementation, complex ``ispc`` programs are able to
 run on Xeon Phi™, though there are a number of known limitations:
@@ -690,14 +690,14 @@ run on Xeon Phi™, though there are a number of known limitations:
  where the memory address is actually aligned.  This may unnecessarily
  impact performance.

-* When requesting that ICC generate code with strict floating point
-  precision compliance (using ICC option ``-fp-model strict``) or
-  accurate reporting of floating point exceptions (using ICC option
+* When requesting that ICPC generate code with strict floating point
+  precision compliance (using ICPC option ``-fp-model strict``) or
+  accurate reporting of floating point exceptions (using ICPC option
  ``-fp-model except``) the compiler will generate code that uses the
  x87 unit rather than Xeon Phi™'s vector unit. For similar reasons, the
  options ``–ansi`` and ``–fmath-errno`` may result in calls to math
  functions that are implemented in x87 rather than using vector instructions.
-  This will have a significant performance impact. See the ICC manual for
+  This will have a significant performance impact. See the ICPC manual for
  details on these compiler options.

 All of these issues are currently actively being addressed and will be
@@ -3434,7 +3434,7 @@ for this argument.
 * ``fast``: more efficient but lower accuracy versions of the default ``ispc``
  implementations.
 * ``svml``: use Intel "Short Vector Math Library".  Use
-  ``icc`` to link your final executable so that the appropriate libraries
+  ``icpc`` to link your final executable so that the appropriate libraries
  are linked.
 * ``system``: use the system's math library.  On many systems, these
  functions are more accurate than both of ``ispc``'s implementations.
@@ -3622,6 +3622,39 @@ normalized exponent as a power of two in the ``pw2`` parameter.
                        uniform int * uniform pw2)


+Saturating Arithmetic
+---------------------
+The ``ispc`` standard library provides saturating (no overflow possible)
+addition, subtraction, multiplication and division for all integer types.
+
+::
+
+     int8 saturating_add(uniform int8 a, uniform int8 b)
+     int8 saturating_add(varying int8 a, varying int8 b)    
+     unsigned int8 saturating_add(uniform unsigned int8 a, uniform unsigned int8 b)
+     unsigned int8 saturating_add(varying unsigned int8 a, varying unsigned int8 b)
+
+     int8 saturating_sub(uniform int8 a, uniform int8 b)
+     int8 saturating_sub(varying int8 a, varying int8 b)    
+     unsigned int8 saturating_sub(uniform unsigned int8 a, uniform unsigned int8 b)
+     unsigned int8 saturating_sub(varying unsigned int8 a, varying unsigned int8 b)
+
+     int8 saturating_mul(uniform int8 a, uniform int8 b)
+     int8 saturating_mul(varying int8 a, varying int8 b)    
+     unsigned int8 saturating_mul(uniform unsigned int8 a, uniform unsigned int8 b)
+     unsigned int8 saturating_mul(varying unsigned int8 a, varying unsigned int8 b)
+
+     int8 saturating_div(uniform int8 a, uniform int8 b)
+     int8 saturating_div(varying int8 a, varying int8 b)    
+     unsigned int8 saturating_div(uniform unsigned int8 a, uniform unsigned int8 b)
+     unsigned int8 saturating_div(varying unsigned int8 a, varying unsigned int8 b)
+
+
+In addition to the ``int8`` variants of saturating arithmetic functions listed 
+above, there are versions that support ``int16``, ``int32`` and ``int64``
+values as well.
+
+
 Pseudo-Random Numbers
 ---------------------

@@ -4045,7 +4078,9 @@ overlap.
    void memmove(void * varying dst, void * varying src, int32 count)

 Note that there are variants of these functions that take both ``uniform``
-and ``varying`` pointers.
+and ``varying`` pointers.  Also note that ``sizeof(float)`` and
+``sizeof(uniform float)`` return different values (an unqualified ``float`` is
+``varying``), so take care when calculating ``count``.

 To initialize values in memory, the ``memset`` routine can be used.  (It
 also behaves like the function of the same name in the C Standard Library.)
@@ -4955,7 +4990,7 @@ countries.

 * Other names and brands may be claimed as the property of others.

-Copyright(C) 2011-2013, Intel Corporation. All rights reserved.
+Copyright(C) 2011-2014, Intel Corporation. All rights reserved.


 Optimization Notice
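
Note on the "Saturating Arithmetic" hunk: the new documentation section lists the signatures but does not spell out what saturation means for each operation. The sketch below is plain C, not ``ispc`` and not the standard library's implementation. It only illustrates the expected semantics for signed 8-bit values, assuming results are clamped to the representable range instead of wrapping. The names ``clamp_i8`` and ``sat_*_i8`` are hypothetical helpers introduced for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper (not part of ispc): clamp a wider intermediate
 * result into the int8 range [-128, 127]. */
static int8_t clamp_i8(int32_t v) {
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Each operation is computed in 32 bits, where it cannot overflow for
 * 8-bit inputs, and is then clamped. */
static int8_t sat_add_i8(int8_t a, int8_t b) { return clamp_i8((int32_t)a + b); }
static int8_t sat_sub_i8(int8_t a, int8_t b) { return clamp_i8((int32_t)a - b); }
static int8_t sat_mul_i8(int8_t a, int8_t b) { return clamp_i8((int32_t)a * b); }

/* Assumes b != 0. The only overflowing case is -128 / -1, which clamps to 127. */
static int8_t sat_div_i8(int8_t a, int8_t b) { return clamp_i8((int32_t)a / b); }
```

Under these assumptions, ``sat_add_i8(127, 1)`` yields 127 rather than wrapping to -128, and ``sat_div_i8(-128, -1)`` yields 127. Whether the ``ispc`` functions behave identically in every edge case (for example, division by zero) is not stated in the diff.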