diff --git a/docs/ispc.rst b/docs/ispc.rst
index 4be80a18..0366c087 100644
--- a/docs/ispc.rst
+++ b/docs/ispc.rst
@@ -121,10 +121,14 @@ Contents:
 
 * `The ISPC Standard Library`_
 
+  + `Basic Operations On Data`_
+
+    * `Logical and Selection Operations`_
+    * `Bit Operations`_
+
   + `Math Functions`_
 
     * `Basic Math Functions`_
-    * `Bit-Level Operations`_
     * `Transcendental Functions`_
     * `Pseudo-Random Numbers`_
 
@@ -2150,6 +2154,12 @@ greater than or equal to ``NUM_ITEMS``.
         // ...
     }
 
+Short-circuiting may impose some overhead in the generated code; for cases
+where short-circuiting is undesirable due to performance impact, see
+the section `Logical and Selection Operations`_, which introduces helper
+functions in the standard library that provide these operations without
+short-circuiting.
+
 Dynamic Memory Allocation
 -------------------------
 
@@ -2827,6 +2837,123 @@ The ISPC Standard Library
 compiling ``ispc`` programs. (To disable the standard library, pass the
 ``--nostdlib`` command-line flag to the compiler.)
 
+Basic Operations On Data
+------------------------
+
+Logical and Selection Operations
+--------------------------------
+
+Recall from `Expressions`_ that ``ispc`` short-circuits the evaluation of
+logical and selection operators: given an expression like ``(index < count
+&& array[index] == 0)``, then ``array[index] == 0`` is only evaluated if
+``index < count`` is true. This property is useful for writing expressions
+like the preceding one, where the second expression may not be safe to
+evaluate in some cases.
+
+This short-circuiting can impose overhead in the generated code; additional
+operations are required to test the first value and to conditionally jump
+over the code that evaluates the second value. The ``ispc`` compiler
+tries to mitigate this cost by detecting cases where it is both safe and
+inexpensive to evaluate both expressions; in those cases it omits
+short-circuiting from the generated code, with no programmer-visible
+change in program behavior.
+
+When the compiler can't detect such a case but the programmer still wants
+to avoid short-circuiting, the standard library provides a few helper
+functions. First, ``and()`` and ``or()`` provide non-short-circuiting
+logical AND and OR operations.
+
+::
+
+    bool and(bool a, bool b)
+    bool or(bool a, bool b)
+    uniform bool and(uniform bool a, uniform bool b)
+    uniform bool or(uniform bool a, uniform bool b)
+
+Three variants of ``select()`` choose between two values based on a
+boolean condition. These are the variants of ``select()`` for the
+``int8`` type:
+
+::
+
+    int8 select(bool v, int8 a, int8 b)
+    int8 select(uniform bool v, int8 a, int8 b)
+    uniform int8 select(uniform bool v, uniform int8 a, uniform int8 b)
+
+There are also variants for ``int16``, ``int32``, ``int64``, ``float``, and
+``double`` types.
+
+Bit Operations
+--------------
+
+The ``popcnt()`` functions return the population count: the number of
+bits set in the given value.
+
+::
+
+    uniform int popcnt(uniform int v)
+    int popcnt(int v)
+    uniform int popcnt(bool v)
+
+
+A few functions determine how many leading bits in the given value are
+zero and how many trailing bits are zero; there are also ``unsigned``
+variants of these functions and variants that take ``int64`` and
+``unsigned int64`` types.
+
+::
+
+    int32 count_leading_zeros(int32 v)
+    uniform int32 count_leading_zeros(uniform int32 v)
+    int32 count_trailing_zeros(int32 v)
+    uniform int32 count_trailing_zeros(uniform int32 v)
+
+Sometimes it's useful to convert a ``bool`` value to an integer using sign
+extension so that the integer's bits are all on if the ``bool`` has the
+value ``true`` (rather than just having the value one). The
+``sign_extend()`` functions provide this functionality:
+
+::
+
+    int sign_extend(bool value)
+    uniform int sign_extend(uniform bool value)
+
+The ``intbits()`` and ``floatbits()`` functions can be used to implement
+low-level floating-point bit twiddling. For example, ``intbits()`` returns
+an ``unsigned int`` that is a bit-for-bit copy of the given ``float``
+value. (Note: it is **not** the same as ``(int)a``, but corresponds to
+something like ``*((unsigned int *)&a)`` in C.)
+
+::
+
+    float floatbits(unsigned int a);
+    uniform float floatbits(uniform unsigned int a);
+    unsigned int intbits(float a);
+    uniform unsigned int intbits(uniform float a);
+
+
+The ``intbits()`` and ``floatbits()`` functions have no runtime cost;
+they just tell the compiler how to interpret the bits of the given
+value. They make it possible to efficiently write functions that take
+advantage of the low-level bit representation of floating-point values.
+
+For example, the ``abs()`` function in the standard library is implemented
+as follows:
+
+::
+
+    float abs(float a) {
+        unsigned int i = intbits(a);
+        i &= 0x7fffffff;
+        return floatbits(i);
+    }
+
+This code directly clears the high-order (sign) bit, so the result is
+non-negative. This compiles down to a single ``andps`` instruction when
+used with an Intel® SSE target, for example.
+
+
+
 Math Functions
 --------------
 
@@ -2919,77 +3046,6 @@ quite efficient.)
         uniform unsigned int low, uniform unsigned int high)
 
 
-Bit-Level Operations
---------------------
-
-
-The various variants of ``popcnt()`` return the population count--the
-number of bits set in the given value.
-
-::
-
-    uniform int popcnt(uniform int v)
-    int popcnt(int v)
-    uniform int popcnt(bool v)
-
-
-A few functions determine how many leading bits in the given value are zero
-and how many of the trailing bits are zero; there are also ``unsigned``
-variants of these functions and variants that take ``int64`` and ``unsigned
-int64`` types.
-
-::
-
-    int32 count_leading_zeros(int32 v)
-    uniform int32 count_leading_zeros(uniform int32 v)
-    int32 count_trailing_zeros(int32 v)
-    uniform int32 count_trailing_zeros(uniform int32 v)
-
-Sometimes it's useful to convert a ``bool`` value to an integer using sign
-extension so that the integer's bits are all on if the ``bool`` has the
-value ``true`` (rather than just having the value one). The
-``sign_extend()`` functions provide this functionality:
-
-::
-
-    int sign_extend(bool value)
-    uniform int sign_extend(uniform bool value)
-
-The ``intbits()`` and ``floatbits()`` functions can be used to implement
-low-level floating-point bit twiddling. For example, ``intbits()`` returns
-an ``unsigned int`` that is a bit-for-bit copy of the given ``float``
-value. (Note: it is **not** the same as ``(int)a``, but corresponds to
-something like ``*((int *)&a)`` in C.
-
-::
-
-    float floatbits(unsigned int a);
-    uniform float floatbits(uniform unsigned int a);
-    unsigned int intbits(float a);
-    uniform unsigned int intbits(uniform float a);
-
-
-The ``intbits()`` and ``floatbits()`` functions have no cost at runtime;
-they just let the compiler know how to interpret the bits of the given
-value. They make it possible to efficiently write functions that take
-advantage of the low-level bit representation of floating-point values.
-
-For example, the ``abs()`` function in the standard library is implemented
-as follows:
-
-::
-
-    float abs(float a) {
-        unsigned int i = intbits(a);
-        i &= 0x7fffffff;
-        return floatbits(i);
-    }
-
-This code directly clears the high order bit to ensure that the given
-floating-point value is positive. This compiles down to a single ``andps``
-instruction when used with an Intel® SSE target, for example.
-
-
 Transcendental Functions
 ------------------------
 
diff --git a/stdlib.ispc b/stdlib.ispc
index 89c47cf7..14d6f8cf 100644
--- a/stdlib.ispc
+++ b/stdlib.ispc
@@ -746,6 +746,125 @@ static inline void prefetch_nt(const void * varying ptr) {
     }
 }
 
+///////////////////////////////////////////////////////////////////////////
+// non-short-circuiting alternatives
+
+__declspec(safe,cost1)
+static inline bool and(bool a, bool b) {
+    return a && b;
+}
+
+__declspec(safe,cost1)
+static inline uniform bool and(uniform bool a, uniform bool b) {
+    return a && b;
+}
+
+__declspec(safe,cost1)
+static inline bool or(bool a, bool b) {
+    return a || b;
+}
+
+__declspec(safe,cost1)
+static inline uniform bool or(uniform bool a, uniform bool b) {
+    return a || b;
+}
+
+__declspec(safe,cost1)
+static inline int8 select(bool c, int8 a, int8 b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline int8 select(uniform bool c, int8 a, int8 b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline uniform int8 select(uniform bool c, uniform int8 a,
+                                  uniform int8 b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline int16 select(bool c, int16 a, int16 b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline int16 select(uniform bool c, int16 a, int16 b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline uniform int16 select(uniform bool c, uniform int16 a,
+                                   uniform int16 b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline int32 select(bool c, int32 a, int32 b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline int32 select(uniform bool c, int32 a, int32 b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline uniform int32 select(uniform bool c, uniform int32 a,
+                                   uniform int32 b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline int64 select(bool c, int64 a, int64 b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline int64 select(uniform bool c, int64 a, int64 b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline uniform int64 select(uniform bool c, uniform int64 a,
+                                   uniform int64 b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline float select(bool c, float a, float b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline float select(uniform bool c, float a, float b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline uniform float select(uniform bool c, uniform float a,
+                                   uniform float b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline double select(bool c, double a, double b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline double select(uniform bool c, double a, double b) {
+    return c ? a : b;
+}
+
+__declspec(safe,cost1)
+static inline uniform double select(uniform bool c, uniform double a,
+                                    uniform double b) {
+    return c ? a : b;
+}
+
 ///////////////////////////////////////////////////////////////////////////
 // Horizontal ops / reductions