=== v1.0.6 === (17 August 2011) Some additional cross-program instance operations have been added to the standard library. reduce_equal() checks to see if the given value is the same across all running program instances, and exclusive_scan_{and,or,and}() computes a scan over the given value in the running program instances. See the documentation of these new routines for more information: http://ispc.github.com/ispc.html#cross-program-instance-operations. The simple task system implementations used in the examples have been improved. The Windows version no nlonger has a hard limit on the number of tasks that can be launched, and all versions have less dynamic memory allocation and less locking. More of the examples now have paths that also measure performance using tasks along with SPMD vectorization. Two new examples have been added: one that shows the implementation of a ray-marching volume rendering algorithm, and one that shows a 3D stencil computation, as might be done for PDE solutions. Standard library routines to issue prefetches have been added. See the documentation for more details: http://ispc.github.com/ispc.html#prefetches. Fast versions of the float to half-precision float conversion routines have been added. For more details, see: http://ispc.github.com/ispc.html#conversions-to-and-from-half-precision-floats. There is the usual set of small bug fixes. Notably, a number of details related to handling 32 versus 64 bit targets have been fixed, which in turn has fixed a bug related to tasks having incorrect values for pointers passed to them. === v1.0.5 === (1 August 2011) Multi-element vector swizzles are supported; for example, given a 3-wide vector "foo", then expressions like "foo.zyx" and "foo.yz" can be used to construct other short vectors. See http://ispc.github.com/ispc.html#short-vector-types for more details. (Thanks to Pete Couperus for implementing this code!). int8 and int16 datatypes are now supported. It is still generally more efficient to use int32 for intermediate computations, even if the in-memory format is int8 or int16. There are now standard library routines to convert to and from 'half'-format floating-point values (half_to_float() and float_to_half()). There is a new example with an implementation of Perlin's Noise function (examples/noise). It shows a speedup of approximately 4.2x versus a C implementation on OSX and a 2.9x speedup versus C on Windows. === v1.0.4 === (18 July 2011) enums are now supported in ispc; see the section on enumeration types in the documentation (http://ispc.github.com/ispc.html#enumeration-types) for more informaiton. bools are converted to integers with zero extension, not sign extension as before (i.e. a 'true' bool converts to the value one, not 'all bits on'.) For cases where sign extension is still desired, there is a sign_extend(bool) function in the standard library. Support for 64-bit types in the standard library is much more complete than before. 64-bit integer constants are now supported by the parser. Storage for parameters to tasks is now allocated dynamically on Windows, rather than on the stack; with this fix, all tests now run correctly on Windows. There is now support for atomic swap and compare/exchange with float and double types. A number of additional small bugs have been fixed and a number of cases where the compiler would crash given a malformed program have been fixed. === v1.0.3 === (4 July 2011) ispc now has a bulit-in pre-processor (from LLVM's clang compiler). (Thanks to Pete Couperus for this patch!) It is therefore no longer necessary to use cl.exe for preprocessing on Windows; the MSVC proejct files for the examples have been updated accordingly. There is another variant of the shuffle() function int the standard library: " shuffle( v0, v1, int permute)", where the permutation vector indexes over the concatenation of the two vectors (e.g. the value 0 corresponds to the first element of v0, the value 2*programCount-1 corresponds to the last element of v1, etc.) ispc now supports the usual range of atomic operations (add, subtract, min, max, and, or, and xor) as well as atomic swap and atomic compare and exchange. There is also a facility for inserting memory fences. See the "Atomic Operations and Memory Fences" section of the user's guide (http://ispc.github.com/ispc.html#atomic-operations-and-memory-fences) for more information. There are now both 'signed' and 'unsigned' variants of the standard library functions like packed_load_active() that take references to arrays of signed int32s and unsigned int32s respectively. (The {load_from,store_to}_{int8,int16}() functions have similarly been augmented to have both 'signed' and 'unsigned' variants.) In initializer expressions with variable declarations, it is no longer legal to initialize arrays and structs with single scalar values that then initialize their members; they now must be initialized with initializer lists in braces (or initialized after of the initializer with a loop over array elements, etc.) === v1.0.2 === (1 July 2011) Floating-point hexidecimal constants are now parsed correctly on Windows (fixes issue #16). SSE2 is now the default target if --cpu=atom is given in the command line arguments and another target isn't explicitly specified. The standard library now provides broadcast(), rotate(), and shuffle() routines for efficient communication between program instances. The MSVC solution files to build the examples on Windows now use /fpmath:fast when building. === v1.0.1 === (24 June 2011) ispc no longer requires that pointers to memory that are passed in to ispc have alignment equal to the targets vector width; now alignment just has to be the regular element alignment (e.g. 4 bytes for floats, etc.) This change also fixed a number of cases where it previously incorrectly generated aligned load/store instructions in cases where the address wasn't actually aligned (even if the base address passed into ispc code was). === v1.0 === (21 June 2011) Initial Release