ispc/docs/ReleaseNotes.txt

=== v1.1.2 === (9 January 2012)

The major new feature in this release is support for "generic" C++
vectorized output; in other words, ispc can emit C++ code that corresponds
to the vectorized computation that the ispc program represents.  See the
examples/intrinsics directory in the ispc distribution for two example
implementations of the set of functions that must be provided to map the
vector calls generated by ispc to target-specific functions.

ispc now has partial support for 'goto' statements; specifically, goto is
allowed if any enclosing control flow statements (if/for/while/do) have
'uniform' test expressions, but not if they have 'varying' tests.

A number of improvements have been made to the code generated for gathers
and scatters--one of them (better matching x86's "free" scale by 2/4/8 for
addressing calculations) improved the performance of the noise example by
14%.

Many small bugs have been fixed in this release as well, including issue
numbers 138, 129, 135, 127, 149, and 142.

=== v1.1.1 === (15 December 2011)

This release doesn't include any significant new functionality, but does
include small improvements in generated code and a number of bug fixes.

The one user-visible language change is that integer constants may be
specified with 'u' and 'l' suffixes, like in C.  For example, "1024llu"
defines the constant with unsigned 64-bit type.
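
For comparison, the same suffixes exist in C, so their effect can be
sketched in plain C.  This is C rather than ispc code, and two_to_the_50()
is a hypothetical helper named only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper for this sketch: 2^50 built from a suffixed literal.
   The "llu" suffix makes 1024 an unsigned long long (at least 64 bits), so
   the shift below is well defined.  Without the suffix, 1024 would be a
   plain int and shifting it left by 40 bits would be undefined. */
static uint64_t two_to_the_50(void) {
    return 1024llu << 40;
}
```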

More informative and useful error messages are printed when function
overload resolution fails.

Masking is avoided in additional cases when the mask can be
statically determined to be all on.

A number of small bugs have been fixed:
- Under some circumstances, incorrect masks were used when assigning a
  value to a reference and when doing gathers/scatters.
- Incorrect code could be generated in some cases when some instances
  returned part way through a function but others continued executing.
- Type checking wasn't being performed for calls through function pointers;
  now an error is issued if the arguments don't match up, etc.
- Incorrect code was being generated for gather/scatter to structs that had
  elements with varying short-vector types.
- Typechecking wasn't being performed for "foreach" statements; this led to
  problems like function overload resolution not being performed if an
  overloaded function call was used to determine the iteration range.
- A number of symbols would be multiply-defined when compiling to multiple
  targets and using the sse2-x2 target as one of them (issue #131).

=== v1.1.0 === (5 December 2011)

This is a major new release of the compiler, with significant additions to
language functionality and capabilities.  It includes a number of small
language syntax changes that will require modification of existing
programs.  These changes should generally be straightforward and all are
steps toward eliminating parts of ispc syntax that are incompatible with
C/C++.  See
http://ispc.github.com/ispc.html#updating-ispc-programs-for-changes-in-ispc-1-1
for more information about these changes.

ispc now fully supports pointers, including pointer arithmetic, implicit
conversions of arrays to pointers, and all of the other capabilities of
pointers in C.  See http://ispc.github.com/ispc.html#pointer-types for more
information about pointers in ispc and
http://ispc.github.com/ispc.html#function-pointer-types for information
about function pointers in ispc.

Reference types are now declared with C++ syntax (e.g. "const float &foo").

ispc now supports 64-bit addressing.  For performance reasons, this
capability is disabled by default (even on 64-bit targets), but can be
enabled with a command-line flag:
http://ispc.github.com/ispc.html#selecting-32-or-64-bit-addressing.

This release features new parallel "foreach" statements, which make it
easier in many instances to map program instances to data for data-parallel
computation than the programIndex/programCount mechanism:
http://ispc.github.com/ispc.html#parallel-iteration-statements-foreach-and-foreach-tiled.
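
To illustrate the bookkeeping involved, here is a scalar C model (not ispc
code) of the programIndex/programCount idiom: the outer loop advances one
gang at a time, and each lane has to check that its own index is in range.
PROGRAM_COUNT and scale_gang_style() are assumptions made for this sketch.
This index arithmetic and bounds check is the kind of detail that "foreach"
is meant to take over.

```c
#include <assert.h>

enum { PROGRAM_COUNT = 4 };  /* assumed gang size for this sketch */

/* Multiplies data[0..n) by s, visiting the elements in the order that a
   gang of PROGRAM_COUNT program instances would. */
static void scale_gang_style(float *data, int n, float s) {
    assert(n >= 0);
    for (int base = 0; base < n; base += PROGRAM_COUNT) {
        /* The inner loop stands in for the lanes that run together. */
        for (int program_index = 0; program_index < PROGRAM_COUNT;
             ++program_index) {
            int i = base + program_index;
            if (i < n)  /* lanes past the end of the array do nothing */
                data[i] *= s;
        }
    }
}
```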

Finally, all of the system's documentation has been significantly revised.
The documentation of ispc's parallel execution model has been rewritten:
http://ispc.github.com/ispc.html#the-ispc-parallel-execution-model, and
there is now a more specific discussion of similarities and differences
between ispc and C/C++:
http://ispc.github.com/ispc.html#relationship-to-the-c-programming-language.
There is now a separate FAQ (http://ispc.github.com/faq.html), and a
Performance Guide (http://ispc.github.com/perfguide.html).

=== v1.0.12 === (20 October 2011)

This release includes a new "double-pumped" 8-wide target for SSE2,
"sse2-x2".  Like the sse4-x2 and avx-x2 targets, this target may deliver
higher performance for some workloads than the regular sse2 target.  (For
other workloads, it may be slower.)

The ispc language now includes an "assert()" statement.  See
http://ispc.github.com/ispc.html#assertions for more information.

The compiler now sets a preprocessor #define based on the target ISA; for
example, ISPC_TARGET_SSE4 is defined for the sse4 targets, and so forth.
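
A minimal sketch of how such a define might be tested.  ISPC_TARGET_SSE4 is
the macro named above; built_for_sse4() is a hypothetical function.  The
fragment uses only ordinary preprocessor conditionals, so it is also valid
C, where the macro is simply left undefined.

```c
/* Returns 1 when this translation unit was compiled with ISPC_TARGET_SSE4
   defined, and 0 otherwise.  (Hypothetical helper for illustration.) */
static int built_for_sse4(void) {
#if defined(ISPC_TARGET_SSE4)
    return 1;
#else
    return 0;
#endif
}
```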

The standard library now provides high-performance routines for converting
between some "array of structures" and "structure of arrays" formats.
See
http://ispc.github.com/ispc.html#converting-between-array-of-structures-and-structure-of-arrays-layout
for more information.
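
As an illustration of what such a conversion does, here is a scalar C
sketch that de-interleaves xyz triples.  It is a model of the data movement
only, not the standard library's implementation; LANES and
aos_to_soa3_scalar() are names assumed for this example.

```c
#include <stddef.h>

enum { LANES = 4 };  /* assumed number of elements converted per call */

/* Splits LANES xyz triples stored in array-of-structures order
   (x0 y0 z0 x1 y1 z1 ...) into three separate arrays, i.e.
   structure-of-arrays order. */
static void aos_to_soa3_scalar(const float *aos,
                               float *x, float *y, float *z) {
    for (size_t i = 0; i < LANES; ++i) {
        x[i] = aos[3 * i + 0];
        y[i] = aos[3 * i + 1];
        z[i] = aos[3 * i + 2];
    }
}
```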

Inline functions now have static linkage.

A number of improvements have been made to the optimization passes that
detect when gathers and scatters can be transformed into vector loads and
stores, respectively.  In particular, these passes now handle variables that
are used as loop induction variables much better.
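
The property these passes look for can be illustrated with a small C model.
This is not the compiler's implementation, which reasons about the indices
at compile time; gather_is_contiguous() is a hypothetical run-time check
that shows when the indices of a gather name consecutive elements, so that
one contiguous load would read the same data.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when idx[0..n) is base, base+1, ..., base+n-1.  A gather
   through such indices reads n consecutive elements. */
static bool gather_is_contiguous(const int *idx, size_t n) {
    for (size_t i = 1; i < n; ++i)
        if (idx[i] != idx[0] + (int)i)
            return false;
    return true;
}
```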

=== v1.0.11 === (6 October 2011)

The main new feature in this release is support for generating code for
multiple targets (e.g., SSE2, SSE4, and AVX) and having the compiled code
select the best variant at execution time.  For more information, see
http://ispc.github.com/ispc.html#compiling-with-support-for-multiple-instruction-sets.

All of the examples now take advantage of the support for multiple
compilation targets; thus, if one has an AVX system, it's not necessary to
recompile the examples to use the AVX target.

Performance of the built-in task system that is used in the examples has
been improved.

Finally, the print() statement now works on OSX; it had been broken for the
last few releases.

=== v1.0.10 === (30 September 2011)

This release features an extensive new example showing the application of
ispc to a deferred shading algorithm for scenes with thousands of lights
(examples/deferred).  This is an implementation of the algorithm that Johan
Andersson described at SIGGRAPH 2009; it was implemented by Andrew
Lauritzen and Jefferson Montgomery.  The basic idea is that a pre-rendered
G-buffer is partitioned into tiles, and in each tile, the set of lights
that contribute to the tile is computed.  The pixels in the tile are then
shaded using those light sources. (See slides 19-29 of
http://s09.idav.ucdavis.edu/talks/04-JAndersson-ParallelFrostbite-Siggraph09.pdf
for more details on the algorithm.)

The mechanism for launching tasks from ispc code has been generalized to
allow multiple tasks to be launched with a single launch call (see
http://ispc.github.com/ispc.html#task-parallelism-language-syntax for more
information.)

A few new functions have been added to the standard library: num_cores()
returns the number of cores in the system's CPU, and variants of all of the
atomic operators that take 'uniform' values as parameters have been added.

=== v1.0.9 === (26 September 2011)

The binary release of v1.0.9 is the first that supports AVX code
generation.  Two targets are provided: "avx", which runs with a
programCount of 8, and "avx-x2" which runs 16 program instances
simultaneously.  (This binary is also built using the in-progress LLVM 3.0
development libraries, while previous ones have been built with the
released 2.9 version of LLVM.)

This release has no other significant changes beyond a number of small
bugfixes (https://github.com/ispc/ispc/issues/100,
https://github.com/ispc/ispc/issues/101, https://github.com/ispc/ispc/issues/103.)

=== v1.0.8 === (19 September 2011)

A number of improvements have been made to handling of 'if' statements in
the language:
  - A bug was fixed where invalid memory could be incorrectly accessed even
    if none of the running program instances wanted to execute the
    corresponding instructions (https://github.com/ispc/ispc/issues/74).
  - The code generated for 'if' statements is a bit simpler and thus more
    efficient.

There is now a '--pic' command-line argument that causes
position-independent code to be generated (Linux and OSX only).

A number of additional performance improvements:
  - Loops are now unrolled by default; the --opt=disable-loop-unroll
    command-line argument can be used to disable this behavior.
    (https://github.com/ispc/ispc/issues/78)
  - A few more cases where gathers/scatters could be determined at compile
    time to actually access contiguous locations have been added.
    (https://github.com/ispc/ispc/issues/79)

Finally, warnings are now issued (if possible) when it can be determined
at compile-time that an out-of-bounds array index is being used.
(https://github.com/ispc/ispc/issues/98).


=== v1.0.7 === (3 September 2011)

The various atomic_*_global() standard library functions are generally
substantially more efficient.  They all previously issued one hardware
atomic instruction for each running program instance but now locally
compute a reduction over the operands and issue a single hardware atomic,
giving the same effect and results in the end (issue #57).
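
The idea can be sketched in C11 as follows.  This is a simplified model,
not the standard library's code: atomic_add_reduced() is a hypothetical
name, the lanes are modeled as plain arrays, and only the single previous
value of the target is returned.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Adds the values of all active lanes to *target using one atomic
   operation, by summing them locally first.  Returns the value *target
   held before the addition. */
static int atomic_add_reduced(atomic_int *target, const int *lane_vals,
                              const bool *active, size_t lanes) {
    int sum = 0;
    for (size_t i = 0; i < lanes; ++i)
        if (active[i])
            sum += lane_vals[i];          /* local reduction, no atomics */
    return atomic_fetch_add(target, sum); /* the only atomic operation */
}
```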

CPU/ISA target handling has been substantially improved.  If no CPU is
specified, the host CPU type is used, not just a default of "nehalem".  A
number of bugs were fixed so that LLVM no longer generates instructions
newer than SSE2 when using the SSE2 target (fixes issue #82).

Right shifts of unsigned integer types now use a logical shift right
instruction, not an arithmetic shift right (fixes issue #88).
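
The difference between the two kinds of shift can be shown on a 32-bit
pattern in C.  Both helpers below are hypothetical and operate on unsigned
bit patterns so that the results are fully defined by the C standard.

```c
#include <assert.h>
#include <stdint.h>

/* Logical shift right: vacated high bits are filled with zeros. */
static uint32_t shr_logical(uint32_t bits, unsigned n) {
    assert(n < 32);
    return bits >> n;
}

/* Arithmetic shift right: vacated high bits are filled with copies of the
   original top (sign) bit. */
static uint32_t shr_arithmetic(uint32_t bits, unsigned n) {
    assert(n < 32);
    uint32_t shifted = bits >> n;
    if (bits >> 31)
        shifted |= ~(UINT32_MAX >> n);  /* set the n vacated high bits */
    return shifted;
}
```

For the pattern 0x80000000 shifted by 4, the logical form gives 0x08000000
and the arithmetic form gives 0xF8000000; for an unsigned value only the
first is the expected result.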

When emitting header files, 'extern' declarations of globals used in ispc
code are now outside of the ispc namespace.  Fixes issue #64.

The stencil example has been modified to do runs with and without
parallelism.

Many other small bugfixes and improvements.

=== v1.0.6 === (17 August 2011)

Some additional cross-program instance operations have been added to the
standard library.  reduce_equal() checks to see if the given value is the
same across all running program instances, and exclusive_scan_{add,and,or}()
computes a scan over the given value in the running program instances.
See the documentation of these new routines for more information:
http://ispc.github.com/ispc.html#cross-program-instance-operations.
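
A scalar C model of these two kinds of operation, for illustration only.
The function names, the explicit "active" mask arrays, and the choice to
return true when no lane is active are assumptions of this sketch, not a
statement of the library's behavior; the scan shown is the additive one.

```c
#include <stdbool.h>
#include <stddef.h>

/* True when every active lane holds the same value. */
static bool reduce_equal_model(const int *v, const bool *active,
                               size_t lanes) {
    bool seen = false;
    int first = 0;
    for (size_t i = 0; i < lanes; ++i) {
        if (!active[i])
            continue;
        if (!seen) {
            first = v[i];
            seen = true;
        } else if (v[i] != first) {
            return false;
        }
    }
    return true;
}

/* Exclusive add-scan: each active lane receives the sum of the values in
   the active lanes before it.  Inactive lanes of out are left untouched. */
static void exclusive_scan_add_model(const int *v, const bool *active,
                                     int *out, size_t lanes) {
    int running = 0;
    for (size_t i = 0; i < lanes; ++i) {
        if (active[i]) {
            out[i] = running;
            running += v[i];
        }
    }
}
```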

The simple task system implementations used in the examples have been
improved.  The Windows version no longer has a hard limit on the number of
tasks that can be launched, and all versions have less dynamic memory
allocation and less locking.  More of the examples now have paths that also
measure performance using tasks along with SPMD vectorization.

Two new examples have been added: one that shows the implementation of a
ray-marching volume rendering algorithm, and one that shows a 3D stencil
computation, as might be done for PDE solutions.

Standard library routines to issue prefetches have been added.  See the
documentation for more details: http://ispc.github.com/ispc.html#prefetches.

Fast versions of the float to half-precision float conversion routines have
been added.  For more details, see:
http://ispc.github.com/ispc.html#conversions-to-and-from-half-precision-floats.

There is the usual set of small bug fixes.  Notably, a number of details
related to handling 32 versus 64 bit targets have been fixed, which in turn
has fixed a bug related to tasks having incorrect values for pointers
passed to them.

=== v1.0.5 === (1 August 2011)

Multi-element vector swizzles are supported; for example, given a 3-wide
vector "foo", expressions like "foo.zyx" and "foo.yz" can be used to
construct other short vectors.  See
http://ispc.github.com/ispc.html#short-vector-types
for more details.  (Thanks to Pete Couperus for implementing this code!).

int8 and int16 datatypes are now supported.  It is still generally more
efficient to use int32 for intermediate computations, even if the in-memory
format is int8 or int16.

There are now standard library routines to convert to and from 'half'-format
floating-point values (half_to_float() and float_to_half()).
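
To show what the half-to-float direction has to compute, here is a scalar C
sketch.  It is not ispc's implementation: half_bits_to_float() is a
hypothetical name, and the code assumes the IEEE 754 16-bit layout (1 sign
bit, 5 exponent bits, 10 mantissa bits) and a 32-bit IEEE 754 'float'.

```c
#include <stdint.h>
#include <string.h>

/* Converts the bit pattern of a half-precision value to a float. */
static float half_bits_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0 && mant == 0) {
        bits = sign;                                /* signed zero */
    } else if (exp == 0) {
        /* Subnormal half: shift the mantissa up until its implicit
           leading one appears, adjusting the exponent to match. */
        int e = -1;
        do {
            mant <<= 1;
            ++e;
        } while ((mant & 0x400u) == 0);
        mant &= 0x3FFu;
        bits = sign | ((uint32_t)(127 - 15 - e) << 23) | (mant << 13);
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);   /* infinity or NaN */
    } else {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + (127 - 15)) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);  /* reinterpret the bits as a float */
    return f;
}
```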

There is a new example with an implementation of Perlin's Noise function
(examples/noise).  It shows a speedup of approximately 4.2x versus a C
implementation on OSX and a 2.9x speedup versus C on Windows.

=== v1.0.4 === (18 July 2011)

enums are now supported in ispc; see the section on enumeration types in
the documentation (http://ispc.github.com/ispc.html#enumeration-types) for
more information.

bools are converted to integers with zero extension, not sign extension as
before (i.e. a 'true' bool converts to the value one, not 'all bits on'.)
For cases where sign extension is still desired, there is a
sign_extend(bool) function in the standard library.
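
The two conversions can be modeled in C as follows; both function names are
hypothetical, and the sketch only illustrates the resulting bit patterns
for a 32-bit integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Zero extension: 'true' becomes the value one. */
static int32_t bool_zero_extend(bool b) {
    return b ? 1 : 0;
}

/* Sign extension: 'true' becomes -1, which is "all bits on" in two's
   complement. */
static int32_t bool_sign_extend(bool b) {
    return b ? -1 : 0;
}
```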

Support for 64-bit types in the standard library is much more complete than
before.

64-bit integer constants are now supported by the parser.

Storage for parameters to tasks is now allocated dynamically on Windows,
rather than on the stack; with this fix, all tests now run correctly on
Windows.

There is now support for atomic swap and compare/exchange with float and
double types.

A number of additional small bugs have been fixed and a number of cases
where the compiler would crash given a malformed program have been fixed.

=== v1.0.3 === (4 July 2011)

ispc now has a built-in pre-processor (from LLVM's clang compiler).
(Thanks to Pete Couperus for this patch!)  It is therefore no longer
necessary to use cl.exe for preprocessing on Windows; the MSVC project
files for the examples have been updated accordingly.

There is another variant of the shuffle() function in the standard
library: "<type> shuffle(<type> v0, <type> v1, int permute)", where the
permutation vector indexes over the concatenation of the two vectors
(e.g. the value 0 corresponds to the first element of v0, the value
2*programCount-1 corresponds to the last element of v1, etc.)
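
The indexing rule can be modeled with a scalar C loop.  This is a sketch of
the semantics described above rather than the library's code;
PROGRAM_COUNT, shuffle2_model(), and the use of int elements are
assumptions, and out-of-range permutation values are simply rejected here.

```c
#include <assert.h>
#include <stddef.h>

enum { PROGRAM_COUNT = 4 };  /* assumed gang size for this sketch */

/* For each lane i, picks element permute[i] from the concatenation of v0
   and v1: values 0..PROGRAM_COUNT-1 select from v0, and values
   PROGRAM_COUNT..2*PROGRAM_COUNT-1 select from v1. */
static void shuffle2_model(const int *v0, const int *v1,
                           const int *permute, int *out) {
    for (size_t i = 0; i < PROGRAM_COUNT; ++i) {
        int p = permute[i];
        assert(p >= 0 && p < 2 * PROGRAM_COUNT);
        out[i] = (p < PROGRAM_COUNT) ? v0[p] : v1[p - PROGRAM_COUNT];
    }
}
```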

ispc now supports the usual range of atomic operations (add, subtract, min,
max, and, or, and xor) as well as atomic swap and atomic compare and
exchange.  There is also a facility for inserting memory fences.  See the
"Atomic Operations and Memory Fences" section of the user's guide
(http://ispc.github.com/ispc.html#atomic-operations-and-memory-fences) for
more information.

There are now both 'signed' and 'unsigned' variants of the standard library
functions like packed_load_active() that take references to arrays of
signed int32s and unsigned int32s respectively.  (The
{load_from,store_to}_{int8,int16}() functions have similarly been augmented
to have both 'signed' and 'unsigned' variants.)

In initializer expressions with variable declarations, it is no longer
legal to initialize arrays and structs with single scalar values that then
initialize their members; they now must be initialized with initializer
lists in braces (or initialized after the declaration with a loop over
array elements, etc.)

=== v1.0.2 === (1 July 2011)

Floating-point hexadecimal constants are now parsed correctly on Windows
(fixes issue #16).

SSE2 is now the default target if --cpu=atom is given in the command line
arguments and another target isn't explicitly specified.

The standard library now provides broadcast(), rotate(), and shuffle()
routines for efficient communication between program instances.

The MSVC solution files to build the examples on Windows now use
/fp:fast when building.

=== v1.0.1 === (24 June 2011)

ispc no longer requires that pointers to memory that are passed in to ispc
have alignment equal to the target's vector width; now alignment just has to
be the regular element alignment (e.g. 4 bytes for floats, etc.)  This
change also fixed a number of cases where it previously incorrectly
generated aligned load/store instructions in cases where the address wasn't
actually aligned (even if the base address passed into ispc code was).

=== v1.0 === (21 June 2011)

Initial Release