They're all based off a common examples/common.mk file, so that individual
makefiles are quite simple now.
The common.mk file also provides targets to build the examples using C++
output with the generic-16h or sse4.h files. These targets don't run by
default, but do run if 'make all' is run.
The compiler now supports an --emit-c++ option, which generates generic
vector C++ code. To actually compile this code, the user must provide
C++ code that implements a variety of types and operations (e.g. adding
two floating-point vector values together, comparing them, etc).
There are two examples of this required code in examples/intrinsics:
generic-16.h is a "generic" 16-wide implementation that does all required
with scalar math; it's useful for demonstrating the requirements of the
implementation. Then, sse4.h shows a simple implementation of a SSE4
target that maps the emitted function calls to SSE intrinsics.
When using these example implementations with the ispc test suite,
all but one or two tests pass with gcc and clang on Linux and OSX.
There are currently ~10 failures with icc on Linux, and ~50 failures with
MSVC 2010. (To be fixed in coming days.)
Performance varies: when running the examples through the sse4.h
target, some have the same performance as when compiled with --target=sse4
from ispc directly (options), while noise is 12% slower, rt is 26%
slower, and aobench is 2.2x slower. The details of this haven't yet been
carefully investigated, but will be in coming days as well.
Issue #92.
These make it easier to iterate over arbitrary amounts of data
elements; specifically, they automatically handle the "ragged
extra bits" that come up when the number of elements to be
processed isn't evenly divided by programCount.
TODO: documentation
The packed_{load,store}_active now functions take a pointer to a
location at which to start loading/storing, rather than an array
base and a uniform index.
Variants of the prefetch functions that take varying pointers
are now available.
There are now variants of the various atomic functions that take
varying pointers (issue #112).
Pointers can be either uniform or varying, and behave correspondingly.
e.g.: "uniform float * varying" is a varying pointer to uniform float
data in memory, and "float * uniform" is a uniform pointer to varying
data in memory. Like other types, pointers are varying by default.
Pointer-based expressions, & and *, sizeof, ->, pointer arithmetic,
and the array/pointer duality all bahave as in C. Array arguments
to functions are converted to pointers, also like C.
There is a built-in NULL for a null pointer value; conversion from
compile-time constant 0 values to NULL still needs to be implemented.
Other changes:
- Syntax for references has been updated to be C++ style; a useful
warning is now issued if the "reference" keyword is used.
- It is now illegal to pass a varying lvalue as a reference parameter
to a function; references are essentially uniform pointers.
This case had previously been handled via special case call by value
return code. That path has been removed, now that varying pointers
are available to handle this use case (and much more).
- Some stdlib routines have been updated to take pointers as
arguments where appropriate (e.g. prefetch and the atomics).
A number of others still need attention.
- All of the examples have been updated
- Many new tests
TODO: documentation
Makefile and vcxproj file updates.
Also modified vcxproj files so that the various files ispc generates go into $(TargetDir),
not the current directory.
Modified the ray tracer example to not have uniform short-vector types in its app-visible
datatypes (these are laid out differently on SSE vs AVX); there was an existing lurking
bug in the way this was done before.
Specifically, launch all of the tasks in one statement, rather than
still looping over spans in y and launching a collection of tasks
across x for each span. This seems to give a few percent better
performance.
Within each function that launches tasks, we now can easily track which
tasks that function launched, so that the sync at the end of the function
can just sync on the tasks launched by that function (not all tasks
launched by all functions.)
Implementing this led to a rework of the task system API that ispc generates
code to call; the example task systems in examples/tasksys.cpp have been
updated to conform to this API. (The updated API is also documented in
the ispc user's guide.)
As part of this, "launch[n]" syntax was added to launch a number of tasks
in a single launch statement, rather than requiring a loop over 'n' to
launch n tasks.
This commit thus fixes issue #84 (enhancement to launch multiple tasks from
a single launch statement) as well as issue #105 (recursive task launches
were broken).
This applies a floating-point scale factor to the image resolution;
it's useful for experiments with many-core systems where the
base image resolution may not give enough work for good load-balancing
with tasks.
fp constant undesirably causing computation to be done in double precision.
Makes C scalar versions of the options pricing models, rt, and aobench 3-5% faster.
Makes scalar version of noise about 15% faster.
Others are unchanged.
- Only have a single copy of all of the tasks_*.cpp sample implementations,
stored in examples/.
- Reduce dynamic storage allocation and locking in task launch code paths.
- Don't have a hard limit of the number of tasks that can be launched on
Windows (fix issue #85).
Modified this example to use reduce_equal() to see if all of the program
instances want to load the 8 sample values around the same voxel. When
this is the case, we can just do 8 scalar loads, rather than needing to
do a fully general gather. Once this check fails, it isn't done again,
since it's not likely to start succeeding in the future. This gives
a ~10% speedup with the low-res data set, and basically no performance
difference with the high-res one. (It makes sense that the lower-resolution
the voxel sampling, the longer all of the rays will stay in the same set
of voxels.)