These make it easier to iterate over an arbitrary number of data
elements; specifically, they automatically handle the "ragged
extra bits" that come up when the number of elements to be
processed isn't evenly divisible by programCount.
TODO: documentation
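
As a rough illustration of the "ragged extra bits" problem described above, here is a scalar C++ sketch (not ispc code, and not the actual implementation). It processes full groups of kProgramCount elements, then one partial group in which lanes past the end are masked off. The names kProgramCount and add_one_chunked are hypothetical, and the width of 8 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ispc's programCount; 8 is an assumed width.
constexpr std::size_t kProgramCount = 8;

// Adds 1.0f to every element, kProgramCount elements per step.
// Returns the number of partially filled steps that were needed (0 or 1).
std::size_t add_one_chunked(std::vector<float>& data) {
    const std::size_t n = data.size();
    // Largest multiple of kProgramCount that is <= n.
    const std::size_t full = n - n % kProgramCount;
    assert(full % kProgramCount == 0);

    std::size_t i = 0;
    // Full groups: every lane is active, so no bounds check is needed.
    for (; i < full; i += kProgramCount)
        for (std::size_t lane = 0; lane < kProgramCount; ++lane)
            data[i + lane] += 1.0f;

    // Ragged remainder: only lanes that map to a valid index are active.
    std::size_t partial_steps = 0;
    if (i < n) {
        partial_steps = 1;
        for (std::size_t lane = 0; lane < kProgramCount; ++lane)
            if (i + lane < n)  // per-lane mask
                data[i + lane] += 1.0f;
    }
    return partial_steps;
}
```

With 19 elements and a width of 8, this runs two full groups and one partial group with three active lanes.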
Makefile and vcxproj file updates.
Also modified vcxproj files so that the various files ispc generates go into $(TargetDir),
not the current directory.
Modified the ray tracer example to not have uniform short-vector types in its app-visible
datatypes (these are laid out differently on SSE vs AVX); there was an existing lurking
bug in the way this was done before.
This applies a floating-point scale factor to the image resolution;
it's useful for experiments with many-core systems where the
base image resolution may not give enough work for good load-balancing
with tasks.
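
A minimal sketch of what applying such a scale factor could look like, assuming the scaled dimensions are rounded to the nearest integer and clamped to at least one pixel. The Resolution type and scale_resolution function are hypothetical names, not taken from the example code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical image resolution type, for illustration only.
struct Resolution {
    int width;
    int height;
};

// Scales both dimensions by a floating-point factor.
// Rounding to nearest and clamping to >= 1 are assumptions of this sketch.
Resolution scale_resolution(Resolution base, float scale) {
    assert(scale > 0.0f);
    const long w = std::lround(static_cast<double>(base.width) * scale);
    const long h = std::lround(static_cast<double>(base.height) * scale);
    return Resolution{
        static_cast<int>(std::max(1L, w)),
        static_cast<int>(std::max(1L, h)),
    };
}
```

A factor above 1 increases the pixel count, which gives a task system more work to distribute across cores.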
- Only have a single copy of all of the tasks_*.cpp sample implementations,
stored in examples/.
- Reduce dynamic storage allocation and locking in task launch code paths.
- Don't have a hard limit on the number of tasks that can be launched on
Windows (fix issue #85).
- In the ispc-generated header files, a #define now indicates which compilation target
was used.
- The examples use utility routines from the new file examples/cpuid.h to check
whether the system's CPU supports the ISA that was used for compiling the
example code, and print error messages if things aren't going to
work...
were expecting vector-width-aligned pointers where in point of fact,
there's no guarantee that they would have been in general.
Removed the aligned memory allocation routines from some of the examples;
they're no longer needed.
No performance difference on Core2/Core i5 CPUs; older CPUs may see some
regressions.
Still need to update the documentation for this change and finish reviewing
alignment issues in Load/Store instructions generated by .cpp files.
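
To illustrate why vector-width alignment can't be assumed for arbitrary pointers, here is a small C++ sketch that tests a pointer's alignment. The helper name is_aligned is hypothetical and is not part of ispc or its examples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if p is aligned to `alignment` bytes.
// `alignment` must be a nonzero power of two (e.g. 16 for SSE, 32 for AVX).
bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

Even when the start of a buffer is 32-byte aligned, a pointer four floats into it is only 16-byte aligned, and a pointer one float into it is aligned to neither width.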