Commit Graph

70 Commits

Author SHA1 Message Date
Matt Pharr
e3341176c5 Redo makefiles for the examples.
They're all based off a common examples/common.mk file, so that individual
makefiles are quite simple now.

The common.mk file also provides targets to build the examples using C++
output with the generic-16h or sse4.h files.  These targets don't run by
default, but do run if 'make all' is run.
2012-01-04 12:59:03 -08:00
Matt Pharr
8938e14442 Add support for emitting ~generic vectorized C++ code.
The compiler now supports an --emit-c++ option, which generates generic
vector C++ code.  To actually compile this code, the user must provide
C++ code that implements a variety of types and operations (e.g. adding
two floating-point vector values together, comparing them, etc).

There are two examples of this required code in examples/intrinsics:
generic-16.h is a "generic" 16-wide implementation that does all required
with scalar math; it's useful for demonstrating the requirements of the
implementation.  Then, sse4.h shows a simple implementation of a SSE4
target that maps the emitted function calls to SSE intrinsics.

When using these example implementations with the ispc test suite,
all but one or two tests pass with gcc and clang on Linux and OSX.
There are currently ~10 failures with icc on Linux, and ~50 failures with
MSVC 2010.  (To be fixed in coming days.)

Performance varies: when running the examples through the sse4.h
target, some have the same performance as when compiled with --target=sse4
from ispc directly (options), while noise is 12% slower, rt is 26%
slower, and aobench is 2.2x slower.  The details of this haven't yet been
carefully investigated, but will be in coming days as well.

Issue #92.
2012-01-04 12:59:03 -08:00
Matt Pharr
1a81173c93 Fix examples/options Makefile to use -O3 for serial builds.
Amazingly, it has been using just -g since the initial commit. :-(
2012-01-03 19:53:45 -08:00
Matt Pharr
20536bb339 Fix mandelbrot_tasks example 2011-12-11 15:21:11 -08:00
Matt Pharr
034507a35b Update examples: bulk task launch in stencil/mandelbrot, use foreach more. 2011-12-10 11:11:30 -08:00
Matt Pharr
0b2febcec0 Update volume rendering workload: use AVX, remove reduce_equal() path.
Both of these changes gave a performance benefit!
2011-12-09 17:40:50 -08:00
Matt Pharr
9805b0742d Switch to avx-x2 for the stencil workload 2011-12-08 14:36:09 -08:00
Matt Pharr
f19c2aba40 Windows build fixes for examples, update options task granularity 2011-12-05 14:23:50 -08:00
Matt Pharr
ffc1d97df7 Fix aobench_instrumented build on Windows 2011-12-05 13:33:29 -08:00
Matt Pharr
9dd498718b Updated options pricing example to have a tasking-based path as well. 2011-12-05 13:24:34 -08:00
Matt Pharr
c3b55de1ad Fix volume rendering example for command-line args change 2011-12-03 09:30:10 -08:00
Matt Pharr
24ef9dac8f Use foreach in the deferred shading example 2011-12-01 17:00:30 -08:00
Matt Pharr
82aa6efd12 Checkpoint user's guide edits 2011-12-01 13:38:17 -08:00
Matt Pharr
8bc7367109 Add foreach and foreach_tiled looping constructs
These make it easier to iterate over arbitrary amounts of data
elements; specifically, they automatically handle the "ragged
extra bits" that come up when the number of elements to be
processed isn't evenly divided by programCount.

TODO: documentation
2011-11-30 13:17:31 -08:00
Matt Pharr
11547cb950 stdlib updates to take advantage of pointers
The packed_{load,store}_active now functions take a pointer to a
location at which to start loading/storing, rather than an array
base and a uniform index.

Variants of the prefetch functions that take varying pointers 
are now available.

There are now variants of the various atomic functions that take
varying pointers (issue #112).
2011-11-29 15:41:38 -08:00
Matt Pharr
975db80ef6 Add support for pointers to the language.
Pointers can be either uniform or varying, and behave correspondingly.
e.g.: "uniform float * varying" is a varying pointer to uniform float
data in memory, and "float * uniform" is a uniform pointer to varying
data in memory.  Like other types, pointers are varying by default.

Pointer-based expressions, & and *, sizeof, ->, pointer arithmetic,
and the array/pointer duality all bahave as in C.  Array arguments
to functions are converted to pointers, also like C.

There is a built-in NULL for a null pointer value; conversion from
compile-time constant 0 values to NULL still needs to be implemented.

Other changes:
- Syntax for references has been updated to be C++ style; a useful
  warning is now issued if the "reference" keyword is used.
- It is now illegal to pass a varying lvalue as a reference parameter
  to a function; references are essentially uniform pointers.
  This case had previously been handled via special case call by value
  return code.  That path has been removed, now that varying pointers
  are available to handle this use case (and much more).
- Some stdlib routines have been updated to take pointers as
  arguments where appropriate (e.g. prefetch and the atomics).
  A number of others still need attention.
- All of the examples have been updated
- Many new tests

TODO: documentation
2011-11-27 13:09:59 -08:00
Matt Pharr
ce7355f9ed Windows: fix examples build to look for ispc.exe in ../.. as well 2011-10-09 07:40:18 -07:00
Matt Pharr
bedaec2295 Update examples for multi-target compilation.
Makefile and vcxproj file updates.
Also modified vcxproj files so that the various files ispc generates go into $(TargetDir),
  not the current directory.
Modified the ray tracer example to not have uniform short-vector types in its app-visible
  datatypes (these are laid out differently on SSE vs AVX); there was an existing lurking
  bug in the way this was done before.
2011-10-04 16:01:56 -07:00
Matt Pharr
880cbb18cc Remove checks to see if system's processor matches the target the code was compiled for.
(Preparation for multi-target output.)
2011-10-04 16:01:55 -07:00
Matt Pharr
9b7f55a28e Add buildall.bat script for Windows. Also various example build fixes for Windows 2011-10-04 11:42:04 -07:00
Matt Pharr
e4d224a0f1 Use __cilk to detect Cilk support 2011-10-04 11:16:42 -07:00
Matt Pharr
0933a77c1b Improve task decomposition in ray tracing example.
Specifically, launch all of the tasks in one statement, rather than
still looping over spans in y and launching a collection of tasks
across x for each span.  This seems to give a few percent better
performance.
2011-10-04 09:33:59 -07:00
Matt Pharr
5f78edf07a Fix bug with screen decomposition in volume rendering example 2011-10-04 09:30:02 -07:00
Matt Pharr
0b02f94988 Task system performance tweaks.
Switch back to GCD on OSX.
Increase TaskInfo allocation count.
This fixes the regression with deferred on AVX (from 17x to 25x
  again with 4 cores.)
2011-10-01 08:04:09 -07:00
Matt Pharr
65c50b60fc Cleanups to deferred shading workload 2011-09-30 20:35:42 -07:00
Matt Pharr
f8f25a11b6 Added deferred shading workload 2011-09-30 19:42:14 -07:00
Matt Pharr
cb7976bbf6 Added updated task launch implementation that now tracks task groups.
Within each function that launches tasks, we now can easily track which
tasks that function launched, so that the sync at the end of the function
can just sync on the tasks launched by that function (not all tasks
launched by all functions.)

Implementing this led to a rework of the task system API that ispc generates
code to call; the example task systems in examples/tasksys.cpp have been
updated to conform to this API.  (The updated API is also documented in
the ispc user's guide.)

As part of this, "launch[n]" syntax was added to launch a number of tasks
in a single launch statement, rather than requiring a loop over 'n' to
launch n tasks.

This commit thus fixes issue #84 (enhancement to launch multiple tasks from
a single launch statement) as well as issue #105 (recursive task launches
were broken).
2011-09-30 11:20:53 -07:00
Matt Pharr
8f3e46f67e Use InterlockedExchangeAdd on Windows 2011-09-29 16:19:59 -07:00
Matt Pharr
d45c536c47 Fix Windows debug build of simple example 2011-09-28 14:11:32 -07:00
Matt Pharr
9052d4b10b Linux build fixes 2011-09-17 13:42:46 -07:00
Matt Pharr
2405dae8e6 Use malloc() to get space for task arguments when compiling to AVX.
This is to work around the LLVM bug/limitation discused in LLVM bug
10841 (http://llvm.org/bugs/show_bug.cgi?id=10841).
2011-09-17 13:38:51 -07:00
Matt Pharr
a501ab1aa6 Fix parenthesization bugs in cost estimates.
Also added the debugging print that helped find these issues.
Revert inlining some functions in examples
2011-09-16 19:07:07 -07:00
Matt Pharr
cdc850f98c Inline some functions in examples 2011-09-16 17:02:21 -07:00
Matt Pharr
30f9dcd4f5 Unroll loops by default, add --opt=disable-loop-unroll to disable.
Issue #78.
2011-09-13 15:37:18 -07:00
Matt Pharr
0c344b6755 Fix Linux build of mandelbrot_tasks example 2011-09-13 15:17:30 -07:00
Matt Pharr
5dedb6f836 Add --scale command line argument to mandelbrot and rt examples.
This applies a floating-point scale factor to the image resolution;
it's useful for experiments with many-core systems where the 
base image resolution may not give enough work for good load-balancing
with tasks.
2011-09-07 20:07:51 -07:00
Matt Pharr
2ea6d249d5 Fix mapping to 8, 16 program instances in AO bench example.
With this, we now compute a correct image with AVX.
2011-09-07 11:34:24 -07:00
Matt Pharr
375f1cb8e8 Make octaves and octaves loop uniform in noise example 2011-09-07 10:34:23 -07:00
Matt Pharr
effe901890 Add task-parallel version of aobench 2011-09-07 05:43:21 -07:00
Matt Pharr
b5bfa43e92 Fix error with float suffixes 2011-09-02 13:09:25 -07:00
Matt Pharr
99221f7d17 Fix a few places in examples where C reference implementaion had a double-precision
fp constant undesirably causing computation to be done in double precision.

Makes C scalar versions of the options pricing models, rt, and aobench 3-5% faster.
Makes scalar version of noise about 15% faster.
Others are unchanged.
2011-09-01 16:31:22 -07:00
Matt Pharr
a94cabc692 Modify stencil example to do separate runs with and without task parallelism. 2011-08-30 05:08:21 -07:00
Matt Pharr
33feeffe5d Update timing header so it works with C code 2011-08-29 11:23:43 -07:00
Matt Pharr
74c2c8ae07 Linux build fixes 2011-08-17 07:08:44 -07:00
Matt Pharr
206c851146 Various improvements to example task systems in examples/.
- Only have a single copy of all of the tasks_*.cpp sample implementations,
  stored in examples/.
- Reduce dynamic storage allocation and locking in task launch code paths.
- Don't have a hard limit of the number of tasks that can be launched on
  Windows (fix issue #85).
2011-08-17 14:31:45 +01:00
Matt Pharr
60bdf1ef8a Modify rt example to also do a set of runs with tasks + SPMD together. 2011-08-17 13:14:32 +01:00
Matt Pharr
d7662b3eb9 Use reduce_equal() in volume rendering example to avoid some gathers.
Modified this example to use reduce_equal() to see if all of the program
instances want to load the 8 sample values around the same voxel.  When
this is the case, we can just do 8 scalar loads, rather than needing to
do a fully general gather.  Once this check fails, it isn't done again,
since it's not likely to start succeeding in the future.  This gives
a ~10% speedup with the low-res data set, and basically no performance
difference with the high-res one.  (It makes sense that the lower-resolution
the voxel sampling, the longer all of the rays will stay in the same set
of voxels.)
2011-08-17 12:37:07 +01:00
Matt Pharr
ecaa57c7c6 Add volume rendering example. (~2.3x speedup from SIMD vs serial code.) 2011-08-17 12:05:37 +01:00
Matt Pharr
fce183c244 Merge branch 'master' of github.com:ispc/ispc 2011-08-17 10:32:49 +01:00
Matt Pharr
7a92f8b3f9 Add MSVC build support for stencil example 2011-08-17 02:28:49 -07:00