Commit Graph

54 Commits

Author SHA1 Message Date
Matt Pharr
ce7355f9ed Windows: fix examples build to look for ispc.exe in ../.. as well 2011-10-09 07:40:18 -07:00
Matt Pharr
bedaec2295 Update examples for multi-target compilation.
Makefile and vcxproj file updates.
Also modified vcxproj files so that the various files ispc generates go into $(TargetDir),
  not the current directory.
Modified the ray tracer example to not have uniform short-vector types in its app-visible
  datatypes (these are laid out differently on SSE vs AVX); there was an existing lurking
  bug in the way this was done before.
2011-10-04 16:01:56 -07:00
Matt Pharr
880cbb18cc Remove checks to see if system's processor matches the target the code was compiled for.
(Preparation for multi-target output.)
2011-10-04 16:01:55 -07:00
Matt Pharr
9b7f55a28e Add buildall.bat script for Windows. Also various example build fixes for Windows 2011-10-04 11:42:04 -07:00
Matt Pharr
e4d224a0f1 Use __cilk to detect Cilk support 2011-10-04 11:16:42 -07:00
Matt Pharr
0933a77c1b Improve task decomposition in ray tracing example.
Specifically, launch all of the tasks in one statement, rather than
still looping over spans in y and launching a collection of tasks
across x for each span.  This seems to give a few percent better
performance.
2011-10-04 09:33:59 -07:00
Matt Pharr
5f78edf07a Fix bug with screen decomposition in volume rendering example 2011-10-04 09:30:02 -07:00
Matt Pharr
0b02f94988 Task system performance tweaks.
Switch back to GCD on OSX.
Increase TaskInfo allocation count.
This fixes the regression with deferred on AVX (from 17x to 25x
  again with 4 cores.)
2011-10-01 08:04:09 -07:00
Matt Pharr
65c50b60fc Cleanups to deferred shading workload 2011-09-30 20:35:42 -07:00
Matt Pharr
f8f25a11b6 Added deferred shading workload 2011-09-30 19:42:14 -07:00
Matt Pharr
cb7976bbf6 Added updated task launch implementation that now tracks task groups.
Within each function that launches tasks, we now can easily track which
tasks that function launched, so that the sync at the end of the function
can just sync on the tasks launched by that function (not all tasks
launched by all functions.)

Implementing this led to a rework of the task system API that ispc generates
code to call; the example task systems in examples/tasksys.cpp have been
updated to conform to this API.  (The updated API is also documented in
the ispc user's guide.)

As part of this, "launch[n]" syntax was added to launch a number of tasks
in a single launch statement, rather than requiring a loop over 'n' to
launch n tasks.

This commit thus fixes issue #84 (enhancement to launch multiple tasks from
a single launch statement) as well as issue #105 (recursive task launches
were broken).
2011-09-30 11:20:53 -07:00
Matt Pharr
8f3e46f67e Use InterlockedExchangeAdd on Windows 2011-09-29 16:19:59 -07:00
Matt Pharr
d45c536c47 Fix Windows debug build of simple example 2011-09-28 14:11:32 -07:00
Matt Pharr
9052d4b10b Linux build fixes 2011-09-17 13:42:46 -07:00
Matt Pharr
2405dae8e6 Use malloc() to get space for task arguments when compiling to AVX.
This is to work around the LLVM bug/limitation discused in LLVM bug
10841 (http://llvm.org/bugs/show_bug.cgi?id=10841).
2011-09-17 13:38:51 -07:00
Matt Pharr
a501ab1aa6 Fix parenthesization bugs in cost estimates.
Also added the debugging print that helped find these issues.
Revert inlining some functions in examples
2011-09-16 19:07:07 -07:00
Matt Pharr
cdc850f98c Inline some functions in examples 2011-09-16 17:02:21 -07:00
Matt Pharr
30f9dcd4f5 Unroll loops by default, add --opt=disable-loop-unroll to disable.
Issue #78.
2011-09-13 15:37:18 -07:00
Matt Pharr
0c344b6755 Fix Linux build of mandelbrot_tasks example 2011-09-13 15:17:30 -07:00
Matt Pharr
5dedb6f836 Add --scale command line argument to mandelbrot and rt examples.
This applies a floating-point scale factor to the image resolution;
it's useful for experiments with many-core systems where the 
base image resolution may not give enough work for good load-balancing
with tasks.
2011-09-07 20:07:51 -07:00
Matt Pharr
2ea6d249d5 Fix mapping to 8, 16 program instances in AO bench example.
With this, we now compute a correct image with AVX.
2011-09-07 11:34:24 -07:00
Matt Pharr
375f1cb8e8 Make octaves and octaves loop uniform in noise example 2011-09-07 10:34:23 -07:00
Matt Pharr
effe901890 Add task-parallel version of aobench 2011-09-07 05:43:21 -07:00
Matt Pharr
b5bfa43e92 Fix error with float suffixes 2011-09-02 13:09:25 -07:00
Matt Pharr
99221f7d17 Fix a few places in examples where C reference implementaion had a double-precision
fp constant undesirably causing computation to be done in double precision.

Makes C scalar versions of the options pricing models, rt, and aobench 3-5% faster.
Makes scalar version of noise about 15% faster.
Others are unchanged.
2011-09-01 16:31:22 -07:00
Matt Pharr
a94cabc692 Modify stencil example to do separate runs with and without task parallelism. 2011-08-30 05:08:21 -07:00
Matt Pharr
33feeffe5d Update timing header so it works with C code 2011-08-29 11:23:43 -07:00
Matt Pharr
74c2c8ae07 Linux build fixes 2011-08-17 07:08:44 -07:00
Matt Pharr
206c851146 Various improvements to example task systems in examples/.
- Only have a single copy of all of the tasks_*.cpp sample implementations,
  stored in examples/.
- Reduce dynamic storage allocation and locking in task launch code paths.
- Don't have a hard limit of the number of tasks that can be launched on
  Windows (fix issue #85).
2011-08-17 14:31:45 +01:00
Matt Pharr
60bdf1ef8a Modify rt example to also do a set of runs with tasks + SPMD together. 2011-08-17 13:14:32 +01:00
Matt Pharr
d7662b3eb9 Use reduce_equal() in volume rendering example to avoid some gathers.
Modified this example to use reduce_equal() to see if all of the program
instances want to load the 8 sample values around the same voxel.  When
this is the case, we can just do 8 scalar loads, rather than needing to
do a fully general gather.  Once this check fails, it isn't done again,
since it's not likely to start succeeding in the future.  This gives
a ~10% speedup with the low-res data set, and basically no performance
difference with the high-res one.  (It makes sense that the lower-resolution
the voxel sampling, the longer all of the rays will stay in the same set
of voxels.)
2011-08-17 12:37:07 +01:00
Matt Pharr
ecaa57c7c6 Add volume rendering example. (~2.3x speedup from SIMD vs serial code.) 2011-08-17 12:05:37 +01:00
Matt Pharr
fce183c244 Merge branch 'master' of github.com:ispc/ispc 2011-08-17 10:32:49 +01:00
Matt Pharr
7a92f8b3f9 Add MSVC build support for stencil example 2011-08-17 02:28:49 -07:00
Matt Pharr
96af08e789 Print notices about image files being written 2011-08-16 06:31:26 +01:00
Matt Pharr
c570108026 Fix linux build of stencil example 2011-08-15 04:44:17 -07:00
Matt Pharr
137ea7bde6 Rename semaphore filename to be more generic 2011-08-04 05:28:00 -07:00
Matt Pharr
e05b3981d9 Add stencil example 2011-08-03 13:49:02 -07:00
Matt Pharr
a4bb6b5520 Add new example with implementation of Perlin Noise
~4.2x speedup versus serial on OSX / gcc.
~2.9x speedup versus serial on Windows / MSVC.
2011-08-01 10:33:18 +01:00
Matt Pharr
80ca02af58 Add missing #include, fix Linux build. Fixes issue #75. 2011-07-27 10:51:13 +01:00
Matt Pharr
bba7211654 Add support for int8/int16 types. Addresses issues #9 and #42. 2011-07-21 06:57:40 +01:00
Matt Pharr
6b0a6c0124 Fix issue #67: don't crash ungracefully if target ISA not supported on system.
- In the ispc-generated header files, a #define now indicates which compilation target
  was used.
- The examples use utility routines from the new file examples/cpuid.h to check the
  system's CPU's capabilities to see if it supports the ISA that was used for
  compiling the example code and print error messages if things aren't going to
  work...
2011-07-18 12:29:43 +01:00
Matt Pharr
213c3a9666 Release notes, bump doxygen version # for next release.
Add more .gitignore stuff.
2011-07-17 16:52:36 +02:00
Matt Pharr
6e4c165c7e Use malloc to allocate storage for task parameters on Windows.
Fixes bug #55.  A number of tests were crashing on Windows due to the task
launch code using alloca to allocate space for the tasks' parameters.  On
Windows, the stack isn't generally big enough for this to be a good idea.
Also added an alignment parmaeter to ISPCMalloc() to pass the alignment
requirement along.
2011-07-06 05:53:25 -07:00
Matt Pharr
46ccc251c8 Added C preprocessor support for Windows.
Link the appropriate clang libraries to make the preprocessor
stuff work on Windows builds.  Also updated the solution files
for the examples to stop using cl.exe for preprocessing but to
just call ispc directly.  Finishes fixes for issue #32.
2011-07-04 05:01:04 -07:00
Matt Pharr
6ed6961958 Add checks to sample task systems to ensure that TasksInit has been
called; if not, print an informative error message.
2011-07-01 14:11:16 +01:00
Matt Pharr
bcae21dbca Update examples to use fpmath:fast and to enable intrinsics on Windows 2011-06-30 13:17:14 -07:00
Matt Pharr
ff76c2334e small doc fix, removed incorrect comment from example 2011-06-24 16:19:51 -07:00
Matt Pharr
865e430b56 Finished updating alignment issues for vector types; don't assume pointers
are aligned to the natural vector width.
2011-06-23 18:51:15 -07:00
Matt Pharr
990bee5a86 Merge branch 'master' of github.com:ispc/ispc 2011-06-23 18:21:02 -07:00