added cuda examples

2013-11-04 11:44:49 +01:00
parent cb6614da42
commit cb7cbec0d5
226 changed files with 284385 additions and 0 deletions
--- a/examples_cuda/README.txt
+++ b/examples_cuda/README.txt
@@ -0,0 +1,167 @@
+====================
+ISPC Examples README
+====================
+
+This directory has a number of sample ispc programs.  Before building them
+(on any system), install the appropriate ispc compiler binary into a
+directory in your path.  Then, if you're running Windows, open the
+"examples.sln" file and build from there.  For building under Linux/OSX,
+there are makefiles in each directory that build the examples individually.
+
+Almost all of them benchmark ispc implementations of the given computation
+against regular serial C++ implementations, printing out a comparison of
+the runtimes and the speedup delivered by ispc.  It may be instructive to
+do a side-by-side diff of the C++ and ispc implementations of these
+algorithms to learn more about writing ispc code.
+
+ 
+AOBench
+=======
+
+This is an ISPC implementation of the "AO bench" benchmark
+(http://syoyo.wordpress.com/2009/01/26/ao-bench-is-evolving/).  The command
+line arguments are:
+
+ao (num iterations) (xres) (yres)
+
+It executes the program for the given number of iterations, rendering an
+(xres x yres) image each time and measuring the computation time with both
+serial and ispc implementations.
+
+
+AOBench_Instrumented
+====================
+
+This version of AO Bench is compiled with the --instrument ispc compiler
+flag.  This causes the compiler to emit calls to a (user-supplied)
+ISPCInstrument() function at interesting places in the compiled code.  An
+example implementation of this function that counts the number of times the
+callback is made and records some statistics about control flow coherence
+is provided in the instrument.cpp file.
+
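As a rough illustration of such a callback, here is a minimal sketch.  It
assumes the callback receives the function name, a note describing the
instrumented site, the source line, and a bit mask with one bit per active
program instance; check instrument.cpp for the signature your ispc version
actually expects.  The InstrumentStats struct and averageActiveLanes()
helper are hypothetical names introduced only for this sketch.

```cpp
#include <cstdint>

// Hypothetical statistics record; not the layout used by instrument.cpp.
struct InstrumentStats {
    uint64_t calls = 0;        // number of times the callback fired
    uint64_t activeLanes = 0;  // active lanes summed over all calls
};

static InstrumentStats g_stats;

// Assumed signature (see the note above): fn is the enclosing function,
// note describes the instrumented site, line is the source line, and mask
// has one bit set per active program instance.
extern "C" void ISPCInstrument(const char *fn, const char *note, int line,
                               uint64_t mask) {
    (void)fn;
    (void)note;
    (void)line;
    g_stats.calls++;
    // Count the set bits in the mask; each one is an active lane.
    for (uint64_t m = mask; m != 0; m &= m - 1)
        g_stats.activeLanes++;
}

// Average number of active lanes per callback: a crude measure of control
// flow coherence (higher means more lanes doing useful work).
double averageActiveLanes() {
    if (g_stats.calls == 0)
        return 0.0;
    return double(g_stats.activeLanes) / double(g_stats.calls);
}
```

Printing averageActiveLanes() at program exit gives a single number to
compare against the target's vector width.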
+
+Deferred
+========
+
+This is an extensive example of using ispc for efficient
+deferred shading of scenes with thousands of lights; it's an implementation
+of the algorithm that Johan Andersson described at SIGGRAPH 2009,
+implemented by Andrew Lauritzen and Jefferson Montgomery.  The basic idea
+is that a pre-rendered G-buffer is partitioned into tiles, and in each
+tile, the set of lights that contribute to the tile is first computed.
+Then, the pixels in the tile are shaded using just those light
+sources. (See slides 19-29 of
+http://s09.idav.ucdavis.edu/talks/04-JAndersson-ParallelFrostbite-Siggraph09.pdf
+for more details on the algorithm.)
+
+This directory includes three implementations of the algorithm:
+
+- An ispc implementation that first does a static partitioning of the
+  screen into tiles to parallelize across the CPU cores.  Within each tile
+  ispc kernels provide highly efficient implementations of the light
+  culling and shading calculations.
+- A "best practices" serial C++ implementation.  This implementation does a
+  dynamic partitioning of the screen, refining tiles with significant Z
+  depth complexity (these tiles often have a large number of lights that
+  affect them).  Within each final tile, the pixels are shaded using
+  regular C++ code.
+- If the Cilk extensions are available in your compiler, an ispc
+  implementation that uses Cilk will also be built.
+  (See http://software.intel.com/en-us/articles/intel-cilk-plus/).  Like 
+  the "best practices" serial implementation, this version does dynamic
+  tile partitioning for better load balancing and then uses ispc for the
+  light culling and shading.
+
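The two-phase idea described above (cull lights per tile, then shade the
tile with only the surviving lights) can be sketched in plain serial C++.
This is a simplified 2D illustration, not the code in this directory: the
Light and Tile structs, the circular screen-space light footprint, and the
falloff formula are all assumptions made for the sketch.  The real
implementation also works with depth bounds from the G-buffer.

```cpp
#include <algorithm>
#include <vector>

struct Light { float x, y, radius; };  // hypothetical screen-space footprint
struct Tile  { int x0, y0, x1, y1; };  // pixel bounds: [x0,x1) x [y0,y1)

// Phase 1: return the indices of lights whose circle overlaps the tile.
std::vector<int> cullLights(const Tile &t, const std::vector<Light> &lights) {
    std::vector<int> visible;
    for (int i = 0; i < (int)lights.size(); ++i) {
        const Light &l = lights[i];
        // Closest point of the tile rectangle to the light center.
        float cx = std::max((float)t.x0, std::min(l.x, (float)t.x1));
        float cy = std::max((float)t.y0, std::min(l.y, (float)t.y1));
        float dx = l.x - cx, dy = l.y - cy;
        if (dx * dx + dy * dy <= l.radius * l.radius)
            visible.push_back(i);
    }
    return visible;
}

// Phase 2: shade each pixel of the tile using only the tile's light list.
// The "shading" here is a made-up quadratic falloff, for illustration only.
void shadeTile(const Tile &t, const std::vector<Light> &lights,
               const std::vector<int> &visible, int width,
               std::vector<float> &image) {
    for (int y = t.y0; y < t.y1; ++y) {
        for (int x = t.x0; x < t.x1; ++x) {
            float sum = 0.0f;
            for (int i : visible) {
                const Light &l = lights[i];
                float dx = (x + 0.5f) - l.x, dy = (y + 0.5f) - l.y;
                float d2 = dx * dx + dy * dy, r2 = l.radius * l.radius;
                if (d2 < r2)
                    sum += 1.0f - d2 / r2;
            }
            image[y * width + x] = sum;
        }
    }
}
```

The per-pixel loop only iterates over the lights that survived culling,
which is what makes the approach practical with thousands of lights.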
+
+GMRES
+=====
+
+An implementation of the generalized minimal residual method for solving
+sparse matrix equations.
+(http://en.wikipedia.org/wiki/Generalized_minimal_residual_method)
+
+
+Mandelbrot
+==========
+
+Mandelbrot set generation.  This example is extensively documented at the
+http://ispc.github.com/example.html page.
+
+
+Mandelbrot_tasks
+================
+
+Implementation of Mandelbrot set generation that also parallelizes across
+cores using tasks.  Under Windows, a simple task system built on
+Microsoft's Concurrency Runtime is used (see tasks_concrt.cpp).  On OSX, a
+task system based on Grand Central Dispatch is used (tasks_gcd.cpp), and on
+Linux, a pthreads-based task system is used (tasks_pthreads.cpp).  When
+using tasks with ispc, no task system is mandated; the user is free to plug
+in any task system they want, for ease of interoperating with existing task
+systems.
+
+
+Noise
+=====
+
+This example has an implementation of Ken Perlin's procedural "noise"
+function, as described in his 2002 "Improving Noise" SIGGRAPH paper.
+
+ 
+Options
+=======
+
+This program implements both the Black-Scholes and Binomial options pricing
+models in both ispc and regular serial C++ code.
+
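For reference, the closed-form Black-Scholes price of a European call can
be written in a few lines of serial C++.  This is a minimal sketch of the
textbook formula, not the code in this directory; the function and
parameter names are chosen for illustration.

```cpp
#include <cmath>

// Standard normal cumulative distribution function, via erfc.
double normCdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Black-Scholes price of a European call option.
//   S: spot price, K: strike, r: risk-free rate (annual, continuous),
//   sigma: volatility (annual), T: time to expiry in years.
double blackScholesCall(double S, double K, double r, double sigma, double T) {
    double sqrtT = std::sqrt(T);
    double d1 = (std::log(S / K) + (r + 0.5 * sigma * sigma) * T) /
                (sigma * sqrtT);
    double d2 = d1 - sigma * sqrtT;
    return S * normCdf(d1) - K * std::exp(-r * T) * normCdf(d2);
}
```

Each option is priced independently of the others, so a loop over a large
array of options is a natural fit for ispc's data-parallel execution model.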
+
+Perfbench
+=========
+
+This runs a number of microbenchmarks to measure system performance and
+code generation quality.
+
+
+RT
+==
+
+This is a simple ray tracer; it reads in camera parameters and a bounding
+volume hierarchy and renders the scene from the given viewpoint.  The
+command line arguments are:
+
+rt <scene base name>
+
+Where <scene base name> is one of "cornell", "teapot", or "sponza".
+
+The implementation originally derives from the bounding volume hierarchy
+and triangle intersection code from pbrt; see the pbrt source code and/or
+the "Physically Based Rendering" book for more about the basic algorithmic
+details.
+
+
+Simple
+======
+
+This is a simple "hello world" type program that shows a ~10 line
+application program calling out to a ~5 line ispc program to do a simple
+computation.
+
+Sort
+====
+
+This is a bucket sort of 32-bit unsigned integers.  By default, 1000000
+random elements are sorted; run "./sort N" to sort N elements instead.
+
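A serial bucket sort of 32-bit unsigned integers can be sketched as
follows.  This is one simple variant (distribute by the top 8 bits, then
sort each bucket), shown only to illustrate the idea; it is not necessarily
the strategy used by the ispc code in this directory, and the bucket count
is an arbitrary choice.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sorts keys in ascending order using 256 buckets keyed by the top 8 bits.
void bucketSort(std::vector<uint32_t> &keys) {
    constexpr int kBits = 8;  // assumption: 2^8 buckets
    constexpr std::size_t kBuckets = std::size_t(1) << kBits;

    // Distribute: keys with a smaller top byte land in earlier buckets.
    std::vector<std::vector<uint32_t>> buckets(kBuckets);
    for (uint32_t k : keys)
        buckets[k >> (32 - kBits)].push_back(k);

    // Sort each bucket and concatenate the buckets in order.
    std::size_t out = 0;
    for (auto &b : buckets) {
        std::sort(b.begin(), b.end());
        for (uint32_t k : b)
            keys[out++] = k;
    }
}
```

Because the buckets cover disjoint key ranges, each one can be sorted
independently, which is where the parallelism comes from.
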
+Volume
+======
+
+Ray-marching volume rendering, with a single-scattering lighting model.  To
+run it, specify a camera parameter file and a volume density file, e.g.:
+
+volume camera.dat density_highres.vol
+
+(See, e.g., Chapters 11 and 16 of "Physically Based Rendering" for
+information about the algorithm implemented here.)  The volume data set
+included here was generated by the example implementation of the "Wavelet
+Turbulence for Fluid Simulation" SIGGRAPH 2008 paper by Kim et
+al. (http://www.cs.cornell.edu/~tedkim/WTURB/)