aaron/ispc - ispc - git.frat.tech


Author	SHA1	Message	Date
Matt Pharr	640918bcc0	Call fclose() in deferred example. (Andy Zhang).	2012-03-07 08:50:10 -08:00
Matt Pharr	0115eeabfe	Update deferred example to take advantage of new pointer variability rules.	2012-02-29 14:27:53 -08:00
Matt Pharr	f81acbfe80	Implement unbound variability for struct types. Now, if a struct member has an explicit 'uniform' or 'varying' qualifier, then that member has that variability, regardless of the variability of the struct. Members without 'uniform' or 'varying' have unbound variability, and in turn inherit the variability of the struct. As a result of this, now structs can properly be 'varying' by default, just like all the other types, while still having sensible semantics.	2012-02-21 10:28:31 -08:00
Matt Pharr	56ec939692	Add perfbench to examples.sln for Windows	2012-02-14 10:07:08 -08:00
Matt Pharr	fe2d9aa600	Add perfbench to examples: a few small microbenchmarks.	2012-02-10 12:27:13 -08:00
Matt Pharr	83c8650b36	Add support for "local" atomics. Also updated aobench example to use them, which in turn allows using foreach() and thence a much cleaner implementation. Issue #58.	2012-02-03 13:15:21 -08:00
Matt Pharr	ea027a95a8	Fix various places in deferred shading example that assumed programCount >= 4. This gets deferred closer to working with the scalar target, but there are still some issues. (Partially in gamma correction / final clamping, it seems.) This fix causes a ~0.5% performance degradation with e.g. the AVX target, though it's not clear that it's worth having a separate code path in order to not lose this small amount of perf. (Partially addresses issue #167)	2012-01-31 11:46:33 -08:00
Matt Pharr	950f86200b	Fix examples/tasksys.cpp to compile with 32-bit targets. (Change a cmpxchgd to cmpxchl.) Note that a number of the examples still don't work with 32-bit compilation, why still TBD.	2012-01-30 15:03:54 -08:00
Matt Pharr	0575b1f38d	Update run_tests and examples makefile for scalar target. Fixed a number of tests that didn't handle the programCount == 1 case correctly.	2012-01-29 16:22:25 -08:00
Matt Pharr	c96fef6bc8	Fix silly error in generic-16.h example C++ bindings.	2012-01-27 17:04:57 -08:00
Matt Pharr	bba02f87ea	Improve implementations of unsigned <=, >= in sse4 intrinsics file.	2012-01-27 16:49:41 -08:00
Matt Pharr	a5b7fca7e0	Extract constant offsets from gather/scatter base+offsets offset vectors. When we're able to turn a general gather/scatter into the "base + offsets" form, we now try to extract out any constant components of the offsets and then pass them as a separate parameter to the gather/scatter function implementation. We then in turn carefully emit code for the addressing calculation so that these constant offsets match LLVM's patterns to detect this case, such that we get the constant offsets directly encoded in the instruction's addressing calculation in many cases, saving arithmetic instructions to do these calculations. Improves performance of stencil by ~15%. Other workloads unchanged.	2012-01-24 14:41:15 -08:00
Matt Pharr	68f6ea8def	For << and >> with C++, detect when all instances are shifting by the same amount. In this case, we now emit calls to potentially-specialized functions for the left/right shifts that take a single integer value for the shift amount. These in turn can be matched to the corresponding intrinsics for the SSE target. Issue #145.	2012-01-19 10:04:32 -07:00
Matt Pharr	d14a2de168	Fix generic code emission when building with LLVM3.0/2.9. Specifically, don't use vector select for masked store blend there, but emit a call to an undefined __masked_store_blend_*() function. Added implementations of these functions to the sse4.h and generic-16.h in examples/intrinsics. (Calls to these will never be generated with LLVM 3.1).	2012-01-17 23:42:22 -07:00
Matt Pharr	3bf3ac7922	Be more conservative about using blending in place of masked store. More specifically, we do a proper masked store (rather than a load-blend-store) unless we can determine that we're accessing a stack-allocated "varying" variable. This fixes a number of nefarious bugs where given code like: uniform float a[21]; foreach (i = 0 ... 21) a[i] = 0; We'd use a blend and in turn read past the end of a[] in the last iteration. Also made slight changes to inlining in aobench; this keeps compiles to ~5s, versus ~45s without them (with this change). Fixes issue #160.	2012-01-17 23:42:22 -07:00
Matt Pharr	c6d1cebad4	Update masked_load/store implementations for generic targets to take void *s (Fixes compile errors when we try to actually use these!)	2012-01-17 23:42:22 -07:00
Matt Pharr	08189ce08c	Update "inline" qualifiers in a few examples.	2012-01-17 23:42:22 -07:00
Matt Pharr	5b4dbc8167	Fix build of aobench_instrumented example on OSX/Linux	2012-01-08 10:02:43 -08:00
Matt Pharr	78c6d3c02f	Add initial support for 'goto' statements. ispc now supports goto, but only under uniform control flow--i.e. it must be possible for the compiler to statically determine that all program instances will follow the goto. An error is issued at compile time if a goto is used when this is not the case.	2012-01-05 12:22:36 -08:00
Matt Pharr	e3341176c5	Redo makefiles for the examples. They're all based off a common examples/common.mk file, so that individual makefiles are quite simple now. The common.mk file also provides targets to build the examples using C++ output with the generic-16h or sse4.h files. These targets don't run by default, but do run if 'make all' is run.	2012-01-04 12:59:03 -08:00
Matt Pharr	8938e14442	Add support for emitting ~generic vectorized C++ code. The compiler now supports an --emit-c++ option, which generates generic vector C++ code. To actually compile this code, the user must provide C++ code that implements a variety of types and operations (e.g. adding two floating-point vector values together, comparing them, etc). There are two examples of this required code in examples/intrinsics: generic-16.h is a "generic" 16-wide implementation that does all the required operations with scalar math; it's useful for demonstrating the requirements of the implementation. Then, sse4.h shows a simple implementation of an SSE4 target that maps the emitted function calls to SSE intrinsics. When using these example implementations with the ispc test suite, all but one or two tests pass with gcc and clang on Linux and OSX. There are currently ~10 failures with icc on Linux, and ~50 failures with MSVC 2010. (To be fixed in coming days.) Performance varies: when running the examples through the sse4.h target, some have the same performance as when compiled with --target=sse4 from ispc directly (options), while noise is 12% slower, rt is 26% slower, and aobench is 2.2x slower. The details of this haven't yet been carefully investigated, but will be in coming days as well. Issue #92.	2012-01-04 12:59:03 -08:00
Matt Pharr	1a81173c93	Fix examples/options Makefile to use -O3 for serial builds. Amazingly, it has been using just -g since the initial commit. :-(	2012-01-03 19:53:45 -08:00
Matt Pharr	20536bb339	Fix mandelbrot_tasks example	2011-12-11 15:21:11 -08:00
Matt Pharr	034507a35b	Update examples: bulk task launch in stencil/mandelbrot, use foreach more.	2011-12-10 11:11:30 -08:00
Matt Pharr	0b2febcec0	Update volume rendering workload: use AVX, remove reduce_equal() path. Both of these changes gave a performance benefit!	2011-12-09 17:40:50 -08:00
Matt Pharr	9805b0742d	Switch to avx-x2 for the stencil workload	2011-12-08 14:36:09 -08:00
Matt Pharr	f19c2aba40	Windows build fixes for examples, update options task granularity	2011-12-05 14:23:50 -08:00
Matt Pharr	ffc1d97df7	Fix aobench_instrumented build on Windows	2011-12-05 13:33:29 -08:00
Matt Pharr	9dd498718b	Updated options pricing example to have a tasking-based path as well.	2011-12-05 13:24:34 -08:00
Matt Pharr	c3b55de1ad	Fix volume rendering example for command-line args change	2011-12-03 09:30:10 -08:00
Matt Pharr	24ef9dac8f	Use foreach in the deferred shading example	2011-12-01 17:00:30 -08:00
Matt Pharr	82aa6efd12	Checkpoint user's guide edits	2011-12-01 13:38:17 -08:00
Matt Pharr	8bc7367109	Add foreach and foreach_tiled looping constructs. These make it easier to iterate over arbitrary amounts of data elements; specifically, they automatically handle the "ragged extra bits" that come up when the number of elements to be processed isn't evenly divided by programCount. TODO: documentation	2011-11-30 13:17:31 -08:00
Matt Pharr	11547cb950	stdlib updates to take advantage of pointers. The packed_{load,store}_active functions now take a pointer to a location at which to start loading/storing, rather than an array base and a uniform index. Variants of the prefetch functions that take varying pointers are now available. There are now variants of the various atomic functions that take varying pointers (issue #112).	2011-11-29 15:41:38 -08:00
Matt Pharr	975db80ef6	Add support for pointers to the language. Pointers can be either uniform or varying, and behave correspondingly. e.g.: "uniform float * varying" is a varying pointer to uniform float data in memory, and "float * uniform" is a uniform pointer to varying data in memory. Like other types, pointers are varying by default. Pointer-based expressions, & and *, sizeof, ->, pointer arithmetic, and the array/pointer duality all behave as in C. Array arguments to functions are converted to pointers, also like C. There is a built-in NULL for a null pointer value; conversion from compile-time constant 0 values to NULL still needs to be implemented. Other changes: - Syntax for references has been updated to be C++ style; a useful warning is now issued if the "reference" keyword is used. - It is now illegal to pass a varying lvalue as a reference parameter to a function; references are essentially uniform pointers. This case had previously been handled via special case call by value return code. That path has been removed, now that varying pointers are available to handle this use case (and much more). - Some stdlib routines have been updated to take pointers as arguments where appropriate (e.g. prefetch and the atomics). A number of others still need attention. - All of the examples have been updated. - Many new tests. TODO: documentation	2011-11-27 13:09:59 -08:00
Matt Pharr	ce7355f9ed	Windows: fix examples build to look for ispc.exe in ../.. as well	2011-10-09 07:40:18 -07:00
Matt Pharr	bedaec2295	Update examples for multi-target compilation. Makefile and vcxproj file updates. Also modified vcxproj files so that the various files ispc generates go into $(TargetDir), not the current directory. Modified the ray tracer example to not have uniform short-vector types in its app-visible datatypes (these are laid out differently on SSE vs AVX); there was an existing lurking bug in the way this was done before.	2011-10-04 16:01:56 -07:00
Matt Pharr	880cbb18cc	Remove checks to see if system's processor matches the target the code was compiled for. (Preparation for multi-target output.)	2011-10-04 16:01:55 -07:00
Matt Pharr	9b7f55a28e	Add buildall.bat script for Windows. Also various example build fixes for Windows	2011-10-04 11:42:04 -07:00
Matt Pharr	e4d224a0f1	Use __cilk to detect Cilk support	2011-10-04 11:16:42 -07:00
Matt Pharr	0933a77c1b	Improve task decomposition in ray tracing example. Specifically, launch all of the tasks in one statement, rather than still looping over spans in y and launching a collection of tasks across x for each span. This seems to give a few percent better performance.	2011-10-04 09:33:59 -07:00
Matt Pharr	5f78edf07a	Fix bug with screen decomposition in volume rendering example	2011-10-04 09:30:02 -07:00
Matt Pharr	0b02f94988	Task system performance tweaks. Switch back to GCD on OSX. Increase TaskInfo allocation count. This fixes the regression with deferred on AVX (from 17x to 25x again with 4 cores.)	2011-10-01 08:04:09 -07:00
Matt Pharr	65c50b60fc	Cleanups to deferred shading workload	2011-09-30 20:35:42 -07:00
Matt Pharr	f8f25a11b6	Added deferred shading workload	2011-09-30 19:42:14 -07:00
Matt Pharr	cb7976bbf6	Added updated task launch implementation that now tracks task groups. Within each function that launches tasks, we now can easily track which tasks that function launched, so that the sync at the end of the function can just sync on the tasks launched by that function (not all tasks launched by all functions.) Implementing this led to a rework of the task system API that ispc generates code to call; the example task systems in examples/tasksys.cpp have been updated to conform to this API. (The updated API is also documented in the ispc user's guide.) As part of this, "launch[n]" syntax was added to launch a number of tasks in a single launch statement, rather than requiring a loop over 'n' to launch n tasks. This commit thus fixes issue #84 (enhancement to launch multiple tasks from a single launch statement) as well as issue #105 (recursive task launches were broken).	2011-09-30 11:20:53 -07:00
Matt Pharr	8f3e46f67e	Use InterlockedExchangeAdd on Windows	2011-09-29 16:19:59 -07:00
Matt Pharr	d45c536c47	Fix Windows debug build of simple example	2011-09-28 14:11:32 -07:00
Matt Pharr	9052d4b10b	Linux build fixes	2011-09-17 13:42:46 -07:00
Matt Pharr	2405dae8e6	Use malloc() to get space for task arguments when compiling to AVX. This is to work around the LLVM bug/limitation discussed in LLVM bug 10841 (http://llvm.org/bugs/show_bug.cgi?id=10841).	2011-09-17 13:38:51 -07:00
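
Commit 8bc7367109 in the log above says foreach automatically handles the "ragged extra bits" left over when the element count is not a multiple of programCount. The following is a minimal scalar C++ sketch of that idea, not the code ispc generates: `kProgramCount` and `scale_in_place` are hypothetical names chosen for illustration, and the gang width of 8 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ispc's programCount (assumed gang width of 8).
constexpr std::size_t kProgramCount = 8;

// Emulates "foreach (i = 0 ... n) a[i] *= 2" one gang-wide step at a time.
// Returns how many steps ran with some lanes masked off (the ragged tail).
std::size_t scale_in_place(std::vector<float>& a) {
    assert(kProgramCount > 0);
    const std::size_t n = a.size();
    std::size_t partialSteps = 0;
    for (std::size_t base = 0; base < n; base += kProgramCount) {
        bool anyLaneOff = false;
        for (std::size_t lane = 0; lane < kProgramCount; ++lane) {
            // A lane is active only while its index is still inside the array.
            const bool active = (base + lane) < n;
            if (active) {
                a[base + lane] *= 2.0f;
            } else {
                anyLaneOff = true;  // masked-off lane: no memory access at all
            }
        }
        if (anyLaneOff) {
            ++partialSteps;
        }
    }
    return partialSteps;
}
```

With 21 elements and a width of 8, the first two steps run with all lanes on and the third covers only indices 16 to 20, so exactly one step is partial.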

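Commit 3bf3ac7922 in the log above explains why a load-blend-store is unsafe for the last iteration over `uniform float a[21]`: it touches every lane, including those past the end of the array. The sketch below contrasts the two strategies in scalar C++ under stated assumptions; `kWidth`, `blend_store`, `masked_store`, and `lanes_past_end` are hypothetical names, and the width of 8 is assumed rather than taken from the commit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed gang width for illustration only.
constexpr std::size_t kWidth = 8;
using Vec = std::array<float, kWidth>;
using Mask = std::array<bool, kWidth>;

// Load-blend-store: reads all kWidth lanes, blends, and writes all lanes back.
// Only valid if dst[0 .. kWidth) is entirely addressable memory.
void blend_store(float* dst, const Vec& value, const Mask& mask) {
    assert(dst != nullptr);
    Vec old;
    for (std::size_t i = 0; i < kWidth; ++i) {
        old[i] = dst[i];  // touches every lane, active or not
    }
    for (std::size_t i = 0; i < kWidth; ++i) {
        dst[i] = mask[i] ? value[i] : old[i];
    }
}

// Proper masked store: only active lanes touch memory.
void masked_store(float* dst, const Vec& value, const Mask& mask) {
    assert(dst != nullptr);
    for (std::size_t i = 0; i < kWidth; ++i) {
        if (mask[i]) {
            dst[i] = value[i];
        }
    }
}

// Number of lanes a full-width access starting at `base` would touch
// beyond the end of an n-element array.
std::size_t lanes_past_end(std::size_t n, std::size_t base) {
    const std::size_t end = base + kWidth;
    return end > n ? end - n : 0;
}
```

For the 21-element array in the commit message, a full-width access starting at index 16 would touch three lanes beyond the array, which is why the blend is only safe when the destination is known to be padded, such as a stack-allocated varying variable.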

339 Commits