Previously, we'd bitcast e.g. a vector of floats to a vector of i32s and then
use the i32 variant of masked_load/masked_store/gather/scatter. Now, we have
separate float/double variants of each of those.
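A minimal sketch of what the per-type interface might look like, using
hypothetical declarations and stand-in vector types (the names and signatures
in the actual generated code may differ):
    #include <cstdint>

    struct __vec4_i1  { bool    v[4]; };  // stand-in per-lane mask type
    struct __vec4_i32 { int32_t v[4]; };
    struct __vec4_f   { float   v[4]; };

    // old: float data had to be bitcast to i32 and use the i32 variant
    __vec4_i32 __masked_load_i32(void *ptr, __vec4_i1 mask);
    // new: dedicated float (and double) variants, no bitcast needed
    __vec4_f   __masked_load_float(void *ptr, __vec4_i1 mask);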
Previously, we were trying to take a uniform seed and then shuffle that
around to initialize the state for each of the program instances. This
was becoming increasingly untenable and brittle.
Now a varying seed is expected and used.
There were a number of places where we left-shifted 1 by a lane index;
these failed when the shift went beyond 32 bits. Fixed by shifting the
64-bit constant value 1ull instead.
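A minimal before/after sketch in C++ (the helper name is illustrative):
    #include <cstdint>

    static inline uint64_t laneBit(int lane) {
        // return 1 << lane;   // old: undefined / drops the bit once lane >= 32
        return 1ull << lane;   // fixed: 64-bit shift is valid for lanes up to 63
    }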
Rather than XOR'ing with a temporary 'all-on' vector, we call
__not. Also, we call out to __and_not1 and __and_not2, for an
AND where the first or second operand, respectively, has had
NOT applied to it.
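A minimal sketch of the intended semantics, using a hypothetical stand-in
mask type (the real implementations operate on the target's vector/mask
types):
    #include <cstdint>

    struct Mask { uint64_t bits; };

    static inline Mask __not(Mask a)              { return { ~a.bits }; }
    // AND with NOT applied to the first operand:
    static inline Mask __and_not1(Mask a, Mask b) { return { ~a.bits & b.bits }; }
    // AND with NOT applied to the second operand:
    static inline Mask __and_not2(Mask a, Mask b) { return { a.bits & ~b.bits }; }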
Now, the __smear* functions in generated C++ code have an unused first
parameter of the desired return type; this allows us to have headers
that include variants of __smear for multiple target widths. (This
approach is necessary since we can't overload by return type in C++.)
Issue #256.
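A minimal sketch of the __smear dummy-parameter trick in isolation, with
stand-in vector types (the real headers use the target's types, with one
__smear_* function per element type):
    struct Vec4f  { float v[4];  };
    struct Vec16f { float v[16]; };

    // The first argument is never read; it only selects the overload, since
    // C++ can't overload on return type alone.
    static inline Vec4f __smear_float(Vec4f, float f) {
        Vec4f r;
        for (int i = 0; i < 4; ++i) r.v[i] = f;
        return r;
    }
    static inline Vec16f __smear_float(Vec16f, float f) {
        Vec16f r;
        for (int i = 0; i < 16; ++i) r.v[i] = f;
        return r;
    }
    // usage in generated code (illustrative): __smear_float(Vec16f(), 1.0f)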
Now, when we're printing out a constant vector value, we check to see
if it's a splat and, if so, call out to one of the __splat_* functions in
the generated code.
Now, if a struct member has an explicit 'uniform' or 'varying'
qualifier, then that member has that variability, regardless of the
variability of the struct itself. Members without 'uniform' or
'varying' have unbound variability and in turn inherit the variability
of the struct.
As a result of this, now structs can properly be 'varying' by default,
just like all the other types, while still having sensible semantics.
This gets the 'deferred' example closer to working with the scalar target,
but there are still some issues. (Partially in gamma correction / final
clamping, it seems.)
This fix causes a ~0.5% performance degradation with e.g. the AVX target,
though it's not clear that it's worth having a separate code path in order to
not lose this small amount of perf.
(Partially addresses issue #167)
When we're able to turn a general gather/scatter into the "base + offsets"
form, we now try to extract out any constant components of the offsets and
then pass them as a separate parameter to the gather/scatter function
implementation.
We then emit the addressing calculation carefully so that these constant
offsets match the patterns LLVM uses to detect this case; in many cases the
constant offsets are then encoded directly in the instruction's addressing
computation, saving the arithmetic instructions that would otherwise compute
them.
Improves performance of stencil by ~15%. Other workloads unchanged.
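A per-lane sketch of why this helps (illustrative code, not the actual
builtin signatures):
    #include <cstdint>

    // With the offset split into a varying part plus a constant part, the
    // per-lane address is base + varyingOffset + constOffset; that fits x86's
    // base + index*scale + displacement addressing, so the constant part
    // needs no separate add.
    static inline float gatherLane(const float *base, int32_t varyingOffset,
                                   int32_t constOffset) {
        return base[varyingOffset + constOffset];
    }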
For shifts where the shift amount is a single uniform integer value, we now
emit calls to potentially-specialized left/right shift functions that take
that scalar amount. These in turn can be matched to the corresponding
intrinsics for the SSE target.
Issue #145.
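A hypothetical sse4.h-style implementation of one such shift function (the
actual names in the header may differ):
    #include <emmintrin.h>

    // Every lane is shifted by the same scalar count, so a single SSE shift
    // instruction suffices instead of shifting lane by lane.
    static inline __m128i __shl_uniform_i32(__m128i v, int count) {
        return _mm_sll_epi32(v, _mm_cvtsi32_si128(count));
    }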
Specifically, don't use a vector select for the masked store blend there,
but instead emit calls to undefined __masked_store_blend_*() functions.
Added implementations of these functions to sse4.h and generic-16.h
in examples/intrinsics. (Calls to these will never be generated with
LLVM 3.1.)
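An illustrative sse4.h-style implementation of one such variant (the exact
signatures in examples/intrinsics may differ):
    #include <smmintrin.h>  // SSE4.1

    static inline void __masked_store_blend_i32(__m128i *ptr, __m128i val,
                                                __m128i mask) {
        __m128i old = _mm_loadu_si128(ptr);  // read the existing contents
        // keep 'old' where the mask is off, take 'val' where it is on
        _mm_storeu_si128(ptr, _mm_blendv_epi8(old, val, mask));
    }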
More specifically, we do a proper masked store (rather than a load-
blend-store) unless we can determine that we're accessing a stack-allocated
"varying" variable. This fixes a number of nefarious bugs where given
code like:
uniform float a[21];
foreach (i = 0 … 21)
a[i] = 0;
We'd use a blend and in turn read past the end of a[] in the last
iteration.
Also made slight changes to inlining in aobench; with this change, those
keep compile time to ~5s, versus ~45s without them.
Fixes issue #160.
ispc now supports goto, but only under uniform control flow--i.e.
it must be possible for the compiler to statically determine that
all program instances will follow the goto. An error is issued at
compile time if a goto is used when this is not the case.
The examples' makefiles are all based on a common examples/common.mk file,
so the individual makefiles are now quite simple.
The common.mk file also provides targets to build the examples using C++
output with the generic-16.h or sse4.h files. These targets aren't built by
default, but are built when 'make all' is run.
The compiler now supports an --emit-c++ option, which generates generic
vector C++ code. To actually compile this code, the user must provide
C++ code that implements a variety of types and operations (e.g. adding
two floating-point vector values together, comparing them, etc).
There are two examples of this required code in examples/intrinsics:
generic-16.h is a "generic" 16-wide implementation that does everything
required with scalar math; it's useful for demonstrating the requirements
of the implementation. Then, sse4.h shows a simple implementation of an
SSE4 target that maps the emitted function calls to SSE intrinsics.
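A minimal sketch of the kind of code such a header must supply, in the
generic-16.h style (the type and function names here are illustrative, not
the actual ones the compiler emits calls to):
    struct Vec16f { float v[16]; };

    // 16-wide float addition done with scalar math, as generic-16.h does
    static inline Vec16f vecAdd(Vec16f a, Vec16f b) {
        Vec16f r;
        for (int i = 0; i < 16; ++i)
            r.v[i] = a.v[i] + b.v[i];
        return r;
    }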
When using these example implementations with the ispc test suite,
all but one or two tests pass with gcc and clang on Linux and OSX.
There are currently ~10 failures with icc on Linux, and ~50 failures with
MSVC 2010. (To be fixed in coming days.)
Performance varies: when running the examples through the sse4.h
implementation, some (e.g. options) have the same performance as when
compiled with --target=sse4 from ispc directly, while noise is 12% slower,
rt is 26% slower, and aobench is 2.2x slower. The details of this haven't
yet been carefully investigated, but will be in coming days as well.
Issue #92.