470 Commits

Jean-Luc Duprat
bea88ab122 Integrated changes from mmp/and-fold-opt:
Add peephole optimization to eliminate some mask AND operations.

On KNC, the various vector comparison instructions can optionally
be masked; if a mask is provided, the result is effectively that
the value returned is the AND of the mask with the result of the
comparison.

This change adds an optimization pass to the C++ backend that looks
for vector ANDs where one operand is a comparison and rewrites
them--e.g. "and(equalfloat(a, b), c)" is changed to
"_equal_float_and_mask(a, b, c)", saving an instruction in the end.

Issue #319.

Merge commit '8ef6bc16364d4c08aa5972141748110160613087'

Conflicts:
	examples/intrinsics/knc.h
	examples/intrinsics/sse4.h
2012-07-10 10:33:24 -07:00
Matt Pharr
bc7775aef2 Fix __ordered and __unordered floating-point functions for C++ target.
Fixes include adding "_float" and "_double" suffixes as appropriate as well
as providing a number of missing implementations.

This fixes a number of failures in the half* tests.
2012-07-09 14:35:51 -07:00
Matt Pharr
107669686c Fix naming of some comparison ops in knc.h 2012-07-09 12:43:15 -07:00
Jean-Luc Duprat
516ba85abd Merge pull request #322 from mmp/vector-constants
Vector constants
2012-07-09 09:28:26 -07:00
Jean-Luc Duprat
098277b4f0 Merge pull request #321 from mmp/setzero
More varied support for constant vectors from C++ backend.
2012-07-09 08:57:05 -07:00
Matt Pharr
8ef6bc1636 Add peephole optimization to eliminate some mask AND operations.
On KNC, the various vector comparison instructions can optionally
be masked; if a mask is provided, the result is effectively that
the value returned is the AND of the mask with the result of the
comparison.

This change adds an optimization pass to the C++ backend that looks
for vector ANDs where one operand is a comparison and rewrites
them--e.g. "__and(__equal_float(a, b), c)" is changed to
"__equal_float_and_mask(a, b, c)", saving an instruction in the end.

Issue #319.
2012-07-07 08:35:38 -07:00
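As an illustration, a minimal C++ sketch of the rewrite this pass enables (the vector and mask types below are simplified stand-ins, not the actual knc.h definitions):

    #include <cstdint>

    struct __vec16_f { float v[16]; };      // assumed 16-wide float vector
    typedef uint16_t __vec16_i1;            // assumed mask: one bit per lane

    static inline __vec16_i1 __and(__vec16_i1 a, __vec16_i1 b) { return a & b; }

    static inline __vec16_i1 __equal_float(__vec16_f a, __vec16_f b) {
        __vec16_i1 m = 0;
        for (int i = 0; i < 16; ++i)
            if (a.v[i] == b.v[i]) m |= (1 << i);
        return m;
    }

    // The fused form the new pass emits; on KNC this can become a single
    // masked vector-compare instead of a compare followed by an AND.
    static inline __vec16_i1 __equal_float_and_mask(__vec16_f a, __vec16_f b,
                                                    __vec16_i1 mask) {
        return __equal_float(a, b) & mask;
    }

    // Before the pass: __and(__equal_float(a, b), c)
    // After the pass:  __equal_float_and_mask(a, b, c)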
Matt Pharr
974b40c8af Add type suffix to comparison ops in C++ output.
e.g. "__equal()" -> "__equal_float()", etc.

No functional change; this is necessary groundwork for a forthcoming
peephole optimization that eliminates ANDs of masks in some cases.
2012-07-07 07:50:59 -07:00
Matt Pharr
e5fe0eabdc Update __load() builtins to take const pointers. 2012-07-06 08:47:47 -07:00
Matt Pharr
0d3993fa25 More varied support for constant vectors from C++ backend.
If we have a vector of all zeros, a __setzero_* function call is emitted,
which permits calling specialized intrinsics for this case.  Undefined values
are reflected with an __undef_* call, which similarly allows passing that
information along.

This change also includes a cleanup to the signature of the __smear_*
functions; since they already have different names depending on the
scalar value type, we don't need to use the trick of passing an
undefined value of the return vector type as the first parameter as
an indirect way to overload by return value.

Issue #317.
2012-07-05 20:19:11 -07:00
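A sketch of the kind of target-header entry points this change calls into, using SSE for concreteness (names follow the commit's pattern; the real signatures in sse4.h may differ):

    #include <emmintrin.h>

    struct __vec4_f { __m128 v; };          // assumed 4-wide float vector wrapper

    // All-zeros constant: lets the header use the dedicated intrinsic.
    static inline __vec4_f __setzero_float() {
        __vec4_f r; r.v = _mm_setzero_ps(); return r;
    }

    // Undefined value: the caller never depends on its contents, so any
    // cheaply produced register value will do.
    static inline __vec4_f __undef_float() {
        __vec4_f r; r.v = _mm_setzero_ps(); return r;   // placeholder contents
    }

    // Splat: the per-scalar-type name already determines the return type,
    // so the old dummy first parameter has been dropped.
    static inline __vec4_f __smear_float(float f) {
        __vec4_f r; r.v = _mm_set1_ps(f); return r;
    }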
Jean-Luc Duprat
ac421f68e2 Ongoing support for int64 for KNC:
Fixes to __load and __store.
Added __add, __mul, __equal, __not_equal, __extract_elements, __smear_i64, __cast_sext, __cast_zext,
and __scatter_base_offsets32_float.

__rcp_varying_float now has both fast-math and full-precision implementations.
2012-07-05 17:05:42 -07:00
Jean-Luc Duprat
95d8f76ec3 Added preliminary support for Intel's Xeon Phi KNC processor.
float, int32, and double are supported; int8, int16, and int64 are
not supported yet.

This is work in progress and not considered stable yet.
2012-06-28 12:00:55 -07:00
Jean-Luc Duprat
e431b07e04 Changed the C API to use templates to indicate memory alignment to the compiler.
This should help with performance of the generated code.
Updated the relevant header files (sse4.h, generic-16.h, generic-32.h, generic-64.h)

Updated generic-32.h and generic-64.h to the new memory API
2012-06-28 09:29:15 -07:00
Matt Pharr
54459255d4 Add unmasked { } statement.
This reestablishes an "all on" execution mask for the gang, which can
be useful for nested parallelism.
2012-06-22 14:30:58 -07:00
Matt Pharr
27e39954d6 Fix a number of issues in examples/intrinsics/sse4.h.
This had gotten fairly out of date, after recent changes to C++ output.
Roughly 15 tests still fail with this target.

Issue #278.
2012-06-08 12:52:36 -07:00
Matt Pharr
89a2566e01 Add separate variants of memory built-ins for floats and doubles.
Previously, we'd bitcast e.g. a vector of floats to a vector of i32s and then
use the i32 variant of masked_load/masked_store/gather/scatter.  Now, we have
separate float/double variants of each of those.
2012-06-07 14:47:16 -07:00
Matt Pharr
1ac3e03171 Gather/scatter function improvements in builtins.
More naming consistency: _i32 rather than i32, now.

Also improved the m4 macros that generate these sequences so they don't
require as many parameters.
2012-06-07 14:19:23 -07:00
Matt Pharr
b86d40091a Improve naming of masked load/store instructions in builtins.
Now, use _i32 suffixes, rather than _32, etc.  Also cleaned up the m4
macro to generate these functions, using WIDTH to get the target width,
etc.
2012-06-07 13:58:31 -07:00
Matt Pharr
8fd9b84a80 Update seed_rng() in stdlib to take a varying seed.
Previously, we were trying to take a uniform seed and then shuffle that
around to initialize the state for each of the program instances.  This
was becoming increasingly untenable and brittle.

Now a varying seed is expected and used.
2012-05-30 10:35:41 -07:00
Matt Pharr
5084712a15 Fix bugs in examples/intrinsics/generic-64.h
There were a number of places where we were left-shifting 1 by a
lane index and failing because the shift went beyond 32 bits.  Fixed
by shifting the 64-bit constant value 1ull instead.
2012-05-29 08:31:10 -07:00
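The underlying pattern, in isolation (a generic illustration rather than the exact lines from generic-64.h):

    #include <cstdint>

    uint64_t lane_bit_broken(int lane) {
        // For lane >= 32 this shifts a 32-bit int past its width,
        // which is undefined behavior and produces garbage masks.
        return 1 << lane;
    }

    uint64_t lane_bit_fixed(int lane) {
        // Shifting the 64-bit constant 1ull is well defined for
        // lanes 0..63, as needed for a 64-wide mask.
        return 1ull << lane;
    }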
Matt Pharr
21c43737fe Fix bug in examples/intrinsics/generic-32.h 2012-05-25 14:27:30 -07:00
Matt Pharr
6c7bcf00e7 Add examples/intrinsics/generic-64.h. 2012-05-25 14:27:19 -07:00
Matt Pharr
7a2142075c Add examples/intrinsics/generic-32.h implementation.
Roughly 100 tests fail with this; all the tests need to be audited
for assumptions that 16 is the widest width possible…
2012-05-25 12:37:59 -07:00
Matt Pharr
90db01d038 Represent MOVMSK'ed masks with int64s rather than int32s.
This allows us to scale up to 64-wide execution.
2012-05-25 11:57:23 -07:00
Matt Pharr
fd03ba7586 Export reference parameters as C++ references, not pointers. 2012-05-24 07:12:48 -07:00
Matt Pharr
f4df2fb176 Improvements to mask update code for generic targets.
Rather than XOR'ing with a temporary 'all-on' vector, we call
__not.  Also, we call out to __and_not1 and __and_not2, for an
AND where the first or second operand, respectively, has had
NOT applied to it.
2012-05-16 13:52:51 -07:00
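In plain bit operations, the helpers amount to the following (a sketch with an assumed 64-bit mask type, not the generic-header definitions):

    #include <cstdint>
    typedef uint64_t __mask;                       // assumed mask representation

    static inline __mask __not(__mask a)                { return ~a; }
    static inline __mask __and_not1(__mask a, __mask b) { return ~a & b; }  // NOT on first operand
    static inline __mask __and_not2(__mask a, __mask b) { return a & ~b; }  // NOT on second operand

SSE-style and-not instructions (andnps/pandn) compute exactly the ~a & b form, so __and_not1 in particular can map to a single instruction on such targets.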
Matt Pharr
c6241581a0 Add an extra parameter to __smear functions to encode return type.
Now, the __smear* functions in generated C++ code have an unused first
parameter of the desired return type; this allows us to have headers
that include variants of __smear for multiple target widths.  (This
approach is necessary since we can't overload by return type in C++.)

Issue #256.
2012-05-08 09:54:23 -07:00
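A sketch of the convention (the types here are illustrative; the commit only specifies the unused first parameter):

    struct __vec4_f { float v[4]; };        // assumed 4-wide type
    struct __vec8_f { float v[8]; };        // assumed 8-wide type

    // C++ can't overload on return type, so the (ignored) first argument
    // selects which width of __smear_float gets called.
    static inline __vec4_f __smear_float(__vec4_f, float f) {
        __vec4_f r; for (int i = 0; i < 4; ++i) r.v[i] = f; return r;
    }
    static inline __vec8_f __smear_float(__vec8_f, float f) {
        __vec8_f r; for (int i = 0; i < 8; ++i) r.v[i] = f; return r;
    }

    // Generated code passes an undefined value of the desired return type:
    //   __vec8_f undef8;                         // never read
    //   __vec8_f s = __smear_float(undef8, 1.0f);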
Matt Pharr
0c1b206185 Pass log/exp/pow transcendentals through to targets that support them.
Currently, that means the generic targets.
2012-05-03 13:49:56 -07:00
John Poole
cd98a29a4b Fix 32-bit samples on Mac OS X.
On Mac OS X and Linux rdtsc() didn't save and restore 32-bit registers.

This patch fixes issue #87.
2012-04-23 16:00:07 -07:00
Matt Pharr
12c754c92b Improved handling of splatted constant vectors in C++ backend.
Now, when we're printing out a constant vector value, we check to see
if it's a splat and call out to one of the __splat_* functions in
the generated code if so.
2012-04-19 13:11:15 -07:00
Matt Pharr
10c5ba140c Much more efficient half_to_float() code, via @rygorous.
Also, switch deferred shading example to use it. (Rather than
the "fast" half to float that doesn't handle deforms, etc.)
2012-03-21 16:13:04 -07:00
Matt Pharr
ddfe4932ac Fix parsing of 'launch' so that angle brackets can be removed.
Issue #6.
2012-03-19 11:27:32 -07:00
Matt Pharr
640918bcc0 Call fclose() in deferred example. (Andy Zhang). 2012-03-07 08:50:10 -08:00
Matt Pharr
0115eeabfe Update deferred example to take advantage of new pointer variability rules. 2012-02-29 14:27:53 -08:00
Matt Pharr
f81acbfe80 Implement unbound variability for struct types.
Now, if a struct member has an explicit 'uniform' or 'varying'
qualifier, then that member has that variability, regardless of
the variability of the struct itself.  Members without
'uniform' or 'varying' have unbound variability, and in turn
inherit the variability of the struct.

As a result of this, now structs can properly be 'varying' by default,
just like all the other types, while still having sensible semantics.
2012-02-21 10:28:31 -08:00
Matt Pharr
56ec939692 Add perfbench to examples.sln for Windows 2012-02-14 10:07:08 -08:00
Matt Pharr
fe2d9aa600 Add perfbench to examples: a few small microbenchmarks. 2012-02-10 12:27:13 -08:00
Matt Pharr
83c8650b36 Add support for "local" atomics.
Also updated aobench example to use them, which in turn allows using
foreach() and thence a much cleaner implementation.

Issue #58.
2012-02-03 13:15:21 -08:00
Matt Pharr
ea027a95a8 Fix various places in deferred shading example that assumed programCount >= 4.
This gets deferred closer to working with the scalar target, but there are still
some issues.  (Partially in gamma correction / final clamping, it seems.)

This fix causes a ~0.5% performance degradation with e.g. the AVX target, 
though it's not clear that it's worth having a separate code path in order to
not lose this small amount of perf.

(Partially addresses issue #167)
2012-01-31 11:46:33 -08:00
Matt Pharr
950f86200b Fix examples/tasksys.cpp to compile with 32-bit targets.
(Change a cmpxchgq to a cmpxchgl.)  Note that a number of the examples
still don't work with 32-bit compilation; the reason is still TBD.
2012-01-30 15:03:54 -08:00
Matt Pharr
0575b1f38d Update run_tests and examples makefile for scalar target.
Fixed a number of tests that didn't handle the programCount == 1
case correctly.
2012-01-29 16:22:25 -08:00
Matt Pharr
c96fef6bc8 Fix silly error in generic-16.h example C++ bindings. 2012-01-27 17:04:57 -08:00
Matt Pharr
bba02f87ea Improve implementations of unsigned <=, >= in sse4 intrinsics file. 2012-01-27 16:49:41 -08:00
Matt Pharr
a5b7fca7e0 Extract constant offsets from gather/scatter base+offsets offset vectors.
When we're able to turn a general gather/scatter into the "base + offsets"
form, we now try to extract out any constant components of the offsets and
then pass them as a separate parameter to the gather/scatter function
implementation.

We then in turn carefully emit code for the addressing calculation so that
these constant offsets match LLVM's patterns to detect this case, such that
we get the constant offsets directly encoded in the instruction's addressing
calculation in many cases, saving arithmetic instructions to do these
calculations.

Improves performance of stencil by ~15%.  Other workloads unchanged.
2012-01-24 14:41:15 -08:00
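Schematically, each offset is factored into a varying part plus a compile-time constant that is passed separately, so the address arithmetic can fold the constant into the instruction's displacement (an assumed 4-wide signature, for illustration only):

    #include <cstdint>

    // Assumed 4-wide gather with the constant offset split out as a scalar.
    static inline void __gather_base_offsets32_float(float *result,
                                                     const uint8_t *base,
                                                     const int32_t *varyingOffsets,
                                                     int32_t constOffset,
                                                     uint32_t mask) {
        for (int i = 0; i < 4; ++i) {
            if (mask & (1u << i)) {
                // base + varying + constant: the constant part can be folded
                // into the load's displacement by LLVM's addressing patterns.
                result[i] = *(const float *)(base + varyingOffsets[i] + constOffset);
            }
        }
    }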
Matt Pharr
68f6ea8def For << and >> with C++, detect when all instances are shifting by the same amount.
In this case, we now emit calls to potentially-specialized functions for the
left/right shifts that take a single integer value for the shift amount.  These
in turn can be matched to the corresponding intrinsics for the SSE target.

Issue #145.
2012-01-19 10:04:32 -07:00
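For instance, on SSE a whole-vector shift by a single uniform amount can map to one instruction instead of per-lane shifts (a sketch; the actual function names the backend emits may differ):

    #include <emmintrin.h>

    struct __vec4_i32 { __m128i v; };       // assumed 4-wide int32 vector

    // All lanes shift by the same amount: one instruction on SSE.
    static inline __vec4_i32 __shl_uniform(__vec4_i32 a, int amount) {
        __vec4_i32 r;
        r.v = _mm_sll_epi32(a.v, _mm_cvtsi32_si128(amount));
        return r;
    }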
Matt Pharr
d14a2de168 Fix generic code emission when building with LLVM3.0/2.9.
Specifically, don't use vector select for masked store blend there,
but emit calls to undefined __masked_store_blend_*() functions.

Added implementations of these functions to the sse4.h and generic-16.h
in examples/intrinsics.  (Calls to these will never be generated with
LLVM 3.1).
2012-01-17 23:42:22 -07:00
Matt Pharr
3bf3ac7922 Be more conservative about using blending in place of masked store.
More specifically, we do a proper masked store (rather than a load-
blend-store) unless we can determine that we're accessing a stack-allocated
"varying" variable.  This fixes a number of nefarious bugs where given
code like:

    uniform float a[21];
    foreach (i = 0 ... 21)
        a[i] = 0;

We'd use a blend and in turn read past the end of a[] in the last
iteration.

Also made slight changes to inlining in aobench; with this change, these
keep compile times to ~5s, versus ~45s without them.

Fixes issue #160.
2012-01-17 23:42:22 -07:00
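The hazard with the blend approach is easy to see in a scalarized sketch of load-blend-store versus a true masked store (illustrative code, not the actual header implementations):

    #include <cstdint>

    // Load-blend-store emulation of a masked store: it reads AND writes
    // every lane's address, even where the mask is off.  For the foreach
    // loop above, with a 4-wide target, the last iteration would touch
    // a[21]..a[23], past the end of a[].
    static inline void __masked_store_blend_float(float *ptr, const float *value,
                                                  uint32_t mask) {
        for (int i = 0; i < 4; ++i) {
            float old = ptr[i];                             // possible out-of-bounds read
            ptr[i] = (mask & (1u << i)) ? value[i] : old;   // and write
        }
    }

    // A true masked store only touches active lanes:
    static inline void __masked_store_float(float *ptr, const float *value,
                                            uint32_t mask) {
        for (int i = 0; i < 4; ++i)
            if (mask & (1u << i)) ptr[i] = value[i];
    }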
Matt Pharr
c6d1cebad4 Update masked_load/store implementations for generic targets to take void *s
(Fixes compile errors when we try to actually use these!)
2012-01-17 23:42:22 -07:00
Matt Pharr
08189ce08c Update "inline" qualifiers in a few examples. 2012-01-17 23:42:22 -07:00
Matt Pharr
5b4dbc8167 Fix build of aobench_instrumented example on OSX/Linux 2012-01-08 10:02:43 -08:00
Matt Pharr
78c6d3c02f Add initial support for 'goto' statements.
ispc now supports goto, but only under uniform control flow--i.e.
it must be possible for the compiler to statically determine that
all program instances will follow the goto.  An error is issued at
compile time if a goto is used when this is not the case.
2012-01-05 12:22:36 -08:00