e.g. "__equal()" -> "__equal_float()", etc.
No functional change; this is necessary groundwork for a forthcoming
peephole optimization that eliminates ANDs of masks in some cases.
This should help with performance of the generated code.
Updated the relevant header files (sse4.h, generic-16.h, generic-32.h, generic-64.h)
Updated generic-32.h and generic-64.h to the new memory API
Previously, we'd bitcast e.g. a vector of floats to a vector of i32s and then
use the i32 variant of masked_load/masked_store/gather/scatter. Now, we have
separate float/double variants of each of those.
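For illustration, here is a minimal sketch of what a float variant might look
like in a generic scalar header; the types and exact signature are simplified
assumptions, not the real generic-*.h definitions:

    struct __vec4_f  { float v[4]; };
    struct __vec4_i1 { bool  m[4]; };

    // load only the active lanes as floats; no bitcast through i32 needed
    static inline __vec4_f __masked_load_float(const void *p, __vec4_i1 mask) {
        const float *fp = (const float *)p;
        __vec4_f ret = {};
        for (int i = 0; i < 4; ++i)
            if (mask.m[i])
                ret.v[i] = fp[i];
        return ret;
    }
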
Rather than XOR'ing with a temporary 'all-ones' vector, we call
__not. Also, we call out to __and_not1 and __and_not2, for an
AND where the first or second operand, respectively, has had
NOT applied to it.
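A sketch of what these helpers compute, assuming a 16-wide mask stored as a
bitmask (the actual mask representation in the headers may differ):

    #include <stdint.h>

    struct __vec16_i1 { uint16_t bits; };

    static inline __vec16_i1 __not(__vec16_i1 a) {
        __vec16_i1 r; r.bits = (uint16_t)~a.bits; return r;
    }

    // AND where the first operand has NOT applied: (~a) & b
    static inline __vec16_i1 __and_not1(__vec16_i1 a, __vec16_i1 b) {
        __vec16_i1 r; r.bits = (uint16_t)(~a.bits & b.bits); return r;
    }

    // AND where the second operand has NOT applied: a & (~b)
    static inline __vec16_i1 __and_not2(__vec16_i1 a, __vec16_i1 b) {
        __vec16_i1 r; r.bits = (uint16_t)(a.bits & ~b.bits); return r;
    }
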
Now, the __smear* functions in generated C++ code have an unused first
parameter of the desired return type; this allows us to have headers
that include variants of __smear for multiple target widths. (This
approach is necessary since we can't overload by return type in C++.)
Issue #256.
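A minimal sketch of the overloading trick, with simplified stand-in vector
types:

    struct __vec4_f  { float v[4];  };
    struct __vec16_f { float v[16]; };

    // the first argument is never read; it only selects the overload,
    // and hence the return type / target width
    static inline __vec4_f __smear_float(__vec4_f, float val) {
        __vec4_f r;
        for (int i = 0; i < 4; ++i) r.v[i] = val;
        return r;
    }

    static inline __vec16_f __smear_float(__vec16_f, float val) {
        __vec16_f r;
        for (int i = 0; i < 16; ++i) r.v[i] = val;
        return r;
    }

    // generated code picks the width with a default-constructed dummy:
    //     __vec16_f x = __smear_float(__vec16_f(), 3.0f);
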
Now, when we're printing out a constant vector value, we check to see
if it's a splat and call out to one of the __splat_* functions in
the generated code if so.
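As a rough illustration (the exact __splat_* signature is an assumption), the
emitted constant initialization changes from a full per-element initializer to
a single call:

    struct __vec16_i32 { int v[16]; };

    static inline __vec16_i32 __splat_i32(int val) {
        __vec16_i32 r;
        for (int i = 0; i < 16; ++i) r.v[i] = val;
        return r;
    }

    // before: __vec16_i32 c = { 7, 7, 7, 7, 7, 7, 7, 7,
    //                           7, 7, 7, 7, 7, 7, 7, 7 };
    // now:    __vec16_i32 c = __splat_i32(7);
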
When we're able to turn a general gather/scatter into the "base + offsets"
form, we now try to extract out any constant components of the offsets and
then pass them as a separate parameter to the gather/scatter function
implementation.
We then carefully emit the code for the addressing calculation so that
these constant offsets match LLVM's patterns for detecting this case; as a
result, in many cases the constant offsets are encoded directly in the
instruction's addressing calculation, saving the arithmetic instructions
that would otherwise compute them.
Improves performance of stencil by ~15%. Other workloads unchanged.
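A sketch of the idea with a hypothetical scalarized implementation (the name,
signature, and types below are assumptions): because the constant part of the
offsets arrives as a separate scalar that is a compile-time constant at the
call site, the per-lane address can fold into a [base + index + disp]
addressing mode.

    #include <stdint.h>

    struct __vec4_i32 { int32_t v[4]; };
    struct __vec4_i1  { bool    m[4]; };

    static inline __vec4_i32
    __gather_base_offsets32_i32(const uint8_t *base, __vec4_i32 offsets,
                                int32_t constDelta, __vec4_i1 mask) {
        __vec4_i32 ret = {};
        for (int i = 0; i < 4; ++i)
            if (mask.m[i])
                // base + offsets.v[i] + constDelta: the constant part can be
                // encoded as the displacement of the load's addressing mode
                ret.v[i] = *(const int32_t *)(base + offsets.v[i] + constDelta);
        return ret;
    }
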
In this case, we now emit calls to potentially-specialized functions for the
left/right shifts that take a single integer value for the shift amount. These
in turn can be matched to the corresponding intrinsics for the SSE target.
Issue #145.
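For example, on an SSE target a left shift by a single (uniform) count can map
onto the shift-by-scalar intrinsic; a hedged sketch, with a simplified wrapper
type and a hypothetical function name:

    #include <emmintrin.h>

    struct __vec4_i32 { __m128i v; };

    // shift all lanes left by one integer count (rather than a per-lane
    // vector of counts), which matches the SSE2 pslld instruction directly
    static inline __vec4_i32 __shl(__vec4_i32 a, int count) {
        __vec4_i32 r;
        r.v = _mm_sll_epi32(a.v, _mm_cvtsi32_si128(count));
        return r;
    }
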
Specifically, don't use vector select for masked store blend there,
but emit calls to undefined __masked_store_blend_*() functions.
Added implementations of these functions to sse4.h and generic-16.h
in examples/intrinsics. (Calls to these will never be generated with
LLVM 3.1).
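A sketch of what such an implementation might look like in sse4.h (types and
signature simplified and assumed; the mask is taken to hold all-ones or
all-zeros per lane):

    #include <smmintrin.h>   // SSE4.1

    struct __vec4_f  { __m128 v; };
    struct __vec4_i1 { __m128 m; };

    // read the old contents, blend in the new values where the mask is on,
    // and write the result back
    static inline void __masked_store_blend_float(float *ptr, __vec4_f val,
                                                  __vec4_i1 mask) {
        __m128 old = _mm_loadu_ps(ptr);
        _mm_storeu_ps(ptr, _mm_blendv_ps(old, val.v, mask.m));
    }
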
ispc now supports goto, but only under uniform control flow--i.e.
it must be possible for the compiler to statically determine that
all program instances will follow the goto. An error is issued at
compile time if a goto is used when this is not the case.
The compiler now supports an --emit-c++ option, which generates generic
vector C++ code. To actually compile this code, the user must provide
C++ code that implements a variety of types and operations (e.g. adding
two floating-point vector values together, comparing them, etc).
There are two examples of this required code in examples/intrinsics:
generic-16.h is a "generic" 16-wide implementation that does all required
operations with scalar math; it's useful for demonstrating the requirements
of the implementation. Then, sse4.h shows a simple implementation of an
SSE4 target that maps the emitted function calls to SSE intrinsics.
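For concreteness, here is a tiny (and deliberately incomplete) sketch of the
kind of definitions such a header has to provide, using simplified 4-wide
scalar types; the real generic-16.h and sse4.h define many more operations
(loads/stores, masks, conversions, math routines, and so on), and the exact
names and signatures below are assumptions:

    struct __vec4_f  { float v[4]; };
    struct __vec4_i1 { bool  m[4]; };

    // add two floating-point vector values
    static inline __vec4_f __add(__vec4_f a, __vec4_f b) {
        __vec4_f r;
        for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
        return r;
    }

    // compare two floating-point vector values for equality
    static inline __vec4_i1 __equal_float(__vec4_f a, __vec4_f b) {
        __vec4_i1 r;
        for (int i = 0; i < 4; ++i) r.m[i] = (a.v[i] == b.v[i]);
        return r;
    }
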
When using these example implementations with the ispc test suite,
all but one or two tests pass with gcc and clang on Linux and OSX.
There are currently ~10 failures with icc on Linux, and ~50 failures with
MSVC 2010. (To be fixed in coming days.)
Performance varies: when running the examples through the sse4.h
target, some (e.g. options) have the same performance as when compiled
with --target=sse4 from ispc directly, while noise is 12% slower, rt is 26%
slower, and aobench is 2.2x slower. The details of this haven't yet been
carefully investigated, but will be in coming days as well.
Issue #92.