Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used
Fixed usage of loadunpack and packstore to use the proper memory offset
Fixed implementations of __masked_load_*() and __masked_store_*(), which were incorrectly (un)packing the loaded lanes (see the sketch below)
Cleaned up usage of _mm512_undefined_*(); it is now mostly confined to constructors
Minor cleanups
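As a hedged, scalar illustration of the (un)packing bug noted above (placeholder names and types, not the actual knc.h code): a masked load must keep the lane <-> memory-slot correspondence, whereas loadunpack performs a packed (compressed) load of consecutive elements into the enabled lanes only.

    #include <stdint.h>

    // Masked load: element i goes to lane i when the mask bit is set.
    static void masked_load_i32(int32_t *dst, const int32_t *src, uint16_t mask) {
        for (int i = 0; i < 16; ++i)
            if (mask & (1u << i))
                dst[i] = src[i];       // element i -> lane i
    }

    // Packed (loadunpack-style) load: consecutive memory elements are
    // unpacked into the enabled lanes, which is the wrong semantics for
    // a masked load.
    static void packed_load_i32(int32_t *dst, const int32_t *src, uint16_t mask) {
        int j = 0;
        for (int i = 0; i < 16; ++i)
            if (mask & (1u << i))
                dst[i] = src[j++];     // consecutive elements -> enabled lanes only
    }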
knc2x.h:
Fixed usage of loadunpack and packstore to use the proper memory offset
Fixed implementations of __masked_load_*() and __masked_store_*(), which were incorrectly (un)packing the loaded lanes
Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used
__any() and __none() speedups.
Cleaned up usage of _mm512_undefined_*(); it is now mostly confined to constructors
Introduced knc2x.h, which supports 2x interleaved code generation for KNC (use the generic-32 target).
This implementation is even more experimental and incomplete than knc.h, but it is already useful (mandelbrot works, for example)
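A hedged sketch of the 2x-interleaved idea: a 32-wide vector is held in two native 16-wide KNC registers and every operation is issued twice. The struct layout and function below are assumptions for illustration, not the actual knc2x.h definitions.

    #include <immintrin.h>

    // One 32-wide value as a pair of 16-wide KNC registers.
    struct __vec32_f {
        __m512 v1;   // program instances 0..15
        __m512 v2;   // program instances 16..31
    };

    // Each 32-wide operation is emitted as two 16-wide operations.
    static inline __vec32_f __add(__vec32_f a, __vec32_f b) {
        __vec32_f r;
        r.v1 = _mm512_add_ps(a.v1, b.v1);
        r.v2 = _mm512_add_ps(a.v2, b.v2);
        return r;
    }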
knc.h:
Switched to the new intrinsic names: _mm512_set_1to16_epi32() -> _mm512_set1_epi32(), etc.
Fixed the declarations of the unspecialized templates for __smear_*(), __setzero_*(), __undef_*()
Explicitly marked a few vectors as _mm512_undefined_*() in __load<>()
Fixed some implementations of __smear_*(), __setzero_*(), __undef_*() to remove unnecessary dependent instructions.
Implemented ISPC reductions by simply calling the existing intrinsic reductions, which are slightly more efficient than our previous implementation. Also added reductions for double types (see the sketch at the end of this section).
__vec16_i64 improved with the addition of the following: __extract_element(), __insert_element(), __sub(), __mul(),
__sdiv(), __udiv(), __and(), __or(), __xor(), __shl(), __lshr(), __ashr(), __select()
Fixed a bug in the __mul(__vec16_i64, __vec16_i32) implementation
Constructors are all explicitly inlined; the copy constructor and operator=() are explicitly provided
Load and stores for __vec16_i64 and __vec16_d use aligned instructions when possible
__rotate_i32() now has a vector implementation
Added several reductions: __reduce_add_i32(), __reduce_min_i32(), __reduce_max_i32(),
__reduce_add_f(), __reduce_min_f(), __reduce_max_f()
__min_varying_int32(), __min_varying_uint32(), __max_varying_int32(), __max_varying_uint32()
Fixed the signature of __smear_i64() to match current codegen
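A hedged sketch of the reduction approach noted above for knc.h: simply forward to the existing intrinsic reductions. __reduce_add_f follows the name used in this log; the double variant's name is assumed, and the actual header signatures may differ.

    #include <immintrin.h>

    static inline float __reduce_add_f(__m512 v) {
        return _mm512_reduce_add_ps(v);    // intrinsic reduction over all 16 lanes
    }
    static inline double __reduce_add_d(__m512d v) {
        return _mm512_reduce_add_pd(v);    // reductions are also provided for doubles
    }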
For KNC (gather/scatter), it's not helpful to factor base+offsets gathers
and scatters into base_ptr + {1/2/4/8} * varying_offsets + const_offsets.
Now, if a HW instruction is available for gather/scatter, we just factor
into base + {1/2/4/8} * offsets (if possible). Not only is this simpler,
but it also lets us pass the scale value along to the scale-by-2/4/8
operand available directly in those instructions.
Finishes issue #325.
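A hedged sketch of how the base + {1/2/4/8} * offsets form lines up with the hardware gather's scale operand; the wrapper name and the exact intrinsic used in knc.h are assumptions for illustration.

    #include <immintrin.h>

    static inline __m512 gather_floats(const float *base, __m512i offsets, __mmask16 mask) {
        // The factored-out scale (4 here, i.e. sizeof(float)) is passed
        // directly as the gather's scale operand, so no separate address
        // arithmetic has to be emitted.
        return _mm512_mask_i32gather_ps(_mm512_setzero_ps(), mask, offsets, base, 4);
    }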
No functional change; just preparation for having a path that doesn't
factor the offsets into constant and varying parts, which will be better
for AVX2 and KNC.
Add peephole optimization to eliminate some mask AND operations.
On KNC, the various vector comparison instructions can optionally
be masked; if a mask is provided, the value returned is effectively
the AND of the mask with the result of the comparison.
This change adds an optimization pass to the C++ backend that looks
for vector ANDs where one operand is a comparison and rewrites
them--e.g. "__and(__equal_float(a, b), c)" is changed to
"__equal_float_and_mask(a, b, c)", saving an instruction in the end.
Issue #319.
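A hedged sketch of the kind of fused helper the rewrite targets; the actual knc.h implementation (and its __vec16_f/__vec16_i1 wrapper types) may differ.

    #include <immintrin.h>

    static inline __mmask16 __equal_float_and_mask(__m512 a, __m512 b, __mmask16 mask) {
        // The masked compare ANDs 'mask' with the comparison result in one
        // instruction, instead of a compare followed by a separate mask AND.
        return _mm512_mask_cmpeq_ps_mask(mask, a, b);
    }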
Merge commit '8ef6bc16364d4c08aa5972141748110160613087'
Conflicts:
examples/intrinsics/knc.h
examples/intrinsics/sse4.h
Fixes include adding "_float" and "_double" suffixes as appropriate as well
as providing a number of missing implementations.
This fixes a number of failures in the half* tests.
Renamed the comparison functions to encode the operand type, e.g. "__equal()" -> "__equal_float()", etc.
No functional change; this is necessary groundwork for a forthcoming
peephole optimization that eliminates ANDs of masks in some cases.
If we have a vector of all zeros, a __setzero_* function call is emitted,
which permits calling specialized intrinsics for that case. Undefined values
are reflected with an __undef_* call, which similarly allows passing that
information along.
This change also includes a cleanup to the signature of the __smear_*
functions; since they already have different names depending on the
scalar value type, we don't need to use the trick of passing an
undefined value of the return vector type as the first parameter as
an indirect way to overload by return value.
Issue #317.
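A hedged sketch of the specialized implementations that the new __setzero_* and __undef_* calls make possible; the names follow this log, but the actual header code may differ.

    #include <immintrin.h>

    static inline __m512 __setzero_float() {
        return _mm512_setzero_ps();        // zeroing idiom with no input dependencies
    }
    static inline __m512 __undef_float() {
        return _mm512_undefined_ps();      // tells the compiler the contents don't matter
    }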
Fixes to __load and __store.
Added __add, __mul, __equal, __not_equal, __extract_element, __smear_i64, __cast_sext, __cast_zext,
and __scatter_base_offsets32_float.
__rcp_varying_float now has both fast-math and full-precision implementations.
This should help with performance of the generated code.
Updated the relevant header files (sse4.h, generic-16.h, generic-32.h, generic-64.h)
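A hedged sketch of the usual fast-math vs. full-precision split for the __rcp_varying_float change mentioned above: the fast path returns the hardware estimate and the precise path refines it with one Newton-Raphson step. The wrapper names and the choice of intrinsics are assumptions, not the actual header code.

    #include <immintrin.h>

    static inline __m512 rcp_fast(__m512 d) {
        return _mm512_rcp23_ps(d);                     // KNC reciprocal estimate
    }
    static inline __m512 rcp_full(__m512 d) {
        __m512 x = _mm512_rcp23_ps(d);
        // Newton-Raphson refinement: x' = x * (2 - d*x)
        return _mm512_mul_ps(x, _mm512_sub_ps(_mm512_set1_ps(2.0f), _mm512_mul_ps(d, x)));
    }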
Updated generic-32.h and generic-64.h to the new memory API
Previously, we'd bitcast e.g. a vector of floats to a vector of i32s and then
use the i32 variant of masked_load/masked_store/gather/scatter. Now, we have
separate float/double variants of each of those.
There were a number of situations where left-shifting 1 by a lane index
failed because the shift went beyond 32 bits. Fixed by shifting the
64-bit constant 1ull instead.
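A minimal illustration of the fix (the helper name is hypothetical):

    #include <stdint.h>

    static inline uint64_t lane_bit(int lane) {
        // (1 << lane) shifts a 32-bit int and misbehaves once lane >= 32;
        // shifting the 64-bit constant 1ull is correct for lanes 0..63.
        return 1ull << lane;
    }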
Rather than XOR'ing with a temporary 'all-ones' vector, we call
__not. Also, we call out to __and_not1 and __and_not2, for an
AND where the first or second operand, respectively, has had
NOT applied to it.
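A hedged sketch of the helpers named above, with a plain 16-bit mask standing in for the real mask type (the actual targets may use intrinsics instead):

    #include <stdint.h>

    typedef uint16_t mask16;

    static inline mask16 __not(mask16 v)                { return (mask16)~v; }
    static inline mask16 __and_not1(mask16 a, mask16 b) { return (mask16)(~a & b); } // NOT on the first operand
    static inline mask16 __and_not2(mask16 a, mask16 b) { return (mask16)(a & ~b); } // NOT on the second operand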
Now, the __smear* functions in generated C++ code have an unused first
parameter of the desired return type; this allows us to have headers
that include variants of __smear for multiple target widths. (This
approach is necessary since we can't overload by return type in C++.)
Issue #256.
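A hedged sketch of the return-type-overloading trick described above; the struct shapes and bodies are illustrative, not the actual generic-*.h code.

    #include <stdint.h>

    struct __vec16_i32 { int32_t v[16]; };
    struct __vec32_i32 { int32_t v[32]; };

    // The first parameter is never read; it exists only so overload resolution
    // can select the variant whose *return type* matches the target width.
    static inline __vec16_i32 __smear_i32(__vec16_i32, int32_t x) {
        __vec16_i32 r; for (int i = 0; i < 16; ++i) r.v[i] = x; return r;
    }
    static inline __vec32_i32 __smear_i32(__vec32_i32, int32_t x) {
        __vec32_i32 r; for (int i = 0; i < 32; ++i) r.v[i] = x; return r;
    }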
Now, when we're printing out a constant vector value, we check to see
if it's a splat and call out to one of the __splat_* functions in
the generated code if so.