aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
jbrodman	fbb34b3f3a	Merge pull request #747 from dbabokin/knc_extract_element Knc.h fix	2014-02-27 10:51:17 -08:00
Dmitry Babokin	f0a7baf340	Remove conflicting __extract_element(__vec16_i64 ..., ...)	2014-02-22 01:10:55 +04:00
evghenii	8490efe0ad	fix for knc.h. Due to a bug in ICC (tested with 13.1.3 & 14.0.1), the resulting .cpp file fails to compile	2014-02-07 16:00:21 +01:00
evghenii	438cee4e21	added support for double precision/native transcendentals/trigonometry	2014-02-07 15:43:42 +01:00
evghenii	c59cff396d	added {rsqrt,rcp}d support for knc.h. test-147.ispc & test-148.ispc pass.	2014-02-05 13:55:38 +01:00
Evghenii	eb01ffd4e6	first commit for {rsqrt,rcp}d knc support. going to test on other node now	2014-02-05 13:43:07 +01:00
james.brodman	9f933b500b	Add missing __cast_sext(__vec16_i32,__vec16_i1)	2013-12-20 16:45:27 -05:00
Ilia Filippov	15816eb07e	adding __packed_store_active2 to generic targets	2013-12-19 17:50:18 +04:00
Matt Pharr	e7f067d70c	Fix handling of __clock() builtin for "generic" targets.	2013-08-20 09:04:52 -07:00
Matt Pharr	b6df447b55	Add reduce_add() for int8 and int16 types. This maps to specialized instructions (e.g. PSADBW) when available.	2013-07-25 09:46:01 -07:00
james.brodman	6211966c55	Change mask to use __mmask16 instead of a struct.	2013-05-30 16:04:44 -04:00
james.brodman	7b2eaf63af	knc.h cleanup	2013-05-10 13:36:18 -04:00
james.brodman	52dcbf087a	Implemented 3 more intrinsics on double precision vectors	2013-03-28 11:55:53 -04:00
Jean-Luc Duprat	24087ff3cc	Expose none() in the ISPC standard library. On KNC: all(), any() and none() do not generate a redundant movmsk instruction.	2012-11-27 13:38:28 -08:00
Jean-Luc Duprat	2129b1e27d	knc.h: Fixed __rsqrt_varying_float() to use _mm512_invsqrt_ps() instead of _mm512_invsqrt_pd() This was a typo.	2012-11-21 15:40:35 -08:00
Jean-Luc Duprat	d3b86dcc90	KNC: fix implementation of __all() to use KNCni mask test instructions...	2012-11-14 09:24:01 -08:00
Jean-Luc Duprat	b601331362	Approximation for inverse sqrt and reciprocal provided in fast math mode. RCP was actually slow in fast math mode Inverse sqrt did not expose fast approximation	2012-11-13 14:01:35 -08:00
james.brodman	97ddc1ed10	Fixed =/== error in __all()	2012-11-08 16:30:12 -05:00
Matt Pharr	406fbab40e	Fix bugs in declarations of __any, __all, and __none in examples/intrinsics. They return bool, not vector of bool.	2012-10-17 10:55:50 -07:00
Jean-Luc Duprat	3dd9ff3d84	knc.h: Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used Fixed usage of loadunpack and packstore to use proper memory offset Fixed implementation of __masked_load_() __masked_store_() incorrectly (un)packing the lanes loaded Cleaned up usage of _mm512_undefined_(), it is now mostly confined to constructor Minor cleanups knc2x.h Fixed usage of loadunpack and packstore to use proper memory offset Fixed implementation of __masked_load_() __masked_store_() incorrectly (un)packing the lanes loaded Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used __any() and __none() speedups. Cleaned up usage of _mm512_undefined_(), it is now mostly confined to constructor	2012-09-19 17:11:04 -07:00
Ingo Wald	7f386923b0	Merge branch 'master' of https://github.com/ispc/ispc	2012-09-17 15:54:25 +02:00
Ingo Wald	d2312b1fbd	now using the ASSUME_ALIGNED flag in knc.h	2012-09-17 15:54:00 +02:00
Ingo Wald	6655373ac3	commit test	2012-09-17 15:51:37 +02:00
Ingo Wald	d492af7bc0	64-bit gather/scatter, aligned load/store, i8 support	2012-09-17 03:39:02 +02:00
Jean-Luc Duprat	0e88d5f97f	Fixed unaligned masked stores on KNC	2012-09-14 14:11:41 -07:00
Jean-Luc Duprat	f0b0618484	Added the following mask tests: __any(), __all(), __none() for all supported targets. This allows for more efficient code generation of KNC.	2012-09-14 11:06:18 -07:00
Jean-Luc Duprat	11db466a88	Implement the KNC prefetch API so that ISPC prefetch_*() stdlib functions may be used.	2012-08-30 10:24:31 -07:00
Jean-Luc Duprat	8a22c63889	knc2x.h Introduced knc2x.h which supports 2x interleaved code generation for KNC (use the target generic-32). This implementation is even more experimental and incomplete than knc.h but is useful already (mandelbrot works for example) knc.h: Switch to new intrinsic names _mm512_set_1to16_epi32() -> _mm512_set1_epi32(), etc... Fix the declaration of the unspecialized template for __smear_(), __setzero_(), __undef_() Specifically mark _mm512_undefined_() a few vectors in __load<>() Fixed some implementations of __smear_(), __setzero_(), __undef_*() to remove unnecessary dependent instructions. Implemented ISPC reductions by simply calling existing intrinsic reductions, which are slightly more efficient than our previous implementation. Also added reductions for double types.	2012-08-15 17:41:10 -07:00
Jean-Luc Duprat	165a13b13e	knc.h: vec16_i64 improved with the addition of the following: __extract_element(), insert_element(), __sub(), __mul(), __sdiv(), __udiv(), __and(), __or(), __xor(), __shl(), __lshr(), __ashr(), __select() Fixed a bug in the __mul(__vec16_i64, __vec16_i32) implementation Constructors are all explicitly inlined, copy constructor and operator=() explicitly provided Load and stores for __vec16_i64 and __vec16_d use aligned instructions when possible __rotate_i32() now has a vector implementation Added several reductions: __reduce_add_i32(), __reduce_min_i32(), __reduce_max_i32(), __reduce_add_f(), __reduce_min_f(), __reduce_max_f()	2012-08-10 12:20:10 -07:00
Jean-Luc Duprat	a2d42c3242	KNC: all masked_load_() and masked_store_() functions need to do unaligned accesses	2012-08-01 14:37:25 -07:00
Jean-Luc Duprat	aecd6e0878	All the smear(), setzero() and undef() APIs are now templated on the return type. Modified ISPC's internal mangling to pass these through unchanged. Tried hard to make sure this is not going to introduce an ABI change.	2012-07-17 17:06:36 -07:00
Jean-Luc Duprat	e09e953bbb	Added a few functions: __setzero_i64() __cast_sext(__vec16_i64, __vec16_i32), __cast_zext(__vec16_i32) __min_varying_in32(), __min_varying_uint32(), __max_varying_int32(), __max_varying_uint32() Fixed the signature of __smear_i64() to match current codegen	2012-07-12 10:32:38 -07:00
Jean-Luc Duprat	df18b2a150	Fixed missing tmp var needed for use with gather intrinsic	2012-07-11 15:43:11 -07:00
Matt Pharr	216ac4b1a4	Stop factoring out constant offsets for gather/scatter if instr is available. For KNC (gather/scatter), it's not helpful to factor base+offsets gathers and scatters into base_ptr + {1/2/4/8} * varying_offsets + const_offsets. Now, if a HW instruction is available for gather/scatter, we just factor into base + {1/2/4/8} * offsets (if possible). Not only is this simpler, but it's also what we need to pass a value along to the scale by 2/4/8 available directly in those instructions. Finishes issue #325.	2012-07-11 14:52:29 -07:00
Matt Pharr	ec0280be11	Rename gather/scatter_base_offsets functions to factored_based_offsets. No functional change; just preparation for having a path that doesn't factor the offsets into constant and varying parts, which will be better for AVX2 and KNC.	2012-07-11 14:16:39 -07:00
Jean-Luc Duprat	7a7c54bd59	Minor fixes to knc.h that resulted from integrating `bea88ab122`	2012-07-10 16:10:48 -07:00
Jean-Luc Duprat	bea88ab122	Integrated changes from mmp/and-fold-opt: Add peephole optimization to eliminate some mask AND operations. On KNC, the various vector comparison instructions can optionally be masked; if a mask is provided, the result is effectively that the value returned is the AND of the mask with the result of the comparison. This change adds an optimization pass to the C++ backend that looks for vector ANDs where one operand is a comparison and rewrites them--e.g. "__and(__equal_float(a, b), c)" is changed to "__equal_float_and_mask(a, b, c)", saving an instruction in the end. Issue #319. Merge commit '8ef6bc16364d4c08aa5972141748110160613087' Conflicts: examples/intrinsics/knc.h examples/intrinsics/sse4.h	2012-07-10 10:33:24 -07:00
Matt Pharr	bc7775aef2	Fix __ordered and _unordered floating point functions for C++ target. Fixes include adding "_float" and "_double" suffixes as appropriate as well as providing a number of missing implementations. This fixes a number of failures in the half* tests.	2012-07-09 14:35:51 -07:00
Matt Pharr	107669686c	Fix naming of some comparison ops in knc.h	2012-07-09 12:43:15 -07:00
Jean-Luc Duprat	516ba85abd	Merge pull request #322 from mmp/vector-constants Vector constants	2012-07-09 09:28:26 -07:00
Jean-Luc Duprat	098277b4f0	Merge pull request #321 from mmp/setzero More varied support for constant vectors from C++ backend.	2012-07-09 08:57:05 -07:00
Matt Pharr	8ef6bc1636	Add peephole optimization to eliminate some mask AND operations. On KNC, the various vector comparison instructions can optionally be masked; if a mask is provided, the result is effectively that the value returned is the AND of the mask with the result of the comparison. This change adds an optimization pass to the C++ backend that looks for vector ANDs where one operand is a comparison and rewrites them--e.g. "__and(__equal_float(a, b), c)" is changed to "__equal_float_and_mask(a, b, c)", saving an instruction in the end. Issue #319.	2012-07-07 08:35:38 -07:00
Matt Pharr	974b40c8af	Add type suffix to comparison ops in C++ output. e.g. "__equal()" -> "__equal_float()", etc. No functional change; this is necessary groundwork for a forthcoming peephole optimization that eliminates ANDs of masks in some cases.	2012-07-07 07:50:59 -07:00
Matt Pharr	e5fe0eabdc	Update __load() builtins to take const pointers.	2012-07-06 08:47:47 -07:00
Matt Pharr	0d3993fa25	More varied support for constant vectors from C++ backend. If we have a vector of all zeros, a __setzero_* function call is emitted, permitting calling specialized intrinsics for this. Undefined values are reflected with an __undef_* call, which similarly allows passing that information along. This change also includes a cleanup to the signature of the __smear_* functions; since they already have different names depending on the scalar value type, we don't need to use the trick of passing an undefined value of the return vector type as the first parameter as an indirect way to overload by return value. Issue #317.	2012-07-05 20:19:11 -07:00
Jean-Luc Duprat	ac421f68e2	Ongoing support for int64 for KNC: Fixes to __load and __store. Added __add, __mul, __equal, __not_equal, __extract_elements, __smear_i64, __cast_sext, __cast_zext, and __scatter_base_offsets32_float. __rcp_varying_float now has a fast-math and full-precision implementation.	2012-07-05 17:05:42 -07:00
Jean-Luc Duprat	95d8f76ec3	Added preliminary support for Intel's Xeon Phi KNC processor. float, int32 and double support is included; int8, int16 and int64 not supported yet. This is work in progress and not considered stable yet.	2012-06-28 12:00:55 -07:00

47 Commits