aaron/ispc - ispc - git.frat.tech


Author	SHA1	Message	Date
evghenii	fb1a2a0a40	__masked_store_* uses vscatter now, and is thread-safe	2013-10-15 17:10:46 +03:00
evghenii	3da152a150	fixed zmm __mul for i64 with icc < 14.0.0, 4 knc::fails left, but I doubt these are due to this include	2013-10-07 18:30:22 +03:00
evghenii	4222605f87	fixed lshr/ashr/shl shifts. __mul i64 vector version for icc < 14.0.0 works only on signed, so commented it out in favour of sequential	2013-10-07 14:24:27 +03:00
evghenii	1b196520f6	knc-i1x16.h is cleaned: int32,float,double are complete, int64 is partially complete	2013-10-05 22:10:05 +03:00
evghenii	10223cfac3	working on shuffle/rotate for double, there seems to be a bug in cvt2zmm cvt2hilo	2013-10-05 15:23:55 +03:00
evghenii	8b0fc558cb	complete cleaning	2013-10-05 14:15:33 +03:00
evghenii	8a6789ef61	cleaned float added fails info	2013-10-04 14:11:09 +03:00
evghenii	57f019a6e0	cleaned int64 added fails info	2013-10-04 13:39:15 +03:00
evghenii	32c77be2f3	cleaned mask & int32, only test141 fails	2013-10-04 11:42:52 +03:00
james.brodman	dc8895352a	Adding missing typecasts and guarding i64 __mul with compiler version check	2013-10-01 11:53:56 -04:00
evghenii	019043f55e	patched half2float & float2half to pass the tests. Now only test-141 fails, but it seems to be test-related rather than knc-i1x16.h-related	2013-09-23 09:55:55 +03:00
evghenii	ddecdeb834	moved remaining int64 from knc.h; some of them fail to pass tests, grep for evghenii::fails to find out which functions fail and on what tests	2013-09-20 14:55:15 +03:00
evghenii	5cabf0bef0	adding int64 support from knc.h, phase 1. bugs: __lshr & __ashr fail idiv.ispc test, __equal_i64 & __equal_i64_and_mask fail reduce_equal_8.ispc test	2013-09-20 14:13:40 +03:00
evghenii	0ed89e93fa	added fails info	2013-09-19 16:34:06 +03:00
evghenii	0c274212c2	performance tuning for knc-i1x8.h. this gives good enough performance for double only. float performance is terrible	2013-09-19 16:07:22 +03:00
evghenii	dbef4fd7d7	fixed notation	2013-09-19 14:52:22 +03:00
evghenii	6a21218c13	fix warning and add KNC 1	2013-09-19 13:45:31 +03:00
evghenii	3cf63362a4	small tuning	2013-09-18 20:03:08 +03:00
evghenii	e4b1f58595	performance fix.. still some issues left with equal_i1 for __vec8_i1	2013-09-18 19:14:41 +03:00
evghenii	4b1a0b4bc4	added fails	2013-09-18 18:41:22 +03:00
evghenii	922edb1128	completed knc-i1x16.h and added knc-i1x8.h with knc-i1x8unsafe_fast.h that doesn't pass several tests	2013-09-18 18:14:07 +03:00
Matt Pharr	2b2905b567	Fix (preexisting) bugs in generic-32/64.h with type of "__any", etc. This should be a bool, not a one-wide vector of bools. The equivalent fix was previously made in generic-16.h, but not made here. (Note that many tests are still failing with these targets, but at least they compile properly now.)	2013-08-20 09:05:50 -07:00
Matt Pharr	e7f067d70c	Fix handling of __clock() builtin for "generic" targets.	2013-08-20 09:04:52 -07:00
Matt Pharr	7ab4c5391c	Fix build with LLVM 3.2 and generic-4 / examples/sse4.h target.	2013-08-09 19:56:43 -07:00
Matt Pharr	b6df447b55	Add reduce_add() for int8 and int16 types. This maps to specialized instructions (e.g. PSADBW) when available.	2013-07-25 09:46:01 -07:00
james.brodman	6211966c55	Change mask to use __mmask16 instead of a struct.	2013-05-30 16:04:44 -04:00
james.brodman	7b2eaf63af	knc.h cleanup	2013-05-10 13:36:18 -04:00
Dmitry Babokin	1069a3c77e	Removing some sources of warnings sse4.h and trailing spaces	2013-04-25 03:40:32 +04:00
james.brodman	52dcbf087a	Implemented 3 more intrinsics on double precision vectors	2013-03-28 11:55:53 -04:00
james.brodman	ef1af547e2	Change sse4.h to enable inlining.	2013-03-13 10:55:53 -04:00
Jean-Luc Duprat	24087ff3cc	Expose none() in the ISPC standard library. On KNC: all(), any() and none() do not generate a redundant movmsk instruction.	2012-11-27 13:38:28 -08:00
Jean-Luc Duprat	2129b1e27d	knc.h: Fixed __rsqrt_varying_float() to use _mm512_invsqrt_ps() instead of _mm512_invsqrt_pd(). This was a typo.	2012-11-21 15:40:35 -08:00
Jean-Luc Duprat	d3b86dcc90	KNC: fix implementation of __all() to use KNCni mask test instructions...	2012-11-14 09:24:01 -08:00
Jean-Luc Duprat	b601331362	Approximation for inverse sqrt and reciprocal provided in fast math mode. RCP was actually slow in fast math mode. Inverse sqrt did not expose fast approximation.	2012-11-13 14:01:35 -08:00
james.brodman	97ddc1ed10	Fixed =/== error in __all()	2012-11-08 16:30:12 -05:00
jbrodman	e323b1d0ad	Fixed compile error: == instead of =	2012-10-26 16:55:28 -04:00
Matt Pharr	406fbab40e	Fix bugs in declarations of __any, __all, and __none in examples/intrinsics. They return bool, not vector of bool.	2012-10-17 10:55:50 -07:00
Jean-Luc Duprat	3dd9ff3d84	knc.h: Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used. Fixed usage of loadunpack and packstore to use proper memory offset. Fixed implementation of __masked_load_() and __masked_store_() incorrectly (un)packing the lanes loaded. Cleaned up usage of _mm512_undefined_(); it is now mostly confined to the constructor. Minor cleanups. knc2x.h: Fixed usage of loadunpack and packstore to use proper memory offset. Fixed implementation of __masked_load_() and __masked_store_() incorrectly (un)packing the lanes loaded. Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used. __any() and __none() speedups. Cleaned up usage of _mm512_undefined_(); it is now mostly confined to the constructor.	2012-09-19 17:11:04 -07:00
Ingo Wald	7f386923b0	Merge branch 'master' of https://github.com/ispc/ispc	2012-09-17 15:54:25 +02:00
Ingo Wald	d2312b1fbd	now using the ASSUME_ALIGNED flag in knc.h	2012-09-17 15:54:00 +02:00
Ingo Wald	6655373ac3	commit test	2012-09-17 15:51:37 +02:00
Ingo Wald	d492af7bc0	64-bit gather/scatter, aligned load/store, i8 support	2012-09-17 03:39:02 +02:00
Jean-Luc Duprat	0e88d5f97f	Fixed unaligned masked stores on KNC	2012-09-14 14:11:41 -07:00
Jean-Luc Duprat	f0b0618484	Added the following mask tests: __any(), __all(), __none() for all supported targets. This allows for more efficient code generation of KNC.	2012-09-14 11:06:18 -07:00
Jean-Luc Duprat	11db466a88	Implement the KNC prefetch API so that ISPC prefetch_*() stdlib functions may be used.	2012-08-30 10:24:31 -07:00
Jean-Luc Duprat	8a22c63889	knc2x.h: Introduced knc2x.h, which supports 2x interleaved code generation for KNC (use the target generic-32). This implementation is even more experimental and incomplete than knc.h but is useful already (mandelbrot works, for example). knc.h: Switch to new intrinsic names _mm512_set_1to16_epi32() -> _mm512_set1_epi32(), etc. Fix the declaration of the unspecialized template for __smear_(), __setzero_(), __undef_(). Specifically mark _mm512_undefined_() a few vectors in __load<>(). Fixed some implementations of __smear_(), __setzero_(), __undef_*() to remove unnecessary dependent instructions. Implemented ISPC reductions by simply calling existing intrinsic reductions, which are slightly more efficient than our previous implementation. Also added reductions for double types.	2012-08-15 17:41:10 -07:00
Jean-Luc Duprat	165a13b13e	knc.h: vec16_i64 improved with the addition of the following: __extract_element(), insert_element(), __sub(), __mul(), __sdiv(), __udiv(), __and(), __or(), __xor(), __shl(), __lshr(), __ashr(), __select(). Fixed a bug in the __mul(__vec16_i64, __vec16_i32) implementation. Constructors are all explicitly inlined; copy constructor and operator=() explicitly provided. Loads and stores for __vec16_i64 and __vec16_d use aligned instructions when possible. __rotate_i32() now has a vector implementation. Added several reductions: __reduce_add_i32(), __reduce_min_i32(), __reduce_max_i32(), __reduce_add_f(), __reduce_min_f(), __reduce_max_f().	2012-08-10 12:20:10 -07:00
Jean-Luc Duprat	a2d42c3242	KNC: all masked_load_() and masked_store_() functions need to do unaligned accesses	2012-08-01 14:37:25 -07:00
Jean-Luc Duprat	aecd6e0878	All the smear(), setzero() and undef() APIs are now templated on the return type. Modified ISPC's internal mangling to pass these through unchanged. Tried hard to make sure this is not going to introduce an ABI change.	2012-07-17 17:06:36 -07:00
Jean-Luc Duprat	e09e953bbb	Added a few functions: __setzero_i64(), __cast_sext(__vec16_i64, __vec16_i32), __cast_zext(__vec16_i32), __min_varying_in32(), __min_varying_uint32(), __max_varying_int32(), __max_varying_uint32(). Fixed the signature of __smear_i64() to match current codegen.	2012-07-12 10:32:38 -07:00


87 Commits