james.brodman
9f933b500b
Add missing __cast_sext(__vec16_i32,__vec16_i1)
2013-12-20 16:45:27 -05:00
Ilia Filippov
15816eb07e
adding __packed_store_active2 to generic targets
2013-12-19 17:50:18 +04:00
evghenii
32cfdd52d3
Merge branch 'master' into knc-fix
2013-11-05 15:46:54 +01:00
evghenii
015af03bdc
changed back to #define ISPC_FORCE_ALIGNED_MEMORY aligned_ld/st #else unaligned ld/st #endif. However, load<64>/store<64> will still be unaligned w/o this define because of failures related to issue #632
2013-11-05 15:41:14 +01:00
Dmitry Babokin
6585a925be
Merge pull request #641 from jbrodman/stdlibshift
...
Add a "shift" operator to the stdlib.
2013-10-28 14:18:31 -07:00
james.brodman
e682b19eda
Remove zero initialization for __vec4_i32
2013-10-28 17:13:07 -04:00
james.brodman
02681d531e
Minor tweak for interface.
2013-10-28 12:56:43 -04:00
james.brodman
641d882ea6
Add shift support for knc targets. This is not optimized.
2013-10-28 12:43:42 -04:00
james.brodman
1e80b3b0d7
Add shift support for generic-16 target.
2013-10-28 12:20:32 -04:00
james.brodman
d2b89e0e37
Tweak generic target.
2013-10-23 18:01:01 -04:00
james.brodman
c4ad8f6ed4
Add docs/generic impls
2013-10-23 15:51:59 -04:00
evghenii
fb1a2a0a40
__masked_store_* uses vscatter now, and is thread-safe
2013-10-15 17:10:46 +03:00
evghenii
3da152a150
fixed zmm __mul for i64 with icc < 14.0.0; 4 knc::fails left, but I doubt these are due to this include
2013-10-07 18:30:22 +03:00
evghenii
4222605f87
fixed lshr/ashr/shl shifts. __mul i64 vector version for icc < 14.0.0 works only on signed, so commented it out in favour of sequential
2013-10-07 14:24:27 +03:00
evghenii
1b196520f6
knc-i1x16.h is cleaned: int32,float,double are complete, int64 is partially complete
2013-10-05 22:10:05 +03:00
evghenii
10223cfac3
working on shuffle/rotate for double; there seems to be a bug in cvt2zmm cvt2hilo
2013-10-05 15:23:55 +03:00
evghenii
8b0fc558cb
complete cleaning
2013-10-05 14:15:33 +03:00
evghenii
8a6789ef61
cleaned float added fails info
2013-10-04 14:11:09 +03:00
evghenii
57f019a6e0
cleaned int64 added fails info
2013-10-04 13:39:15 +03:00
evghenii
32c77be2f3
cleaned mask & int32, only test141 fails
2013-10-04 11:42:52 +03:00
james.brodman
dc8895352a
Adding missing typecasts and guarding i64 __mul with compiler version check
2013-10-01 11:53:56 -04:00
evghenii
019043f55e
patched half2float & float2half to pass the tests. Now only test-141 fails, but that seems to be related to the test rather than to knc-i1x16.h
2013-09-23 09:55:55 +03:00
evghenii
ddecdeb834
moved remaining int64 from knc.h; some of it fails to pass tests. grep for evghenii::fails to find out which functions fail and on which tests
2013-09-20 14:55:15 +03:00
evghenii
5cabf0bef0
adding int64 support from knc.h, phase 1. bugs: __lshr & __ashr fail idiv.ispc test, __equal_i64 & __equal_i64_and_mask fail reduce_equal_8.ispc test
2013-09-20 14:13:40 +03:00
evghenii
0ed89e93fa
added fails info
2013-09-19 16:34:06 +03:00
evghenii
0c274212c2
performance tuning for knc-i1x8.h. this gives good enough performance for double only. float performance is terrible
2013-09-19 16:07:22 +03:00
evghenii
dbef4fd7d7
fixed notation
2013-09-19 14:52:22 +03:00
evghenii
6a21218c13
fix warning and add KNC 1
2013-09-19 13:45:31 +03:00
evghenii
3cf63362a4
small tuning
2013-09-18 20:03:08 +03:00
evghenii
e4b1f58595
performance fix.. still some issues left with equal_i1 for __vec8_i1
2013-09-18 19:14:41 +03:00
evghenii
4b1a0b4bc4
added fails
2013-09-18 18:41:22 +03:00
evghenii
922edb1128
completed knc-i1x16.h and added knc-i1x8.h with knc-i1x8unsafe_fast.h that doesn't pass several tests
2013-09-18 18:14:07 +03:00
Matt Pharr
2b2905b567
Fix (preexisting) bugs in generic-32/64.h with type of "__any", etc.
...
This should be a bool, not a one-wide vector of bools. The equivalent
fix was previously made in generic-16.h, but not made here. (Note that
many tests are still failing with these targets, but at least they
compile properly now.)
2013-08-20 09:05:50 -07:00
Matt Pharr
e7f067d70c
Fix handling of __clock() builtin for "generic" targets.
2013-08-20 09:04:52 -07:00
Matt Pharr
7ab4c5391c
Fix build with LLVM 3.2 and generic-4 / examples/sse4.h target.
2013-08-09 19:56:43 -07:00
Matt Pharr
b6df447b55
Add reduce_add() for int8 and int16 types.
...
This maps to specialized instructions (e.g. PSADBW) when available.
2013-07-25 09:46:01 -07:00
james.brodman
6211966c55
Change mask to use __mmask16 instead of a struct.
2013-05-30 16:04:44 -04:00
james.brodman
7b2eaf63af
knc.h cleanup
2013-05-10 13:36:18 -04:00
Dmitry Babokin
1069a3c77e
Removing some sources of warnings sse4.h and trailing spaces
2013-04-25 03:40:32 +04:00
james.brodman
52dcbf087a
Implemented 3 more intrinsics on double precision vectors
2013-03-28 11:55:53 -04:00
james.brodman
ef1af547e2
Change sse4.h to enable inlining.
2013-03-13 10:55:53 -04:00
Jean-Luc Duprat
24087ff3cc
Expose none() in the ISPC standard library.
...
On KNC: all(), any() and none() do not generate a redundant movmsk instruction.
2012-11-27 13:38:28 -08:00
Jean-Luc Duprat
2129b1e27d
knc.h: Fixed __rsqrt_varying_float() to use _mm512_invsqrt_ps() instead of _mm512_invsqrt_pd()
...
This was a typo.
2012-11-21 15:40:35 -08:00
Jean-Luc Duprat
d3b86dcc90
KNC: fix implementation of __all() to use KNCni mask test instructions...
2012-11-14 09:24:01 -08:00
Jean-Luc Duprat
b601331362
Approximation for inverse sqrt and reciprocal provided in fast math mode.
...
RCP was actually slow in fast math mode
Inverse sqrt did not expose fast approximation
2012-11-13 14:01:35 -08:00
james.brodman
97ddc1ed10
Fixed =/== error in __all()
2012-11-08 16:30:12 -05:00
jbrodman
e323b1d0ad
Fixed compile error: == instead of =
2012-10-26 16:55:28 -04:00
Matt Pharr
406fbab40e
Fix bugs in declarations of __any, __all, and __none in examples/intrinsics.
...
They return bool, not vector of bool.
2012-10-17 10:55:50 -07:00
Jean-Luc Duprat
3dd9ff3d84
knc.h:
...
Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used
Fixed usage of loadunpack and packstore to use proper memory offset
Fixed implementation of __masked_load_*() __masked_store_*() incorrectly (un)packing the lanes loaded
Cleaned up usage of _mm512_undefined_*(), it is now mostly confined to constructor
Minor cleanups
knc2x.h:
Fixed usage of loadunpack and packstore to use proper memory offset
Fixed implementation of __masked_load_*() __masked_store_*() incorrectly (un)packing the lanes loaded
Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used
__any() and __none() speedups.
Cleaned up usage of _mm512_undefined_*(), it is now mostly confined to constructor
2012-09-19 17:11:04 -07:00
Ingo Wald
7f386923b0
Merge branch 'master' of https://github.com/ispc/ispc
2012-09-17 15:54:25 +02:00