evghenii
ddecdeb834
move remaining int64 code from knc.h; some of it fails tests. grep for evghenii::fails to find out which functions fail and on what tests
2013-09-20 14:55:15 +03:00
evghenii
5cabf0bef0
adding int64 support from knc.h, phase 1. bugs: __lshr & __ashr fail idiv.ispc test, __equal_i64 & __equal_i64_and_mask fail reduce_equal_8.ispc test
2013-09-20 14:13:40 +03:00
evghenii
0ed89e93fa
added fails info
2013-09-19 16:34:06 +03:00
egaburov
d68dbbc7bc
Merge remote-tracking branch 'upstream/master' into knc-modes
2013-09-19 15:08:17 +02:00
evghenii
0c274212c2
performance tuning for knc-i1x8.h. this gives good enough performance for double only. float performance is terrible
2013-09-19 16:07:22 +03:00
evghenii
dbef4fd7d7
fixed notation
2013-09-19 14:52:22 +03:00
evghenii
6a21218c13
fix warning and add KNC 1
2013-09-19 13:45:31 +03:00
Ilia Filippov
00cd90c6b0
test system
2013-09-19 12:26:57 +04:00
evghenii
3cf63362a4
small tuning
2013-09-18 20:03:08 +03:00
evghenii
e4b1f58595
performance fix.. still some issues left with equal_i1 for __vec8_i1
2013-09-18 19:14:41 +03:00
evghenii
4b1a0b4bc4
added fails
2013-09-18 18:41:22 +03:00
evghenii
922edb1128
completed knc-i1x16.h and added knc-i1x8.h with knc-i1x8unsafe_fast.h, which doesn't pass several tests
2013-09-18 18:14:07 +03:00
Dmitry Babokin
191d9dede5
Merge pull request #585 from tkoziara/master
...
Sort description.
2013-09-16 10:08:49 -07:00
Tomasz Koziara
6e0b9ddc74
Sort description.
2013-09-16 18:02:07 +01:00
Dmitry Babokin
b258027061
Merge pull request #582 from tkoziara/master
...
Uniform memory allocation in sort example is fixed.
2013-09-16 03:29:43 -07:00
Tomasz Koziara
97068765e8
Copyright reversed.
2013-09-14 18:09:04 +01:00
Tomasz Koziara
ed825b3773
Uniform memory allocation fixed.
2013-09-13 13:14:31 +01:00
Ilia Filippov
f620cdbaa1
Changes in perf.py functionality, unification of examples, correction of build warnings
2013-08-26 14:04:59 +04:00
Matt Pharr
2b2905b567
Fix (preexisting) bugs in generic-32/64.h with type of "__any", etc.
...
This should be a bool, not a one-wide vector of bools. The equivalent
fix was previously made in generic-16.h, but not made here. (Note that
many tests are still failing with these targets, but at least they
compile properly now.)
2013-08-20 09:05:50 -07:00
Matt Pharr
e7f067d70c
Fix handling of __clock() builtin for "generic" targets.
2013-08-20 09:04:52 -07:00
Matt Pharr
7ab4c5391c
Fix build with LLVM 3.2 and generic-4 / examples/sse4.h target.
2013-08-09 19:56:43 -07:00
Matt Pharr
cd9afe946c
Merge branch 'master' into arm
...
Conflicts:
Makefile
builtins.cpp
ispc.cpp
ispc.h
ispc.vcxproj
opt.cpp
2013-08-06 17:39:21 -07:00
Dmitry Babokin
43423c276f
Merge pull request #560 from ifilippov/perf
...
Supporting perf.py on Mac OS
2013-08-01 13:20:01 -07:00
Ilia Filippov
3c06924a02
Supporting perf.py on Mac OS
2013-08-01 12:47:37 +04:00
Dmitry Babokin
220f0b0b40
Renaming mandelbrot_tasks files to be different from mandelbrot
2013-07-30 19:53:12 -07:00
Dmitry Babokin
fa93cb7d0b
InterlockedAdd -> InterlockedExchangeAdd for better portability (InterlockedAdd is not always supported)
2013-07-29 22:46:36 -07:00
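The InterlockedAdd -> InterlockedExchangeAdd change above swaps a call that returns the updated value for one that returns the previous value, so callers that need the new value have to add the delta back. A minimal sketch of that adjustment follows; the name add_and_fetch and the counter_t typedef are hypothetical and not taken from the ispc sources, and the non-Windows branch assumes a compiler with the gcc __sync builtins.

```cpp
#include <cassert>

#if defined(_WIN32)
#include <windows.h>
typedef LONG counter_t;   // InterlockedExchangeAdd operates on LONG
#else
typedef long counter_t;   // stand-in type for non-Windows builds
#endif

// Hypothetical helper: atomically adds delta and returns the NEW value,
// i.e. the result InterlockedAdd would have produced.
inline counter_t add_and_fetch(volatile counter_t *p, counter_t delta) {
#if defined(_WIN32)
    // InterlockedExchangeAdd returns the value *before* the addition.
    return InterlockedExchangeAdd(p, delta) + delta;
#else
    // __sync_fetch_and_add also returns the previous value.
    return __sync_fetch_and_add(p, delta) + delta;
#endif
}
```

If the caller only needs the side effect and ignores the return value, the two Win32 calls can be exchanged without this adjustment.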
Matt Pharr
b6df447b55
Add reduce_add() for int8 and int16 types.
...
This maps to specialized instructions (e.g. PSADBW) when available.
2013-07-25 09:46:01 -07:00
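To illustrate why PSADBW suits the int8 reduce_add() mentioned above: a sum of absolute differences against an all-zero vector is just the sum of the bytes. The sketch below is not ispc's implementation; the function name reduce_add_u8x16 is hypothetical, it handles unsigned bytes only (signed int8 would need an extra bias step), and it falls back to a scalar loop when SSE2 is not available.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper: returns the sum of 16 unsigned bytes.
inline uint32_t reduce_add_u8x16(const uint8_t v[16]) {
#if defined(__SSE2__)
    __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i *>(v));
    // PSADBW against zero produces two partial sums, one in the low
    // 16 bits of each 64-bit half of the result.
    __m128i sad = _mm_sad_epu8(x, _mm_setzero_si128());
    uint32_t lo = static_cast<uint32_t>(_mm_cvtsi128_si32(sad));
    uint32_t hi = static_cast<uint32_t>(_mm_extract_epi16(sad, 4));
    return lo + hi;
#else
    // Portable fallback for targets without SSE2.
    uint32_t sum = 0;
    for (int i = 0; i < 16; ++i)
        sum += v[i];
    return sum;
#endif
}
```

The result is widened to 32 bits because sixteen bytes can sum to 4080, which does not fit in 8 bits.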
Matt Pharr
d7b0c5794e
Add support for ARM NEON targets.
...
Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests
pass, and all examples compile and run correctly. Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.
Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined. Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken.)
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.
- Currently only the "cortex-a9" and "cortex-a15" CPU targets are
  supported; LLVM supports many other ARM CPUs and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure.)
- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
  only when the compiler runs on an ARM host, though).
- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately). It may just work, but will more likely
  have various small issues.
- Anything related to 64-bit ARM has seen no attention.
2013-07-19 23:07:24 -07:00
Matt Pharr
b007bba59f
Replace inline assembly in task system with equivalent gcc intrinsics.
...
gcc/icc build only: the Windows build still uses the Win32 calls for
these.
2013-07-19 23:07:24 -07:00
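As a rough picture of what replacing inline assembly with gcc intrinsics can look like, the sketch below wraps three of the legacy __sync builtins. The wrapper names (atomic_add, atomic_cas, memory_fence) are hypothetical and may not match the functions in the task system; the sketch assumes gcc, clang, or icc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper: atomically adds delta, returns the previous value.
inline int32_t atomic_add(volatile int32_t *p, int32_t delta) {
    return __sync_fetch_and_add(p, delta);
}

// Hypothetical wrapper: stores desired only if *p equals expected;
// returns true when the swap happened.
inline bool atomic_cas(volatile int32_t *p, int32_t expected, int32_t desired) {
    return __sync_bool_compare_and_swap(p, expected, desired);
}

// Hypothetical wrapper: full memory barrier.
inline void memory_fence() {
    __sync_synchronize();
}
```

Unlike hand-written assembly, these builtins are not tied to one instruction set, which matters once the same task system has to build for both x86 and ARM.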
Ilia Filippov
fd7f87b55e
Supporting perf.py on Windows and some small corrections in it
2013-07-02 19:23:18 +04:00
Dmitry Babokin
8be4128c5a
Merge pull request #534 from ifilippov/perf
...
add script for measuring performance
2013-07-01 05:09:03 -07:00
Ilia Filippov
806e37338c
add script for measuring performance
2013-07-01 13:30:49 +04:00
Dmitry Babokin
ec1095624a
Merge pull request #527 from tkoziara/master
...
examples/sort added
2013-06-25 10:11:39 -07:00
Tomasz Koziara
a23d69ebe8
Copyright changed to simplify legal matters.
2013-06-25 17:28:27 +01:00
Tomasz Koziara
86ee8db778
Parallel prefix sum added + minor amendments.
2013-06-25 12:45:51 +01:00
Ilia Filippov
9fb981e9a0
correction of --instrument option support
2013-06-25 12:33:23 +04:00
Tomasz Koziara
f2452f040d
First commit of the radix sort example.
2013-06-24 18:37:44 +01:00
james.brodman
6211966c55
Change mask to use __mmask16 instead of a struct.
2013-05-30 16:04:44 -04:00
james.brodman
7b2eaf63af
knc.h cleanup
2013-05-10 13:36:18 -04:00
Dmitry Babokin
1069a3c77e
Removing some sources of warnings in sse4.h and trailing spaces
2013-04-25 03:40:32 +04:00
james.brodman
52dcbf087a
Implemented 3 more intrinsics on double precision vectors
2013-03-28 11:55:53 -04:00
james.brodman
ef1af547e2
Change sse4.h to enable inlining.
2013-03-13 10:55:53 -04:00
Jean-Luc Duprat
24087ff3cc
Expose none() in the ISPC standard library.
...
On KNC: all(), any() and none() do not generate a redundant movmsk instruction.
2012-11-27 13:38:28 -08:00
Jean-Luc Duprat
2129b1e27d
knc.h: Fixed __rsqrt_varying_float() to use _mm512_invsqrt_ps() instead of _mm512_invsqrt_pd()
...
This was a typo.
2012-11-21 15:40:35 -08:00
Jean-Luc Duprat
d3b86dcc90
KNC: fix implementation of __all() to use KNCni mask test instructions...
2012-11-14 09:24:01 -08:00
Jean-Luc Duprat
b601331362
Approximation for inverse sqrt and reciprocal provided in fast math mode.
...
RCP was actually slow in fast math mode
Inverse sqrt did not expose fast approximation
2012-11-13 14:01:35 -08:00
james.brodman
97ddc1ed10
Fixed =/== error in __all()
2012-11-08 16:30:12 -05:00
jbrodman
e323b1d0ad
Fixed compile error: == instead of =
2012-10-26 16:55:28 -04:00
Matt Pharr
406fbab40e
Fix bugs in declarations of __any, __all, and __none in examples/intrinsics.
...
They return bool, not vector of bool.
2012-10-17 10:55:50 -07:00
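The point of the fix above is that a mask reduction collapses all lanes into one scalar bool. A minimal sketch under assumed types: Mask16 is a hypothetical 16-lane mask stored as a bit field, and any_of, all_of, and none_of stand in for the __any, __all, and __none declarations without reproducing them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16-lane mask: bit i is set when lane i is active.
struct Mask16 {
    uint16_t bits;
};

// Each reduction returns a single bool, not a vector of bools.
inline bool any_of(Mask16 m)  { return m.bits != 0; }
inline bool all_of(Mask16 m)  { return m.bits == 0xFFFF; }
inline bool none_of(Mask16 m) { return m.bits == 0; }
```

Note the comparisons use ==; writing = by mistake would be an assignment, which is the kind of slip two later commits in this log correct.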
Matt Pharr
9002837750
Remove incorrect assert in tasksys.cpp
2012-10-15 10:43:46 -07:00