evghenii
fb1a2a0a40
__masked_store_* uses vscatter now, and is thread-safe
2013-10-15 17:10:46 +03:00
Dmitry Babokin
99df2d9dbf
Switch examples on Unix from using g++ to clang++
2013-10-11 16:29:17 +04:00
evghenii
3da152a150
fixed zmm __mul for i64 with icc < 14.0.0; 4 knc::fails left, but I doubt these are due to this include.
2013-10-07 18:30:22 +03:00
evghenii
4222605f87
fixed lshr/ashr/shl shifts. The vector version of i64 __mul for icc < 14.0.0 works only for signed values, so it is commented out in favour of the sequential version
2013-10-07 14:24:27 +03:00
evghenii
1b196520f6
knc-i1x16.h is cleaned: int32, float, double are complete; int64 is partially complete
2013-10-05 22:10:05 +03:00
evghenii
10223cfac3
working on shuffle/rotate for double; there seems to be a bug in cvt2zmm/cvt2hilo
2013-10-05 15:23:55 +03:00
evghenii
8b0fc558cb
complete cleaning
2013-10-05 14:15:33 +03:00
evghenii
8a6789ef61
cleaned float, added fails info
2013-10-04 14:11:09 +03:00
evghenii
57f019a6e0
cleaned int64, added fails info
2013-10-04 13:39:15 +03:00
evghenii
32c77be2f3
cleaned mask & int32, only test141 fails
2013-10-04 11:42:52 +03:00
Dmitry Babokin
2741e3c1d0
Merge pull request #616 from jbrodman/master
Adding missing typecasts and guarding i64 __mul with icc version check
2013-10-01 08:59:52 -07:00
james.brodman
dc8895352a
Adding missing typecasts and guarding i64 __mul with compiler version check
2013-10-01 11:53:56 -04:00
Dmitry Babokin
c7b4164122
Redefining ISPC should not discard ISPC_FLAGS
2013-10-01 18:40:26 +04:00
Dmitry Babokin
2d6f7a7c93
Support i686 architecture recognition as x86 and enable 32 bit x86 platforms
2013-10-01 17:37:34 +04:00
jbrodman
39c2274f1a
Merge pull request #588 from egaburov/knc-modes
Added knc-i1x16.h, knc-i1x8.h and knc-i1x8unsafe_fast.h
2013-09-27 11:20:56 -07:00
Dmitry Babokin
23cb59427d
Merge pull request #607 from ifilippov/testing
corrections to the test system
2013-09-26 04:02:49 -07:00
Ilia Filippov
1c858c34f7
corrections to the test system
2013-09-26 14:54:15 +04:00
evghenii
019043f55e
patched half2float & float2half to pass the tests. Now only test-141 fails, but that seems to be a problem with the test rather than with knc-i1x16.h
2013-09-23 09:55:55 +03:00
Ilia Filippov
87cecddabb
adding sort to performance checking
2013-09-20 18:57:20 +04:00
evghenii
ddecdeb834
move remaining int64 from knc.h; some functions fail to pass tests. grep for evghenii::fails to find out which functions fail and on which tests
2013-09-20 14:55:15 +03:00
evghenii
5cabf0bef0
adding int64 support from knc.h, phase 1. Bugs: __lshr & __ashr fail the idiv.ispc test; __equal_i64 & __equal_i64_and_mask fail the reduce_equal_8.ispc test
2013-09-20 14:13:40 +03:00
evghenii
0ed89e93fa
added fails info
2013-09-19 16:34:06 +03:00
egaburov
d68dbbc7bc
Merge remote-tracking branch 'upstream/master' into knc-modes
2013-09-19 15:08:17 +02:00
evghenii
0c274212c2
performance tuning for knc-i1x8.h; this gives good enough performance for double only, float performance is terrible
2013-09-19 16:07:22 +03:00
evghenii
dbef4fd7d7
fixed notation
2013-09-19 14:52:22 +03:00
evghenii
6a21218c13
fix warning and add KNC 1
2013-09-19 13:45:31 +03:00
Ilia Filippov
00cd90c6b0
test system
2013-09-19 12:26:57 +04:00
evghenii
3cf63362a4
small tuning
2013-09-18 20:03:08 +03:00
evghenii
e4b1f58595
performance fix; still some issues left with equal_i1 for __vec8_i1
2013-09-18 19:14:41 +03:00
evghenii
4b1a0b4bc4
added fails
2013-09-18 18:41:22 +03:00
evghenii
922edb1128
completed knc-i1x16.h and added knc-i1x8.h with knc-i1x8unsafe_fast.h, which doesn't pass several tests
2013-09-18 18:14:07 +03:00
Dmitry Babokin
191d9dede5
Merge pull request #585 from tkoziara/master
Sort description.
2013-09-16 10:08:49 -07:00
Tomasz Koziara
6e0b9ddc74
Sort description.
2013-09-16 18:02:07 +01:00
Dmitry Babokin
b258027061
Merge pull request #582 from tkoziara/master
Uniform memory allocation in sort example is fixed.
2013-09-16 03:29:43 -07:00
Tomasz Koziara
97068765e8
Copyright reversed.
2013-09-14 18:09:04 +01:00
Tomasz Koziara
ed825b3773
Uniform memory allocation fixed.
2013-09-13 13:14:31 +01:00
Ilia Filippov
f620cdbaa1
Changes in perf.py functionality, unification of examples, fixes for build warnings
2013-08-26 14:04:59 +04:00
Matt Pharr
2b2905b567
Fix (preexisting) bugs in generic-32/64.h with type of "__any", etc.
This should be a bool, not a one-wide vector of bools. The equivalent
fix was previously made in generic-16.h, but not made here. (Note that
many tests are still failing with these targets, but at least they
compile properly now.)
2013-08-20 09:05:50 -07:00
Matt Pharr
e7f067d70c
Fix handling of __clock() builtin for "generic" targets.
2013-08-20 09:04:52 -07:00
Matt Pharr
7ab4c5391c
Fix build with LLVM 3.2 and generic-4 / examples/sse4.h target.
2013-08-09 19:56:43 -07:00
Matt Pharr
cd9afe946c
Merge branch 'master' into arm
Conflicts:
Makefile
builtins.cpp
ispc.cpp
ispc.h
ispc.vcxproj
opt.cpp
2013-08-06 17:39:21 -07:00
Dmitry Babokin
43423c276f
Merge pull request #560 from ifilippov/perf
Supporting perf.py on Mac OS
2013-08-01 13:20:01 -07:00
Ilia Filippov
3c06924a02
Supporting perf.py on Mac OS
2013-08-01 12:47:37 +04:00
Dmitry Babokin
220f0b0b40
Renaming mandelbrot_tasks files to be different from mandelbrot
2013-07-30 19:53:12 -07:00
Dmitry Babokin
fa93cb7d0b
InterlockedAdd -> InterlockedExchangeAdd for better portability (InterlockedAdd is not always supported)
2013-07-29 22:46:36 -07:00
Matt Pharr
b6df447b55
Add reduce_add() for int8 and int16 types.
This maps to specialized instructions (e.g. PSADBW) when available.
2013-07-25 09:46:01 -07:00
Matt Pharr
d7b0c5794e
Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests
pass, and all examples compile and run correctly. Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.
Current open issues/TODOs:
- Code quality looks decent, but hasn't been carefully examined. Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken).
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.
- Currently only the "cortex-a9" and "cortex-a15" CPU targets are
  supported; LLVM supports many other ARM CPUs and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure).
- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
  only when the compiler runs on an ARM host, though).
- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately). It may just work, but will more likely
  have various small issues.
- Anything related to 64-bit ARM has seen no attention.
2013-07-19 23:07:24 -07:00
Matt Pharr
b007bba59f
Replace inline assembly in task system with equivalent gcc intrinsics.
gcc/icc build only: the Windows build still uses the Win32 calls for
these.
2013-07-19 23:07:24 -07:00
Ilia Filippov
fd7f87b55e
Supporting perf.py on Windows and some small corrections in it
2013-07-02 19:23:18 +04:00
Dmitry Babokin
8be4128c5a
Merge pull request #534 from ifilippov/perf
add script for measuring performance
2013-07-01 05:09:03 -07:00