Commit Graph

181 Commits

Author SHA1 Message Date
Dmitry Babokin
23cb59427d Merge pull request #607 from ifilippov/testing
correction of test system
2013-09-26 04:02:49 -07:00
Ilia Filippov
1c858c34f7 correction of test system 2013-09-26 14:54:15 +04:00
Ilia Filippov
87cecddabb adding sort to performance checking 2013-09-20 18:57:20 +04:00
Ilia Filippov
00cd90c6b0 test system 2013-09-19 12:26:57 +04:00
Dmitry Babokin
191d9dede5 Merge pull request #585 from tkoziara/master
Sort description.
2013-09-16 10:08:49 -07:00
Tomasz Koziara
6e0b9ddc74 Sort description. 2013-09-16 18:02:07 +01:00
Dmitry Babokin
b258027061 Merge pull request #582 from tkoziara/master
Uniform memory allocation in sort example is fixed.
2013-09-16 03:29:43 -07:00
Tomasz Koziara
97068765e8 Copyright reversed. 2013-09-14 18:09:04 +01:00
Tomasz Koziara
ed825b3773 Uniform memory allocation fixed. 2013-09-13 13:14:31 +01:00
Ilia Filippov
f620cdbaa1 Changes in perf.py functionality, unification of examples, correction of build warnings 2013-08-26 14:04:59 +04:00
Matt Pharr
2b2905b567 Fix (preexisting) bugs in generic-32/64.h with type of "__any", etc.
This should be a bool, not a one-wide vector of bools.  The equivalent
fix was previously made in generic-16.h, but not made here.  (Note that
many tests are still failing with these targets, but at least they
compile properly now.)
2013-08-20 09:05:50 -07:00
Matt Pharr
e7f067d70c Fix handling of __clock() builtin for "generic" targets. 2013-08-20 09:04:52 -07:00
Matt Pharr
7ab4c5391c Fix build with LLVM 3.2 and generic-4 / examples/sse4.h target. 2013-08-09 19:56:43 -07:00
Matt Pharr
cd9afe946c Merge branch 'master' into arm
Conflicts:
	Makefile
	builtins.cpp
	ispc.cpp
	ispc.h
	ispc.vcxproj
	opt.cpp
2013-08-06 17:39:21 -07:00
Dmitry Babokin
43423c276f Merge pull request #560 from ifilippov/perf
Supporting perf.py on Mac OS
2013-08-01 13:20:01 -07:00
Ilia Filippov
3c06924a02 Supporting perf.py on Mac OS 2013-08-01 12:47:37 +04:00
Dmitry Babokin
220f0b0b40 Renaming mandelbrot_tasks files to be different from mandelbrot 2013-07-30 19:53:12 -07:00
Dmitry Babokin
fa93cb7d0b InterlockedAdd -> InterlockedExchangeAdd for better portability (InterlockedAdd is not always supported) 2013-07-29 22:46:36 -07:00
Matt Pharr
b6df447b55 Add reduce_add() for int8 and int16 types.
This maps to specialized instructions (e.g. PSADBW) when available.
2013-07-25 09:46:01 -07:00
Matt Pharr
d7b0c5794e Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs.  All but ~10 tests
pass, and all examples compile and run correctly.  Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.

Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined.  Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken.)
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.

- Currently only the "cortex-a9" and "cortex-a15" CPU targets are
  supported; LLVM supports many other ARM CPUs and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure.)

- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
   only when the compiler runs on an ARM host, though).

- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately).  It may just work, but will more likely
  have various small issues.

- Anything related to 64-bit ARM has seen no attention.
2013-07-19 23:07:24 -07:00
Matt Pharr
b007bba59f Replace inline assembly in task system with equivalent gcc intrinsics.
gcc/icc build only: the Windows build still uses the Win32 calls for
these.
2013-07-19 23:07:24 -07:00
Ilia Filippov
fd7f87b55e Supporting perf.py on Windows and some small corrections in it 2013-07-02 19:23:18 +04:00
Dmitry Babokin
8be4128c5a Merge pull request #534 from ifilippov/perf
add script for measuring performance
2013-07-01 05:09:03 -07:00
Ilia Filippov
806e37338c add script for measuring performance 2013-07-01 13:30:49 +04:00
Dmitry Babokin
ec1095624a Merge pull request #527 from tkoziara/master
examples/sort added
2013-06-25 10:11:39 -07:00
Tomasz Koziara
a23d69ebe8 Copyright changed to simplify legal matters. 2013-06-25 17:28:27 +01:00
Tomasz Koziara
86ee8db778 Parallel prefix sum added + minor amendments. 2013-06-25 12:45:51 +01:00
Ilia Filippov
9fb981e9a0 correction of --instrument option support 2013-06-25 12:33:23 +04:00
Tomasz Koziara
f2452f040d First commit of the radix sort example. 2013-06-24 18:37:44 +01:00
james.brodman
6211966c55 Change mask to use __mmask16 instead of a struct. 2013-05-30 16:04:44 -04:00
james.brodman
7b2eaf63af knc.h cleanup 2013-05-10 13:36:18 -04:00
Dmitry Babokin
1069a3c77e Removing some sources of warnings in sse4.h and trailing spaces 2013-04-25 03:40:32 +04:00
james.brodman
52dcbf087a Implemented 3 more intrinsics on double precision vectors 2013-03-28 11:55:53 -04:00
james.brodman
ef1af547e2 Change sse4.h to enable inlining. 2013-03-13 10:55:53 -04:00
Jean-Luc Duprat
24087ff3cc Expose none() in the ISPC standard library.
On KNC: all(), any() and none() do not generate a redundant movmsk instruction.
2012-11-27 13:38:28 -08:00
Jean-Luc Duprat
2129b1e27d knc.h: Fixed __rsqrt_varying_float() to use _mm512_invsqrt_ps() instead of _mm512_invsqrt_pd()
This was a typo.
2012-11-21 15:40:35 -08:00
Jean-Luc Duprat
d3b86dcc90 KNC: fix implementation of __all() to use KNCni mask test instructions... 2012-11-14 09:24:01 -08:00
Jean-Luc Duprat
b601331362 Approximation for inverse sqrt and reciprocal provided in fast math mode.
RCP was actually slow in fast math mode
Inverse sqrt did not expose fast approximation
2012-11-13 14:01:35 -08:00
james.brodman
97ddc1ed10 Fixed =/== error in __all() 2012-11-08 16:30:12 -05:00
jbrodman
e323b1d0ad Fixed compile error: == instead of = 2012-10-26 16:55:28 -04:00
Matt Pharr
406fbab40e Fix bugs in declarations of __any, __all, and __none in examples/intrinsics.
They return bool, not vector of bool.
2012-10-17 10:55:50 -07:00
Matt Pharr
9002837750 Remove incorrect assert in tasksys.cpp 2012-10-15 10:43:46 -07:00
Matt Pharr
538d51cbfe Add GMRES example 2012-09-20 14:06:55 -07:00
Jean-Luc Duprat
3dd9ff3d84 knc.h:
	Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used
	Fixed usage of loadunpack and packstore to use proper memory offset
	Fixed implementation of __masked_load_*() and __masked_store_*() incorrectly (un)packing the lanes loaded
	Cleaned up usage of _mm512_undefined_*(), it is now mostly confined to constructor
	Minor cleanups

knc2x.h:
	Fixed usage of loadunpack and packstore to use proper memory offset
	Fixed implementation of __masked_load_*() and __masked_store_*() incorrectly (un)packing the lanes loaded
	Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used
	__any() and __none() speedups.
	Cleaned up usage of _mm512_undefined_*(), it is now mostly confined to constructor
2012-09-19 17:11:04 -07:00
Ingo Wald
7f386923b0 Merge branch 'master' of https://github.com/ispc/ispc 2012-09-17 15:54:25 +02:00
Ingo Wald
d2312b1fbd now using the ASSUME_ALIGNED flag in knc.h 2012-09-17 15:54:00 +02:00
Ingo Wald
6655373ac3 commit test 2012-09-17 15:51:37 +02:00
Ingo Wald
d492af7bc0 64-bit gather/scatter, aligned load/store, i8 support 2012-09-17 03:39:02 +02:00
Jean-Luc Duprat
0e88d5f97f Fixed unaligned masked stores on KNC 2012-09-14 14:11:41 -07:00
Jean-Luc Duprat
f0b0618484 Added the following mask tests: __any(), __all(), __none() for all supported targets.
This allows for more efficient code generation of KNC.
2012-09-14 11:06:18 -07:00
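
Note on the mask tests introduced in f0b0618484: several later commits in this list (406fbab40e, 2b2905b567) point out that __any, __all and __none return a plain bool rather than a vector of bools. The following is only a minimal sketch of that idea, not the code from the repository's headers. The Mask16 type and the mask_any/mask_all/mask_none names are hypothetical, and a 16-lane mask packed into one integer is an assumption made for illustration.

```cpp
#include <cstdint>

// Hypothetical 16-lane execution mask (an assumption for illustration):
// bit i is set when lane i is active.
struct Mask16 {
    std::uint16_t bits;
};

// True if at least one lane is active. Returns a scalar bool,
// not a per-lane vector of bools.
inline bool mask_any(Mask16 m) { return m.bits != 0; }

// True only if all 16 lanes are active. Note the comparison (==),
// not an assignment (=).
inline bool mask_all(Mask16 m) { return m.bits == 0xFFFF; }

// True if no lane is active.
inline bool mask_none(Mask16 m) { return m.bits == 0; }
```

With the mask held in a single integer, each test reduces to one scalar comparison, which is the kind of cheap whole-mask query the commit message describes as enabling more efficient code generation.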