258 Commits

Author SHA1 Message Date
Ilia Filippov
806e37338c add script for measuring performance 2013-07-01 13:30:49 +04:00
Dmitry Babokin
ec1095624a Merge pull request #527 from tkoziara/master
examples/sort added
2013-06-25 10:11:39 -07:00
Tomasz Koziara
a23d69ebe8 Copyright changed to simplify legal matters. 2013-06-25 17:28:27 +01:00
Tomasz Koziara
86ee8db778 Parallel prefix sum added + minor amendments. 2013-06-25 12:45:51 +01:00
Ilia Filippov
9fb981e9a0 correction of --instrument option support 2013-06-25 12:33:23 +04:00
Tomasz Koziara
f2452f040d First commit of the radix sort example. 2013-06-24 18:37:44 +01:00
james.brodman
6211966c55 Change mask to use __mmask16 instead of a struct. 2013-05-30 16:04:44 -04:00
james.brodman
7b2eaf63af knc.h cleanup 2013-05-10 13:36:18 -04:00
Dmitry Babokin
1069a3c77e Removing some sources of warnings in sse4.h and trailing spaces 2013-04-25 03:40:32 +04:00
james.brodman
52dcbf087a Implemented 3 more intrinsics on double precision vectors 2013-03-28 11:55:53 -04:00
james.brodman
ef1af547e2 Change sse4.h to enable inlining. 2013-03-13 10:55:53 -04:00
Jean-Luc Duprat
24087ff3cc Expose none() in the ISPC standard library.
On KNC: all(), any() and none() do not generate a redundant movmsk instruction.
2012-11-27 13:38:28 -08:00
Jean-Luc Duprat
2129b1e27d knc.h: Fixed __rsqrt_varying_float() to use _mm512_invsqrt_ps() instead of _mm512_invsqrt_pd()
This was a typo.
2012-11-21 15:40:35 -08:00
Jean-Luc Duprat
d3b86dcc90 KNC: fix implementation of __all() to use KNCni mask test instructions... 2012-11-14 09:24:01 -08:00
Jean-Luc Duprat
b601331362 Approximation for inverse sqrt and reciprocal provided in fast math mode.
RCP was actually slow in fast math mode
Inverse sqrt did not expose fast approximation
2012-11-13 14:01:35 -08:00
james.brodman
97ddc1ed10 Fixed =/== error in __all() 2012-11-08 16:30:12 -05:00
jbrodman
e323b1d0ad Fixed compile error: == instead of = 2012-10-26 16:55:28 -04:00
Matt Pharr
406fbab40e Fix bugs in declarations of __any, __all, and __none in examples/intrinsics.
They return bool, not vector of bool.
2012-10-17 10:55:50 -07:00
Matt Pharr
9002837750 Remove incorrect assert in tasksys.cpp 2012-10-15 10:43:46 -07:00
Matt Pharr
538d51cbfe Add GMRES example 2012-09-20 14:06:55 -07:00
Jean-Luc Duprat
3dd9ff3d84 knc.h:
Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used
	Fixed usage of loadunpack and packstore to use proper memory offset
	Fixed implementation of __masked_load_*() __masked_store_*() incorrectly (un)packing the lanes loaded
	Cleaned up usage of _mm512_undefined_*(), it is now mostly confined to constructor
	Minor cleanups

knc2x.h
	Fixed usage of loadunpack and packstore to use proper memory offset
	Fixed implementation of __masked_load_*() __masked_store_*() incorrectly (un)packing the lanes loaded
	Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used
	__any() and __none() speedups.
	Cleaned up usage of _mm512_undefined_*(), it is now mostly confined to constructor
2012-09-19 17:11:04 -07:00
Ingo Wald
7f386923b0 Merge branch 'master' of https://github.com/ispc/ispc 2012-09-17 15:54:25 +02:00
Ingo Wald
d2312b1fbd now using the ASSUME_ALIGNED flag in knc.h 2012-09-17 15:54:00 +02:00
Ingo Wald
6655373ac3 commit test 2012-09-17 15:51:37 +02:00
Ingo Wald
d492af7bc0 64-bit gather/scatter, aligned load/store, i8 support 2012-09-17 03:39:02 +02:00
Jean-Luc Duprat
0e88d5f97f Fixed unaligned masked stores on KNC 2012-09-14 14:11:41 -07:00
Jean-Luc Duprat
f0b0618484 Added the following mask tests: __any(), __all(), __none() for all supported targets.
This allows for more efficient code generation for KNC.
2012-09-14 11:06:18 -07:00
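As an illustration of the mask tests added in f0b0618484, here is a minimal sketch of how `__any()`, `__all()` and `__none()` can reduce a 16-lane mask to a single bool. It assumes the mask is a plain 16-bit integer with one bit per lane; the names `mask16`, `set_lane`, `any_lane`, `all_lanes` and `no_lane` are hypothetical and are not the identifiers used in knc.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 16-lane execution mask, one bit per lane.
typedef std::uint16_t mask16;

// Turn on a single lane in the mask.
inline mask16 set_lane(mask16 m, int lane) {
    assert(lane >= 0 && lane < 16);
    return static_cast<mask16>(m | (1u << lane));
}

// Each test reduces the whole mask to one bool (not a vector of bools).
inline bool any_lane(mask16 m)  { return m != 0; }       // at least one lane on
inline bool all_lanes(mask16 m) { return m == 0xFFFF; }  // every lane on
inline bool no_lane(mask16 m)   { return m == 0; }       // every lane off
```

With an integer mask each test is a single comparison, which is one plausible reason dedicated mask tests can generate less code than extracting per-lane results first.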
Jean-Luc Duprat
11db466a88 Implement the KNC prefetch API so that ISPC prefetch_*() stdlib functions may be used. 2012-08-30 10:24:31 -07:00
Jean-Luc Duprat
09bb36f58c Updated the task system in the example directory to support:
Cilk (cilk_for), OpenMP (#pragma omp parallel for), TBB(tbb::task_group and tbb::parallel_for)
as well as a new pthreads-based model that fully subscribes the machine (good for KNC).
With major contributions from Ingo Wald and James Brodman.
2012-08-28 11:13:12 -07:00
Jean-Luc Duprat
8a22c63889 knc2x.h
Introduced knc2x.h which supports 2x interleaved code generation for KNC (use the target generic-32).
This implementation is even more experimental and incomplete than knc.h but is useful already (mandelbrot works for example).

knc.h:
Switch to new intrinsic names _mm512_set_1to16_epi32() -> _mm512_set1_epi32(), etc...
Fix the declaration of the unspecialized template for __smear_*(), __setzero_*(), __undef_*()
Specifically mark a few vectors in __load<>() as _mm512_undefined_*()
Fixed some implementations of __smear_*(), __setzero_*(), __undef_*() to remove unnecessary dependent instructions.
Implemented ISPC reductions by simply calling existing intrinsic reductions, which are slightly more efficient than our previous implementation.  Also added reductions for double types.
2012-08-15 17:41:10 -07:00
Jean-Luc Duprat
165a13b13e knc.h:
vec16_i64 improved with the addition of the following: __extract_element(), __insert_element(), __sub(), __mul(),
		   __sdiv(), __udiv(), __and(), __or(), __xor(), __shl(), __lshr(), __ashr(), __select()
	Fixed a bug in the __mul(__vec16_i64, __vec16_i32) implementation
	Constructors are all explicitly inlined, copy constructor and operator=() explicitly provided
	Load and stores for __vec16_i64 and __vec16_d use aligned instructions when possible
	__rotate_i32() now has a vector implementation
	Added several reductions: __reduce_add_i32(), __reduce_min_i32(), __reduce_max_i32(),
	       __reduce_add_f(), __reduce_min_f(), __reduce_max_f()
2012-08-10 12:20:10 -07:00
Jean-Luc Duprat
a2d42c3242 KNC: all masked_load_*() and masked_store_*() functions need to do unaligned accesses 2012-08-01 14:37:25 -07:00
Jean-Luc Duprat
aecd6e0878 All the smear(), setzero() and undef() APIs are now templated on the return type.
Modified ISPC's internal mangling to pass these through unchanged.
Tried hard to make sure this is not going to introduce an ABI change.
2012-07-17 17:06:36 -07:00
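To illustrate what "templated on the return type" in aecd6e0878 can look like, here is a small sketch. The types `vec16_f` and `vec16_i32` and the functions `smear_float`, `setzero_i32` and `extract_element` are hypothetical, scalar-emulated stand-ins, not the declarations from the actual headers (which use reserved double-underscore names).

```cpp
#include <cassert>

// Hypothetical 16-wide vector types, emulated with plain arrays.
struct vec16_f   { float v[16]; };
struct vec16_i32 { int v[16]; };

// Primary templates are only declared: the return vector type is the
// template argument, so each target header supplies its own specializations.
template <class RetVecType> RetVecType smear_float(float x);
template <class RetVecType> RetVecType setzero_i32();

// Specialization: broadcast one scalar to all 16 lanes.
template <> inline vec16_f smear_float<vec16_f>(float x) {
    vec16_f r;
    for (int i = 0; i < 16; ++i) r.v[i] = x;
    return r;
}

// Specialization: all-zero integer vector.
template <> inline vec16_i32 setzero_i32<vec16_i32>() {
    vec16_i32 r;
    for (int i = 0; i < 16; ++i) r.v[i] = 0;
    return r;
}

// Read back one lane (bounds-checked in debug builds).
inline float extract_element(const vec16_f &a, int lane) {
    assert(lane >= 0 && lane < 16);
    return a.v[lane];
}
```

A call site then names the vector type explicitly, for example `smear_float<vec16_f>(2.5f)`, so no dummy argument is needed to select the overload.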
Jean-Luc Duprat
e09e953bbb Added a few functions: __setzero_i64() __cast_sext(__vec16_i64, __vec16_i32), __cast_zext(__vec16_i32)
__min_varying_int32(), __min_varying_uint32(), __max_varying_int32(), __max_varying_uint32()
Fixed the signature of __smear_i64() to match current codegen
2012-07-12 10:32:38 -07:00
Jean-Luc Duprat
df18b2a150 Fixed missing tmp var needed for use with gather intrinsic 2012-07-11 15:43:11 -07:00
Matt Pharr
216ac4b1a4 Stop factoring out constant offsets for gather/scatter if instr is available.
For KNC (gather/scatter), it's not helpful to factor base+offsets gathers
and scatters into base_ptr + {1/2/4/8} * varying_offsets + const_offsets.
Now, if a HW instruction is available for gather/scatter, we just factor
into base + {1/2/4/8} * offsets (if possible).  Not only is this simpler,
but it's also what we need to pass a value along to the scale by
2/4/8 available directly in those instructions.

Finishes issue #325.
2012-07-11 14:52:29 -07:00
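The two addressing forms described in 216ac4b1a4 can be compared with a scalar emulation of a 16-lane gather. This is only a sketch: `gather_factored`, `gather_base_offsets`, `valid_scale` and `kLanes` are hypothetical names, offsets are taken to be byte offsets, and no masking is modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

const int kLanes = 16;

// Hardware gather/scatter typically supports only these scale factors.
inline bool valid_scale(int s) { return s == 1 || s == 2 || s == 4 || s == 8; }

// Factored form: address = base + scale * varying[i] + constant[i].
inline void gather_factored(const unsigned char *base, int scale,
                            const std::int32_t *varying,
                            const std::int32_t *constant, float *out) {
    assert(valid_scale(scale));
    for (int i = 0; i < kLanes; ++i) {
        const unsigned char *p =
            base + static_cast<std::int64_t>(scale) * varying[i] + constant[i];
        std::memcpy(&out[i], p, sizeof(float));
    }
}

// Unfactored form: address = base + scale * offsets[i].
// The scale can be handed directly to a gather instruction that has one.
inline void gather_base_offsets(const unsigned char *base, int scale,
                                const std::int32_t *offsets, float *out) {
    assert(valid_scale(scale));
    for (int i = 0; i < kLanes; ++i) {
        const unsigned char *p =
            base + static_cast<std::int64_t>(scale) * offsets[i];
        std::memcpy(&out[i], p, sizeof(float));
    }
}
```

Both functions read the same elements whenever `scale * offsets[i]` equals `scale * varying[i] + constant[i]`; the second form simply has one fewer term to carry per lane.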
Matt Pharr
ec0280be11 Rename gather/scatter_base_offsets functions to *factored_base_offsets*.
No functional change; just preparation for having a path that doesn't
factor the offsets into constant and varying parts, which will be better
for AVX2 and KNC.
2012-07-11 14:16:39 -07:00
Jean-Luc Duprat
7a7c54bd59 Minor fixes to knc.h that resulted from integrating bea88ab122 2012-07-10 16:10:48 -07:00
Jean-Luc Duprat
bea88ab122 Integrated changes from mmp/and-fold-opt:
Add peephole optimization to eliminate some mask AND operations.

On KNC, the various vector comparison instructions can optionally
be masked; if a mask is provided, the result is effectively that
the value returned is the AND of the mask with the result of the
comparison.

This change adds an optimization pass to the C++ backend that looks
for vector ANDs where one operand is a comparison and rewrites
them--e.g. "__and(__equal_float(a, b), c)" is changed to
"__equal_float_and_mask(a, b, c)", saving an instruction in the end.

Issue #319.

Merge commit '8ef6bc16364d4c08aa5972141748110160613087'

Conflicts:
	examples/intrinsics/knc.h
	examples/intrinsics/sse4.h
2012-07-10 10:33:24 -07:00
Matt Pharr
bc7775aef2 Fix __ordered and __unordered floating point functions for C++ target.
Fixes include adding "_float" and "_double" suffixes as appropriate as well
as providing a number of missing implementations.

This fixes a number of failures in the half* tests.
2012-07-09 14:35:51 -07:00
Matt Pharr
107669686c Fix naming of some comparison ops in knc.h 2012-07-09 12:43:15 -07:00
Jean-Luc Duprat
516ba85abd Merge pull request #322 from mmp/vector-constants
Vector constants
2012-07-09 09:28:26 -07:00
Jean-Luc Duprat
098277b4f0 Merge pull request #321 from mmp/setzero
More varied support for constant vectors from C++ backend.
2012-07-09 08:57:05 -07:00
Matt Pharr
8ef6bc1636 Add peephole optimization to eliminate some mask AND operations.
On KNC, the various vector comparison instructions can optionally
be masked; if a mask is provided, the result is effectively that
the value returned is the AND of the mask with the result of the
comparison.

This change adds an optimization pass to the C++ backend that looks
for vector ANDs where one operand is a comparison and rewrites
them--e.g. "__and(__equal_float(a, b), c)" is changed to
"__equal_float_and_mask(a, b, c)", saving an instruction in the end.

Issue #319.
2012-07-07 08:35:38 -07:00
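The rewrite described in 8ef6bc1636 relies on the fused call returning the same mask as the compare followed by the AND. A scalar emulation makes that equivalence concrete. This is a sketch under stated assumptions: `mask16`, `vec16_f`, `equal_float`, `and_mask`, `equal_float_and_mask` and `check_equivalence` are hypothetical stand-ins for the double-underscore functions named in the commit message, and the optimization pass itself is not shown.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint16_t mask16;          // one bit per lane
struct vec16_f { float v[16]; };       // emulated 16-wide float vector

// Unmasked compare: bit i is set when lane i of a equals lane i of b.
inline mask16 equal_float(const vec16_f &a, const vec16_f &b) {
    mask16 r = 0;
    for (int i = 0; i < 16; ++i)
        if (a.v[i] == b.v[i]) r |= static_cast<mask16>(1u << i);
    return r;
}

// Separate AND of two masks (the operation the peephole removes).
inline mask16 and_mask(mask16 a, mask16 b) {
    return static_cast<mask16>(a & b);
}

// Fused form: only lanes enabled in m can be set in the result.
inline mask16 equal_float_and_mask(const vec16_f &a, const vec16_f &b,
                                   mask16 m) {
    mask16 r = 0;
    for (int i = 0; i < 16; ++i)
        if (((m >> i) & 1u) && a.v[i] == b.v[i])
            r |= static_cast<mask16>(1u << i);
    return r;
}

// The rewrite is valid because both forms always agree.
inline void check_equivalence(const vec16_f &a, const vec16_f &b, mask16 m) {
    assert(and_mask(equal_float(a, b), m) == equal_float_and_mask(a, b, m));
}
```

On hardware whose compare instructions accept a mask operand, the fused form maps to one instruction where the unfused form needs two.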
Matt Pharr
974b40c8af Add type suffix to comparison ops in C++ output.
e.g. "__equal()" -> "__equal_float()", etc.

No functional change; this is necessary groundwork for a forthcoming
peephole optimization that eliminates ANDs of masks in some cases.
2012-07-07 07:50:59 -07:00
Matt Pharr
e5fe0eabdc Update __load() builtins to take const pointers. 2012-07-06 08:47:47 -07:00
Matt Pharr
0d3993fa25 More varied support for constant vectors from C++ backend.
If we have a vector of all zeros, a __setzero_* function call is emitted,
permitting calling specialized intrinsics for this.  Undefined values
are reflected with an __undef_* call, which similarly allows passing that
information along.

This change also includes a cleanup to the signature of the __smear_*
functions; since they already have different names depending on the
scalar value type, we don't need to use the trick of passing an
undefined value of the return vector type as the first parameter as
an indirect way to overload by return value.

Issue #317.
2012-07-05 20:19:11 -07:00
Jean-Luc Duprat
ac421f68e2 Ongoing support for int64 for KNC:
Fixes to __load and __store.
Added __add, __mul, __equal, __not_equal, __extract_elements, __smear_i64, __cast_sext, __cast_zext,
and __scatter_base_offsets32_float.

__rcp_varying_float now has a fast-math and full-precision implementation.
2012-07-05 17:05:42 -07:00
Jean-Luc Duprat
95d8f76ec3 Added preliminary support for Intel's Xeon Phi KNC processor.
float, int32 and double support is included; int8, int16 and int64
not supported yet.

This is work in progress and not considered stable yet.
2012-06-28 12:00:55 -07:00
Jean-Luc Duprat
e431b07e04 Changed the C API to use templates to indicate memory alignment to the C compiler
This should help with performance of the generated code.
Updated the relevant header files (sse4.h, generic-16.h, generic-32.h, generic-64.h)

Updated generic-32.h and generic-64.h to the new memory API
2012-06-28 09:29:15 -07:00