aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
Dmitry Babokin	23cb59427d	Merge pull request #607 from ifilippov/testing correction of test system	2013-09-26 04:02:49 -07:00
Ilia Filippov	1c858c34f7	correction of test system	2013-09-26 14:54:15 +04:00
Ilia Filippov	87cecddabb	adding sort to performance checking	2013-09-20 18:57:20 +04:00
Ilia Filippov	00cd90c6b0	test system	2013-09-19 12:26:57 +04:00
Dmitry Babokin	191d9dede5	Merge pull request #585 from tkoziara/master Sort description.	2013-09-16 10:08:49 -07:00
Tomasz Koziara	6e0b9ddc74	Sort description.	2013-09-16 18:02:07 +01:00
Dmitry Babokin	b258027061	Merge pull request #582 from tkoziara/master Uniform memory allocation in sort example is fixed.	2013-09-16 03:29:43 -07:00
Tomasz Koziara	97068765e8	Copyright reversed.	2013-09-14 18:09:04 +01:00
Tomasz Koziara	ed825b3773	Uniform memory allocation fixed.	2013-09-13 13:14:31 +01:00
Ilia Filippov	f620cdbaa1	Changes in perf.py functionality, unification of examples, correction build warnings	2013-08-26 14:04:59 +04:00
Matt Pharr	2b2905b567	Fix (preexisting) bugs in generic-32/64.h with type of "__any", etc. This should be a bool, not a one-wide vector of bools. The equivalent fix was previously made in generic-16.h, but not made here. (Note that many tests are still failing with these targets, but at least they compile properly now.)	2013-08-20 09:05:50 -07:00
Matt Pharr	e7f067d70c	Fix handling of __clock() builtin for "generic" targets.	2013-08-20 09:04:52 -07:00
Matt Pharr	7ab4c5391c	Fix build with LLVM 3.2 and generic-4 / examples/sse4.h target.	2013-08-09 19:56:43 -07:00
Matt Pharr	cd9afe946c	Merge branch 'master' into arm Conflicts: Makefile builtins.cpp ispc.cpp ispc.h ispc.vcxproj opt.cpp	2013-08-06 17:39:21 -07:00
Dmitry Babokin	43423c276f	Merge pull request #560 from ifilippov/perf Supporting perf.py on Mac OS	2013-08-01 13:20:01 -07:00
Ilia Filippov	3c06924a02	Supporting perf.py on Mac OS	2013-08-01 12:47:37 +04:00
Dmitry Babokin	220f0b0b40	Renaming mandelbrot_tasks files to be different from mandelbrot	2013-07-30 19:53:12 -07:00
Dmitry Babokin	fa93cb7d0b	InterlockedAdd -> InterlockedExchangeAdd for better portability (InterlockedAdd is not always supported)	2013-07-29 22:46:36 -07:00
Matt Pharr	b6df447b55	Add reduce_add() for int8 and int16 types. This maps to specialized instructions (e.g. PSADBW) when available.	2013-07-25 09:46:01 -07:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken.) This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure.) - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues.) - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00
Matt Pharr	b007bba59f	Replace inline assembly in task system with equivalent gcc intrinsics. gcc/icc build only: the Windows build still uses the Win32 calls for these.	2013-07-19 23:07:24 -07:00
Ilia Filippov	fd7f87b55e	Supporting perf.py on Windows and some small corrections in it	2013-07-02 19:23:18 +04:00
Dmitry Babokin	8be4128c5a	Merge pull request #534 from ifilippov/perf add script for measuring performance	2013-07-01 05:09:03 -07:00
Ilia Filippov	806e37338c	add script for measuring performance	2013-07-01 13:30:49 +04:00
Dmitry Babokin	ec1095624a	Merge pull request #527 from tkoziara/master examples/sort added	2013-06-25 10:11:39 -07:00
Tomasz Koziara	a23d69ebe8	Copyright changed to simplify legal matters.	2013-06-25 17:28:27 +01:00
Tomasz Koziara	86ee8db778	Parallel prefix sum added + minor amendements.	2013-06-25 12:45:51 +01:00
Ilia Filippov	9fb981e9a0	correction of --instrument option support	2013-06-25 12:33:23 +04:00
Tomasz Koziara	f2452f040d	First commit of the radix sort example.	2013-06-24 18:37:44 +01:00
james.brodman	6211966c55	Change mask to use __mmask16 instead of a struct.	2013-05-30 16:04:44 -04:00
james.brodman	7b2eaf63af	knc.h cleanup	2013-05-10 13:36:18 -04:00
Dmitry Babokin	1069a3c77e	Removing some sources of warnings sse4.h and trailing spaces	2013-04-25 03:40:32 +04:00
james.brodman	52dcbf087a	Implemented 3 more intrinsics on double precision vectors	2013-03-28 11:55:53 -04:00
james.brodman	ef1af547e2	Change sse4.h to enable inlining.	2013-03-13 10:55:53 -04:00
Jean-Luc Duprat	24087ff3cc	Expose none() in the ISPC standard library. On KNC: all(), any() and none() do not generate a redundant movmsk instruction.	2012-11-27 13:38:28 -08:00
Jean-Luc Duprat	2129b1e27d	knc.h: Fixed __rsqrt_varying_float() to use _mm512_invsqrt_ps() instead of _mm512_invsqrt_pd() This was a typo.	2012-11-21 15:40:35 -08:00
Jean-Luc Duprat	d3b86dcc90	KNC: fix implementation of __all() to use KNCni mask test instructions...	2012-11-14 09:24:01 -08:00
Jean-Luc Duprat	b601331362	Approximation for inverse sqrt and reciprocal provided in fast math mode. RCP was actually slow in fast math mode Inverse sqrt did not expose fast approximation	2012-11-13 14:01:35 -08:00
james.brodman	97ddc1ed10	Fixed =/== error in __all()	2012-11-08 16:30:12 -05:00
jbrodman	e323b1d0ad	Fixed compile error: == instead of =	2012-10-26 16:55:28 -04:00
Matt Pharr	406fbab40e	Fix bugs in declarations of __any, __all, and __none in examples/intrinsics. They return bool, not vector of bool.	2012-10-17 10:55:50 -07:00
Matt Pharr	9002837750	Remove incorrect assert in tasksys.cpp	2012-10-15 10:43:46 -07:00
Matt Pharr	538d51cbfe	Add GMRES example	2012-09-20 14:06:55 -07:00
Jean-Luc Duprat	3dd9ff3d84	knc.h: Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used Fixed usage of loadunpack and packstore to use proper memory offset Fixed implementation of __masked_load_() __masked_store_() incorrectly (un)packing the lanes loaded Cleaned up usage of _mm512_undefined_(), it is now mostly confined to constructor Minor cleanups knc2x.h Fixed usage of loadunpack and packstore to use proper memory offset Fixed implementation of __masked_load_() __masked_store_() incorrectly (un)packing the lanes loaded Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used __any() and __none() speedups. Cleaned up usage of _mm512_undefined_(), it is now mostly confined to constructor	2012-09-19 17:11:04 -07:00
Ingo Wald	7f386923b0	Merge branch 'master' of https://github.com/ispc/ispc	2012-09-17 15:54:25 +02:00
Ingo Wald	d2312b1fbd	now using the ASSUME_ALIGNED flag in knc.h	2012-09-17 15:54:00 +02:00
Ingo Wald	6655373ac3	commit test	2012-09-17 15:51:37 +02:00
Ingo Wald	d492af7bc0	64-bit gather/scatter, aligned load/store, i8 support	2012-09-17 03:39:02 +02:00
Jean-Luc Duprat	0e88d5f97f	Fixed unaligned masked stores on KNC	2012-09-14 14:11:41 -07:00
Jean-Luc Duprat	f0b0618484	Added the following mask tests: __any(), __all(), __none() for all supported targets. This allows for more efficient code generation of KNC.	2012-09-14 11:06:18 -07:00

181 Commits