aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
Vsevolod Livinskiy	e0f0520c1f	Fix for llvm revision 216488	2014-08-28 12:59:03 +04:00
Anton Mitrokhin	60fa76ccc1	reversed macros LLVM_3_6 to LLVM_3_5+ in .cpp and .h files	2014-08-01 15:40:48 +04:00
Anton Mitrokhin	d0c9b7c9b5	wiped out all LLVM 3.1 support	2014-08-01 14:54:08 +04:00
Anton Mitrokhin	725be222ac	added LLVM_3_6 var	2014-07-30 11:50:15 +04:00
Ilia Filippov	76ea59b40b	support LLVM build	2014-06-18 17:53:42 +04:00
Dmitry Babokin	31b95b665b	Copyright update	2014-03-12 20:19:16 +04:00
Ilia Filippov	47f7900cd3	support LLVM trunk	2014-03-07 16:28:56 +04:00
Ilia Filippov	06c06456c4	support LLVM trunk after r202168 r202190 revisions	2014-02-26 17:06:58 +04:00
Dmitry Babokin	5794d18737	Merge pull request #745 from egaburov/native_trigonometry: added transcendentals/trigonometry to builtins	2014-02-21 11:15:08 +03:00
evghenii	e2d68e6119	added transcendentals/trigonometry to builtins	2014-02-21 08:17:40 +01:00
Dmitry Babokin	f280b32fa4	Merge pull request #736 from egaburov/native_trigonometry: Native trigonometry	2014-02-20 19:18:35 +03:00
Vsevolod Livinskij	cef5b2eb04	Some changes in saturation arithmetic	2014-02-10 12:40:53 +04:00
Evghenii	668645fcda	first commit	2014-02-07 11:05:36 +01:00
Evghenii	d3a6693eef	adding __have_native_{rsqrtd,rcpd} to select between native support for double precision reciprocals and using slower but safe version in stdlib	2014-02-04 16:29:23 +01:00
evghenii	3a72e05c3e	+1	2014-02-02 18:16:48 +01:00
Ilia Filippov	aa31957d84	supporting LLVM trunk	2014-01-21 14:21:26 +04:00
Vsevolod Livinskij	da02236b3a	Scalar implementation of no-vec functions was moved from builtins to stdlib.ispc.	2014-01-20 16:06:34 +04:00
Ilia Filippov	473f1cb4d2	packed_store_active2	2013-12-17 21:14:29 +04:00
Vsevolod Livinskij	35a4d1b3a2	Add some AVX2 intrinsics	2013-11-27 00:55:57 +04:00
Vsevolod Livinskij	19f73b2ede	uniform signed/unsigned int8/16	2013-11-25 19:16:02 +04:00
Dmitry Babokin	65ea6fd48a	Reasoning to use sse4 bitcode file	2013-11-14 15:34:30 +04:00
Dmitry Babokin	ffc9a33933	avx1-i32x4 implementation as sse4-i32x4 with avx target-feature flag	2013-11-14 15:34:30 +04:00
Dmitry Babokin	6585a925be	Merge pull request #641 from jbrodman/stdlibshift: Add a "shift" operator to the stdlib.	2013-10-28 14:18:31 -07:00
james.brodman	899f85ce9c	Initial Support for new stdlib shift operator	2013-10-22 18:06:54 -04:00
egaburov	7e9b4c0924	added avx2-i64x4 and avx1.1-i64x4 targets	2013-10-15 10:02:10 +02:00
Evghenii	9861375f0c	renamed avx-i64x4 -> avx1-i64x4	2013-09-13 15:07:14 +02:00
egaburov	7364e06387	added mask64	2013-09-12 12:02:42 +02:00
egaburov	9cf8e8cbf3	builtins fix for double precision svml and __stdlib_asin	2013-09-11 15:23:45 +02:00
egaburov	320c41ffcf	added svml support. experimental. for some reason all symbols are visible.	2013-09-11 15:16:50 +02:00
egaburov	9c79d4d182	added avxh with vectorWidth=4 support, use --target=avxh to enable it	2013-09-11 12:58:02 +02:00
james.brodman	8db378b265	Revert "Remove support for using SVML for math lib routines." This reverts commit `d9c38b5c1f`.	2013-09-04 16:01:58 -04:00
Matt Pharr	cd9afe946c	Merge branch 'master' into arm. Conflicts: Makefile, builtins.cpp, ispc.cpp, ispc.h, ispc.vcxproj, opt.cpp	2013-08-06 17:39:21 -07:00
Matt Pharr	1276ea9844	Revert "Remove support for building with LLVM 3.1". This reverts commit `d3c567503b`. Conflicts: opt.cpp	2013-08-06 17:00:35 -07:00
Dmitry Babokin	dff7735af9	Fix for Windows build and making NEON target optional	2013-08-02 19:24:34 -07:00
Matt Pharr	d9c38b5c1f	Remove support for using SVML for math lib routines. This path was poorly maintained and wasn't actually available on most targets.	2013-07-31 06:56:48 -07:00
Matt Pharr	d3c567503b	Remove support for building with LLVM 3.1	2013-07-31 06:46:45 -07:00
Matt Pharr	48ff03112f	Remove __pause from stdlib_core() in utils.m4. It wasn't ever being used, and was breaking compilation on ARM.	2013-07-30 08:44:22 -07:00
Matt Pharr	ab3b633733	Add 8-bit and 16-bit specialized NEON targets. Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask elements, respectively, and thus should generate the best code when used for computation with datatypes of those sizes.	2013-07-30 08:44:16 -07:00
Matt Pharr	b6df447b55	Add reduce_add() for int8 and int16 types. This maps to specialized instructions (e.g. PSADBW) when available.	2013-07-25 09:46:01 -07:00
Matt Pharr	780b0dfe47	Add SSE4-16 target. Along the lines of sse4-8, this is an 8-wide target for SSE4, using 16-bit elements for the mask. It's thus (in principle) the best target for SIMD computation with 16-bit datatypes.	2013-07-25 09:46:01 -07:00
Matt Pharr	53414f12e6	Add SSE4 target optimized for computation with 8-bit datatypes. This change adds a new 'sse4-8' target, where programCount is 16 and the mask element size is 8-bits. (i.e. the most appropriate sizing of the mask for SIMD computation with 8-bit datatypes.)	2013-07-23 17:30:32 -07:00
Matt Pharr	e7abf3f2ea	Add support for mask vectors of 8 and 16-bit element types. There were a number of places throughout the system that assumed that the execution mask would only have either 32-bit or 1-bit elements. This commit makes it possible to have a target with an 8- or 16-bit mask.	2013-07-23 16:50:11 -07:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs: - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken). This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-a15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure). - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues. - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00
james.brodman	d8b5fd5409	Typo fix.	2013-05-28 11:13:43 -04:00
Dmitry Babokin	1a7ac8b804	Enable memory alignment management via compiler options	2013-05-24 10:29:01 +04:00
Dmitry Babokin	4b388edca9	Splitting .ll files to be compiled in two versions - 32 and 64 bit. Unix only	2013-05-24 10:29:00 +04:00
Dmitry Babokin	b6b9daa3c5	Enabling llvm 3.4	2013-05-13 19:25:31 +04:00
Dmitry Babokin	32be338f60	Minor indentation fix	2013-05-02 00:05:17 +02:00
Dmitry Babokin	7497e86902	Adding support for aligned memory allocation on Windows	2013-04-26 22:07:30 +02:00
Dmitry Babokin	95950885cf	Use posix_memalign to allocate 16-byte aligned memory on Linux/MacOS.	2013-04-26 20:33:24 +04:00

124 Commits