aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
Vsevolod Livinskij	65768c20ae	Added tests for saturation and some fixes for generic and avx target	2013-12-05 00:34:14 +04:00
Vsevolod Livinskij	4faff1a63c	structural change	2013-11-30 10:48:18 +04:00
Vsevolod Livinskij	4c330bc38b	Add code generation of saturation	2013-11-29 18:40:04 +04:00
Vsevolod Livinskij	bec6662338	Some changes for avx1 and avx1.1 in saturation	2013-11-29 03:45:25 +04:00
Vsevolod Livinskij	42c148bf75	Changes for sse2 and sse4 in saturation	2013-11-29 03:33:40 +04:00
Vsevolod Livinskij	35a4d1b3a2	Add some AVX2 intrinsics	2013-11-27 00:55:57 +04:00
Vsevolod Livinskij	19f73b2ede	uniform signed/unsigned int8/16	2013-11-25 19:16:02 +04:00
Dmitry Babokin	d2c7b356cc	Ordering functions in target-[avx|sse2].ll to be in the same order. No real changes, except adding a few alwaysinline in SSE4 target	2013-11-14 15:34:30 +04:00
Dmitry Babokin	af58955140	target-[sse4|avx]_common.ll are twin brothers, which differ only cosmetically. This commit makes them diffable. No real changes, except adding alwaysinline to the sse version of __max_uniform_int32/__max_uniform_uint32	2013-11-14 15:34:30 +04:00
Dmitry Babokin	6585a925be	Merge pull request #641 from jbrodman/stdlibshift Add a "shift" operator to the stdlib.	2013-10-28 14:18:31 -07:00
james.brodman	4d289b16c2	Redesign after being hit with the KISS bat.	2013-10-23 14:25:43 -04:00
james.brodman	f97a2d68c8	Bugfix for non-const shift amt and unit tests.	2013-10-22 18:29:20 -04:00
james.brodman	899f85ce9c	Initial Support for new stdlib shift operator	2013-10-22 18:06:54 -04:00
egaburov	1710b9171f	removed LLVM_3_0 legacy part and changed copyright to 2013	2013-10-18 08:53:01 +02:00
egaburov	7e9b4c0924	added avx2-i64x4 and avx1.1-i64x4 targets	2013-10-15 10:02:10 +02:00
Ilia Filippov	92773ada6d	fix for ISPC for compfails at sse4-i8 and sse4-i16	2013-10-11 15:23:40 +04:00
Dmitry Babokin	43245bbc11	Adding check for OS AVX support to auto-dispatch code	2013-09-19 15:39:56 +04:00
Evghenii	9861375f0c	renamed avx-i64x4 -> avx1-i64x4	2013-09-13 15:07:14 +02:00
Evghenii	059d80cc11	included suggested changes, ./tests/launch-*.ispc still fails. something is mask64 related, not sure what. help...	2013-09-12 17:18:12 +02:00
egaburov	7364e06387	added mask64	2013-09-12 12:02:42 +02:00
egaburov	efc20c2110	added svml support to all sse/avx modes	2013-09-11 17:07:54 +02:00
egaburov	19379db3b6	svml cleanup	2013-09-11 16:48:56 +02:00
egaburov	7a32699573	added svml.m4	2013-09-11 15:18:03 +02:00
egaburov	320c41ffcf	added svml support. experimental. for some reason all symbols are visible..	2013-09-11 15:16:50 +02:00
egaburov	9c79d4d182	added avxh with vectorWidth=4 support, use --target=avxh to enable it	2013-09-11 12:58:02 +02:00
james.brodman	8db378b265	Revert "Remove support for using SVML for math lib routines." This reverts commit `d9c38b5c1f`.	2013-09-04 16:01:58 -04:00
Matt Pharr	1276ea9844	Revert "Remove support for building with LLVM 3.1" This reverts commit `d3c567503b`. Conflicts: opt.cpp	2013-08-06 17:00:35 -07:00
Matt Pharr	5b20b06bd9	Add avg_{up,down}_int{8,16} routines to stdlib. These compute the average of two given values, rounding up and down, respectively, if the result isn't exact. When possible, these are mapped to target-specific intrinsics (PADD[BW] on IA and VH[R]ADD[US] on NEON.) A subsequent commit will add pattern-matching to generate calls to these intrinsics when the corresponding patterns are detected in the IR.	2013-08-06 08:41:12 -07:00
Matt Pharr	d9c38b5c1f	Remove support for using SVML for math lib routines. This path was poorly maintained and wasn't actually available on most targets.	2013-07-31 06:56:48 -07:00
Matt Pharr	d3c567503b	Remove support for building with LLVM 3.1	2013-07-31 06:46:45 -07:00
Matt Pharr	48ff03112f	Remove __pause from stdlib_core() in utils.m4. It wasn't ever being used, and was breaking compilation on ARM.	2013-07-30 08:44:22 -07:00
Matt Pharr	ab3b633733	Add 8-bit and 16-bit specialized NEON targets. Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask elements, respectively, and thus should generate the best code when used for computation with datatypes of those sizes.	2013-07-30 08:44:16 -07:00
Matt Pharr	b6df447b55	Add reduce_add() for int8 and int16 types. This maps to specialized instructions (e.g. PSADBW) when available.	2013-07-25 09:46:01 -07:00
Matt Pharr	2d063925a1	Explicitly call the PBLENDVB intrinsic for i8 blending with sse4-8. This is slightly cleaner than trunc-ing the i8 mask to i1 and using a vector select. (And is probably more safe in terms of good code.)	2013-07-25 09:46:01 -07:00
Matt Pharr	780b0dfe47	Add SSE4-16 target. Along the lines of sse4-8, this is an 8-wide target for SSE4, using 16-bit elements for the mask. It's thus (in principle) the best target for SIMD computation with 16-bit datatypes.	2013-07-25 09:46:01 -07:00
Matt Pharr	53414f12e6	Add SSE4 target optimized for computation with 8-bit datatypes. This change adds a new 'sse4-8' target, where programCount is 16 and the mask element size is 8-bits. (i.e. the most appropriate sizing of the mask for SIMD computation with 8-bit datatypes.)	2013-07-23 17:30:32 -07:00
Matt Pharr	15a3ef370a	Use @llvm.readcyclecounter to implement stdlib clock() function. Also added a test for the clock builtin.	2013-07-23 17:24:57 -07:00
Matt Pharr	e7abf3f2ea	Add support for mask vectors of 8 and 16-bit element types. There were a number of places throughout the system that assumed that the execution mask would only have either 32-bit or 1-bit elements. This commit makes it possible to have a target with an 8- or 16-bit mask.	2013-07-23 16:50:11 -07:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs: - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken.) This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-a15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure.) - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues. - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00
Dmitry Babokin	7bedb4a081	Add memory alignment dependent on the platform (16/32/64/etc)	2013-05-24 10:29:01 +04:00
Dmitry Babokin	630215f56f	Defining memory routines completely separately for Windows/Unix 32/64 bit.	2013-05-24 10:29:01 +04:00
Dmitry Babokin	5362dade37	Fixing util.m4 to declare nothing unless some macro is instantiated	2013-05-24 10:29:00 +04:00
Dmitry Babokin	f22e237381	Minor fix for generic DataLayout	2013-05-13 20:24:51 +04:00
Dmitry Babokin	a47460b4c3	Efficient library implementation of broadcast	2013-05-02 00:12:16 +02:00
jbrodman	018e9a12a3	Merge pull request #484 from dbabokin/malloc Fix for aligned move of unaligned data in 32 bit platforms.	2013-04-30 12:02:04 -07:00
Dmitry Babokin	26bec62daf	Removing duplicate free definition on Linux	2013-04-27 00:29:51 +04:00
Dmitry Babokin	7497e86902	Adding support for aligned memory allocation on Windows	2013-04-26 22:07:30 +02:00
Dmitry Babokin	95950885cf	Use posix_memalign to allocate 16-byte aligned memory on Linux/MacOS.	2013-04-26 20:33:24 +04:00
Dmitry Babokin	d36ab4cc3c	Adding noalias attribute to malloc return	2013-04-25 20:39:01 +04:00
Dmitry Babokin	e756daa261	Remove sprintf warnings on Windows and fix sprintf-related fails on Mac	2013-04-24 22:36:48 +02:00

110 Commits