
aaron/ispc

Author	SHA1	Message	Date
Evghenii	9861375f0c	renamed avx-i64x4 -> avx1-i64x4	2013-09-13 15:07:14 +02:00
Evghenii	059d80cc11	included suggested changes, ./tests/launch-*.ispc still fails. something is mask64 related, not sure what. help...	2013-09-12 17:18:12 +02:00
egaburov	7364e06387	added mask64	2013-09-12 12:02:42 +02:00
egaburov	efc20c2110	added svml support to all sse/avx modes	2013-09-11 17:07:54 +02:00
egaburov	19379db3b6	svml cleanup	2013-09-11 16:48:56 +02:00
egaburov	7a32699573	added svml.m4	2013-09-11 15:18:03 +02:00
egaburov	320c41ffcf	added svml support. experimental. for some reason all symbols are visible..	2013-09-11 15:16:50 +02:00
egaburov	9c79d4d182	added avxh with vectorWidth=4 support, use --target=avxh to enable it	2013-09-11 12:58:02 +02:00
james.brodman	8db378b265	Revert "Remove support for using SVML for math lib routines." This reverts commit `d9c38b5c1f`.	2013-09-04 16:01:58 -04:00
Matt Pharr	1276ea9844	Revert "Remove support for building with LLVM 3.1" This reverts commit `d3c567503b`. Conflicts: opt.cpp	2013-08-06 17:00:35 -07:00
Matt Pharr	5b20b06bd9	Add avg_{up,down}_int{8,16} routines to stdlib These compute the average of two given values, rounding up and down, respectively, if the result isn't exact. When possible, these are mapped to target-specific intrinsics (PADD[BW] on IA and VH[R]ADD[US] on NEON.) A subsequent commit will add pattern-matching to generate calls to these intrinsics when the corresponding patterns are detected in the IR.	2013-08-06 08:41:12 -07:00
Matt Pharr	d9c38b5c1f	Remove support for using SVML for math lib routines. This path was poorly maintained and wasn't actually available on most targets.	2013-07-31 06:56:48 -07:00
Matt Pharr	d3c567503b	Remove support for building with LLVM 3.1	2013-07-31 06:46:45 -07:00
Matt Pharr	48ff03112f	Remove __pause from stdlib_core() in utils.m4. It wasn't ever being used, and was breaking compilation on ARM.	2013-07-30 08:44:22 -07:00
Matt Pharr	ab3b633733	Add 8-bit and 16-bit specialized NEON targets. Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask elements, respectively, and thus should generate the best code when used for computation with datatypes of those sizes.	2013-07-30 08:44:16 -07:00
Matt Pharr	b6df447b55	Add reduce_add() for int8 and int16 types. This maps to specialized instructions (e.g. PSADBW) when available.	2013-07-25 09:46:01 -07:00
Matt Pharr	2d063925a1	Explicitly call the PBLENDVB intrinsic for i8 blending with sse4-8. This is slightly cleaner than trunc-ing the i8 mask to i1 and using a vector select. (And is probably more safe in terms of good code.)	2013-07-25 09:46:01 -07:00
Matt Pharr	780b0dfe47	Add SSE4-16 target. Along the lines of sse4-8, this is an 8-wide target for SSE4, using 16-bit elements for the mask. It's thus (in principle) the best target for SIMD computation with 16-bit datatypes.	2013-07-25 09:46:01 -07:00
Matt Pharr	53414f12e6	Add SSE4 target optimized for computation with 8-bit datatypes. This change adds a new 'sse4-8' target, where programCount is 16 and the mask element size is 8-bits. (i.e. the most appropriate sizing of the mask for SIMD computation with 8-bit datatypes.)	2013-07-23 17:30:32 -07:00
Matt Pharr	15a3ef370a	Use @llvm.readcyclecounter to implement stdlib clock() function. Also added a test for the clock builtin.	2013-07-23 17:24:57 -07:00
Matt Pharr	e7abf3f2ea	Add support for mask vectors of 8 and 16-bit element types. There were a number of places throughout the system that assumed that the execution mask would only have either 32-bit or 1-bit elements. This commit makes it possible to have a target with an 8- or 16-bit mask.	2013-07-23 16:50:11 -07:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken.) This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-a15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure.) - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues. - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00
Dmitry Babokin	7bedb4a081	Add memory alignment dependent on the platform (16/32/64/etc)	2013-05-24 10:29:01 +04:00
Dmitry Babokin	630215f56f	Defining memory routines completely separately for Windows/Unix 32/64 bit.	2013-05-24 10:29:01 +04:00
Dmitry Babokin	5362dade37	Fixing util.m4 to declare nothing unless some macro is instantiated	2013-05-24 10:29:00 +04:00
Dmitry Babokin	f22e237381	Minor fix for generic DataLayout	2013-05-13 20:24:51 +04:00
Dmitry Babokin	a47460b4c3	Efficient library implementation of broadcast	2013-05-02 00:12:16 +02:00
jbrodman	018e9a12a3	Merge pull request #484 from dbabokin/malloc Fix for aligned move of unaligned data in 32 bit platforms.	2013-04-30 12:02:04 -07:00
Dmitry Babokin	26bec62daf	Removing duplicate free definition on Linux	2013-04-27 00:29:51 +04:00
Dmitry Babokin	7497e86902	Adding support for aligned memory allocation on Windows	2013-04-26 22:07:30 +02:00
Dmitry Babokin	95950885cf	Use posix_memalign to allocate 16-byte aligned memory on Linux/MacOS.	2013-04-26 20:33:24 +04:00
Dmitry Babokin	d36ab4cc3c	Adding noalias attribute to malloc return	2013-04-25 20:39:01 +04:00
Dmitry Babokin	e756daa261	Remove sprintf warnings on Windows and fix sprintf-related fails on Mac	2013-04-24 22:36:48 +02:00
Dmitry Babokin	3f8a678c5a	Editorial change: fixing trailing white spaces and tabs	2013-03-18 16:17:55 +04:00
james.brodman	3aaf2ef2d4	ToT Fixes / M4 macro fix	2013-01-14 14:55:10 -05:00
Matt Pharr	6412876f64	Remove unused __reduce_add_uint{32,64} target functions. The stdlib code just calls the signed int{32,64} functions, which gives the right result for the unsigned case anyway. The various targets didn't consistently define the unsigned variants in any case.	2012-09-28 05:55:41 -07:00
Jean-Luc Duprat	f0b0618484	Added the following mask tests: __any(), __all(), __none() for all supported targets. This allows for more efficient code generation of KNC.	2012-09-14 11:06:18 -07:00
Matt Pharr	49dde7c6f2	Fix bug in declaration of double-precision sqrt intrinsic for AVX targets. This was preventing sqrts of uniform double values from being compiled properly. Issue #344.	2012-08-03 11:43:31 -07:00
Matt Pharr	765a0d8896	Use puts() rather than printf() for printing assertion failure strings. This way, we don't lose '%'s in the assertion strings. Issue #342.	2012-08-03 11:31:38 -07:00
Matt Pharr	6a410fc30e	Emit gather instructions for the AVX2 targets. Issue #308.	2012-07-13 12:29:05 -07:00
Matt Pharr	984a68c3a9	Rename gen_gather() macro to gen_gather_factored()	2012-07-13 12:24:12 -07:00
Matt Pharr	98b2e0e426	Fixes for intrinsics unsupported in earlier LLVM versions. Specifically, don't use the half/float conversion routines with LLVM 3.0, and don't try to use RDRAND with anything before LLVM 3.2.	2012-07-13 12:14:10 -07:00
Matt Pharr	371d4be8ef	Fix bugs in detection of Ivy Bridge systems. We were incorrectly characterizing them as basic AVX1 without further extensions, due to a bug in the logic to check CPU features.	2012-07-12 14:11:15 -07:00
Matt Pharr	2c640f7e52	Add support for RDRAND in IvyBridge. The standard library now provides a variety of rdrand() functions that call out to RDRAND, when available. Issue #263.	2012-07-12 06:07:07 -07:00
Matt Pharr	216ac4b1a4	Stop factoring out constant offsets for gather/scatter if instr is available. For KNC (gather/scatter), it's not helpful to factor base+offsets gathers and scatters into base_ptr + {1/2/4/8} * varying_offsets + const_offsets. Now, if a HW instruction is available for gather/scatter, we just factor into base + {1/2/4/8} * offsets (if possible). Not only is this simpler, but it's also what we need to pass a value along to the scale by 2/4/8 available directly in those instructions. Finishes issue #325.	2012-07-11 14:52:29 -07:00
Matt Pharr	c09c87873e	Whitespace / indentation fixes.	2012-07-11 14:29:46 -07:00
Matt Pharr	10b79fb41b	Add support for non-factored variants of gather/scatter functions. We now have two ways of approaching gather/scatters with a common base pointer and with offset vectors. For targets with native gather/scatter, we just turn those into base + {1/2/4/8}offsets. For targets without, we turn those into base + {1/2/4/8}varying_offsets + const_offsets, where const_offsets is a compile-time constant. Infrastructure for issue #325.	2012-07-11 14:29:42 -07:00
Matt Pharr	ec0280be11	Rename gather/scatter_base_offsets functions to factored_based_offsets. No functional change; just preparation for having a path that doesn't factor the offsets into constant and varying parts, which will be better for AVX2 and KNC.	2012-07-11 14:16:39 -07:00
Jean-Luc Duprat	098277b4f0	Merge pull request #321 from mmp/setzero More varied support for constant vectors from C++ backend.	2012-07-09 08:57:05 -07:00
Matt Pharr	fb8b893b10	Fix incorrect LLVM_3_1svn tests. 1. For some time now, we provide the version without the 'svn' 2. We should be testing "not LLVM 3.0" in these cases, since they apply to LLVM 3.2 and beyond as well...	2012-07-09 07:09:25 -07:00


93 Commits