Commit Graph

46 Commits

Author SHA1 Message Date
Matt Pharr
ab3b633733 Add 8-bit and 16-bit specialized NEON targets.
Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask
elements, respectively, and thus should generate the best code when used
for computation with datatypes of those sizes.
2013-07-30 08:44:16 -07:00
Matt Pharr
53414f12e6 Add SSE4 target optimized for computation with 8-bit datatypes.
This change adds a new 'sse4-8' target, where programCount is 16 and
the mask element size is 8 bits (i.e., the most appropriate sizing of
the mask for SIMD computation with 8-bit datatypes).
2013-07-23 17:30:32 -07:00
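For illustration, a minimal ispc sketch of the kind of 8-bit kernel this target is aimed at; the kernel name, file name, and compile line are hypothetical, while the 'sse4-8' target name comes from the commit above.

    // blend8.ispc -- hypothetical example; compiled with something like
    //   ispc --target=sse4-8 blend8.ispc -o blend8.o
    export void blend8(uniform int8 a[], uniform int8 b[],
                       uniform int8 out[], uniform int count) {
        foreach (i = 0 ... count) {
            // 16 int8 values per gang, with one 8-bit mask element per lane.
            out[i] = (int8)((a[i] + b[i]) / 2);
        }
    }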
Matt Pharr
15a3ef370a Use @llvm.readcyclecounter to implement stdlib clock() function.
Also added a test for the clock builtin.
2013-07-23 17:24:57 -07:00
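A small hypothetical usage sketch of that stdlib clock(); it is assumed here to return a uniform int64 cycle count, so only differences between readings are meaningful.

    export uniform int64 scale_and_time(uniform float data[], uniform int count) {
        uniform int64 start = clock();     // processor cycle counter
        foreach (i = 0 ... count) {
            data[i] = data[i] + 1;
        }
        return clock() - start;            // elapsed cycles for the loop
    }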
Matt Pharr
e7abf3f2ea Add support for mask vectors of 8 and 16-bit element types.
There were a number of places throughout the system that assumed that the
execution mask would only have either 32-bit or 1-bit elements.  This
commit makes it possible to have a target with an 8- or 16-bit mask.
2013-07-23 16:50:11 -07:00
Dmitry Babokin
7bedb4a081 Add memory alignment dependent on the platform (16/32/64/etc) 2013-05-24 10:29:01 +04:00
Dmitry Babokin
630215f56f Defining memory routines completely separately for Windows/Unix 32/64 bit. 2013-05-24 10:29:01 +04:00
Dmitry Babokin
5362dade37 Fixing util.m4 to declare nothing unless some macro is instantiated 2013-05-24 10:29:00 +04:00
Dmitry Babokin
a47460b4c3 Efficient library implementation of broadcast 2013-05-02 00:12:16 +02:00
Dmitry Babokin
26bec62daf Removing duplicate free definition on Linux 2013-04-27 00:29:51 +04:00
Dmitry Babokin
7497e86902 Adding support for aligned memory allocation on Windows 2013-04-26 22:07:30 +02:00
Dmitry Babokin
95950885cf Use posix_memalign to allocate 16-byte aligned memory on Linux/MacOS. 2013-04-26 20:33:24 +04:00
Dmitry Babokin
d36ab4cc3c Adding noalias attribute to malloc return 2013-04-25 20:39:01 +04:00
james.brodman
3aaf2ef2d4 ToT Fixes / M4 macro fix 2013-01-14 14:55:10 -05:00
Matt Pharr
765a0d8896 Use puts() rather than printf() for printing assertion failure strings.
This way, '%' characters in the assertion strings aren't misinterpreted as
printf format specifiers and lost.

Issue #342.
2012-08-03 11:31:38 -07:00
Matt Pharr
6a410fc30e Emit gather instructions for the AVX2 targets.
Issue #308.
2012-07-13 12:29:05 -07:00
Matt Pharr
984a68c3a9 Rename gen_gather() macro to gen_gather_factored() 2012-07-13 12:24:12 -07:00
Matt Pharr
2c640f7e52 Add support for RDRAND in IvyBridge.
The standard library now provides a variety of rdrand() functions
that call out to RDRAND, when available.

Issue #263.
2012-07-12 06:07:07 -07:00
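A hedged sketch of how one of these rdrand() functions might be used; the exact overloads (out-pointer parameter plus a bool success result) are an assumption, not confirmed by the commit.

    export void fill_seeds(uniform int32 seeds[], uniform int count) {
        foreach (i = 0 ... count) {
            int32 s;
            // Assumed form: rdrand() fills *ptr from the hardware RNG and
            // returns false when RDRAND is not available.
            if (!rdrand(&s))
                s = i;                     // deterministic fallback
            seeds[i] = s;
        }
    }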
Matt Pharr
c09c87873e Whitespace / indentation fixes. 2012-07-11 14:29:46 -07:00
Matt Pharr
10b79fb41b Add support for non-factored variants of gather/scatter functions.
We now have two ways of approaching gather/scatters with a common base
pointer and with offset vectors.  For targets with native gather/scatter,
we just turn those into base + {1/2/4/8}*offsets.  For targets without,
we turn those into base + {1/2/4/8}*varying_offsets + const_offsets,
where const_offsets is a compile-time constant.

Infrastructure for issue #325.
2012-07-11 14:29:42 -07:00
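As a hypothetical illustration of the two forms: for the gather inside the loop below, a target with native gather would compute per-lane addresses as base + 4*(index + 3), while other targets would compute base + 4*index + 12, with the 12 being the compile-time const_offsets part.

    export void gather_example(uniform float buf[], uniform int indices[],
                               uniform float out[], uniform int count) {
        foreach (i = 0 ... count) {
            int index = indices[i];
            // Varying index with a common base pointer 'buf' -> a gather.
            out[i] = buf[index + 3];
        }
    }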
Matt Pharr
ec0280be11 Rename gather/scatter_base_offsets functions to *factored_base_offsets*.
No functional change; just preparation for having a path that doesn't
factor the offsets into constant and varying parts, which will be better
for AVX2 and KNC.
2012-07-11 14:16:39 -07:00
Matt Pharr
fb8b893b10 Fix incorrect LLVM_3_1svn tests.
1. For some time now, the version has been provided without the 'svn' suffix.
2. We should be testing "not LLVM 3.0" in these cases, since they
   apply to LLVM 3.2 and beyond as well.
2012-07-09 07:09:25 -07:00
Matt Pharr
9ca80debb8 Remove stale LLVM 2.9 support from builtins/util.m4 2012-07-09 06:54:29 -07:00
Matt Pharr
d34a87404d Provide (undocumented for now) __pause() call to emit PAUSE inst. 2012-06-28 09:28:25 -07:00
Matt Pharr
19b46be20d Remove load_and_broadcast from built-ins.
Now that we never run with the mask all off, this logic no longer needs
to live in a built-in function where the mask can be checked.  In the one
place where it was used (turning gathers from the same location into a
load and broadcast), we now just emit that code directly.
2012-06-12 12:30:57 -07:00
Matt Pharr
89a2566e01 Add separate variants of memory built-ins for floats and doubles.
Previously, we'd bitcast e.g. a vector of floats to a vector of i32s and then
use the i32 variant of masked_load/masked_store/gather/scatter.  Now, we have
separate float/double variants of each of those.
2012-06-07 14:47:16 -07:00
Matt Pharr
1ac3e03171 Gather/scatter function improvements in builtins.
More naming consistency: suffixes are now _i32 rather than i32.

Also improved the m4 macros that generate these sequences so they don't
require as many parameters.
2012-06-07 14:19:23 -07:00
Matt Pharr
b86d40091a Improve naming of masked load/store instructions in builtins.
Now, use _i32 suffixes, rather than _32, etc.  Also cleaned up the m4
macro to generate these functions, using WIDTH to get the target width,
etc.
2012-06-07 13:58:31 -07:00
Matt Pharr
91d22d150f Update load_and_broadcast built-in
Change function suffixes to "_i32", etc., from "_32".

Improve the load_and_broadcast macro in util.m4 to grab the vector width
from the WIDTH variable rather than taking it as a parameter.
2012-06-07 13:33:17 -07:00
Matt Pharr
1d29991268 Indentation fixes in builtins/ 2012-06-07 13:23:07 -07:00
Matt Pharr
90db01d038 Represent MOVMSK'ed masks with int64s rather than int32s.
This allows us to scale up to 64-wide execution.
2012-05-25 11:57:23 -07:00
Matt Pharr
ee7e367981 Do global dead code elimination early in optimization.
This gives a 15-20% speedup in compilation time for simple
programs (but only ~2% for the big 21k monster program).
2012-05-05 15:47:19 -07:00
Matt Pharr
0c1b206185 Pass log/exp/pow transcendentals through to targets that support them.
Currently, this is the generic targets.
2012-05-03 13:49:56 -07:00
Matt Pharr
fd846fbe77 Fix bug in __gather_base_offsets_32.
In short, we weren't correctly zeroing the compile-time constant portion
of the offsets for lanes that aren't executing. (!)

Fixes issue #235.
2012-04-12 10:28:15 -07:00
Matt Pharr
3b95452481 Add memcpy(), memmove() and memset() to the standard library.
Issue #183.
2012-03-05 16:09:00 -08:00
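A minimal sketch of calling the new routines from ispc; the byte-count signatures assumed here mirror the C library versions and may not match the stdlib declarations exactly.

    export void copy_then_clear(uniform float dst[], uniform float src[],
                                uniform int count) {
        // Sizes are in bytes, as with the C library functions.
        memcpy(dst, src, count * sizeof(uniform float));
        memset(src, 0, count * sizeof(uniform float));
    }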
Matt Pharr
c152ae3c32 Add single-precision asin() and acos() to stdlib.
Issue #184.
2012-03-05 13:32:13 -08:00
Matt Pharr
5b4673e8eb Fix build with LLVM 2.9. 2012-02-07 08:37:13 -08:00
Matt Pharr
664dc3bdda Add support for "new" and "delete" to the language.
Issue #139.
2012-01-27 14:47:06 -08:00
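A brief sketch of the feature, written with current ispc syntax (uniform new / delete); the function itself is hypothetical.

    export void scratch_demo(uniform int count) {
        // Allocate a uniform scratch array on the heap, fill it, free it.
        uniform float * uniform tmp = uniform new uniform float[count];
        foreach (i = 0 ... count) {
            tmp[i] = (float)i;
        }
        delete tmp;
    }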
Matt Pharr
24f58fa16a Update per_lane macro to not use ID for the lane number in macro expansion
This was leading to unintended consequences if WIDTH was used in macro code.
2012-01-27 09:12:13 -08:00
Matt Pharr
a5b7fca7e0 Extract constant offsets from gather/scatter base+offsets offset vectors.
When we're able to turn a general gather/scatter into the "base + offsets"
form, we now try to extract out any constant components of the offsets and
then pass them as a separate parameter to the gather/scatter function
implementation.

We then carefully emit the code for the addressing calculation so that
these constant offsets match the patterns LLVM uses to detect this case;
in many cases the constant offsets then end up directly encoded in the
instruction's addressing calculation, saving the arithmetic instructions
that would otherwise compute them.

Improves performance of stencil by ~15%.  Other workloads unchanged.
2012-01-24 14:41:15 -08:00
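A hypothetical example of where such constant components come from: for the field access below, each lane's gather address is base + 12*index + 4, and the constant 4 (the byte offset of 'y') is what gets extracted and folded into the addressing calculation.

    struct Point { float x, y, z; };

    export void gather_y(uniform Point pts[], uniform int indices[],
                         uniform float out[], uniform int count) {
        foreach (i = 0 ... count) {
            int index = indices[i];
            // 12 = sizeof(uniform Point); 4 = byte offset of the 'y' field.
            out[i] = pts[index].y;
        }
    }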
Matt Pharr
d805e8b183 Add clock() function to standard library.
Also corrected the declaration of num_cores() to return a
uniform value.
2012-01-22 13:05:27 -08:00
Matt Pharr
1bba9d4307 Improve atomic_swap_global() to take advantage of associativity.
We now do a single atomic hardware swap and then effectively perform
swaps among the running program instances, so that the result is the
same as if each of them had executed its own hardware swap in some
particular order.

Also cleaned up __atomic_swap_uniform_* built-in implementations
to not take the mask, which they weren't using anyway.

Finishes Issue #56.
2012-01-20 10:37:33 -08:00
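A sketch of the per-gang result this optimization preserves, written with current ispc constructs (foreach_active, extract); it is reference code only, not atomic like the real built-in, and the function name is hypothetical.

    // What each active program instance gets back from
    // atomic_swap_global(ptr, value), treating the lanes' swaps as if they
    // ran one after another in lane order.
    static int32 swap_reference(uniform int32 * uniform ptr, int32 value) {
        uniform int32 prev = *ptr;            // value before the gang's swaps
        int32 result;
        foreach_active (lane) {
            result = prev;                    // this instance sees the prior value
            prev = extract(value, (uniform int)lane);  // and leaves its own behind
        }
        *ptr = prev;                          // the last active lane's value remains
        return result;
    }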
Matt Pharr
234e5cd3e1 Use vector select for masked store blend if building with LLVM3.1 2012-01-04 12:59:03 -08:00
Matt Pharr
f75c94a8f1 Have aos/soa and broadcast/shuffle/rotate functions provided by the target.
The SSE/AVX targets use the old versions from util.m4, but these functions are
now passed through to the generic targets.
2012-01-04 12:59:03 -08:00
Matt Pharr
848a432640 Fix various small things that were broken with single-bit-per-lane masks.
Also small cleanups to declarations, "no captures" added, etc.
2012-01-04 12:59:03 -08:00
Matt Pharr
562d61caff Added masked load optimization pass.
This pass handles the "all on" and "all off" mask cases appropriately.

Also renamed load_masked stuff in built-ins to masked_load for consistency with
masked_store.
2012-01-04 11:51:26 -08:00
Matt Pharr
1d9201fe3d Add "generic" 4, 8, and 16-wide targets.
When used, these targets end up with calls to undefined functions for all
of the special vector operations ispc needs in order to compile programs
(masked store, gather, min/max, sqrt, etc.).

These targets are not yet useful for anything, but are a step toward
having an option to emit C++ code with calls out to intrinsics.

Reorganized the directory structure a bit and put the LLVM bitcode used
to define target-specific stuff (as well as some generic built-ins stuff)
into a builtins/ directory.

Note that for building on Windows, it's now necessary to set an LLVM_VERSION
environment variable (with values like LLVM_2_9, LLVM_3_0, LLVM_3_1svn, etc.)
2011-12-19 13:46:50 -08:00