These compute the average of two given values, rounding up and down,
respectively, if the result isn't exact. When possible, these are
mapped to target-specific intrinsics (PAVG[BW] on IA and VH[R]ADD[US]
on NEON.)
A subsequent commit will add pattern-matching to generate calls to
these intrinsics when the corresponding patterns are detected in the
IR.
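As a scalar illustration of what these compute (a sketch only; the
actual built-in names and vector shapes are not shown here), the two
averages can be formed without overflowing the element type:

#include <stdint.h>

/* Illustrative only: rounding-down and rounding-up averages for unsigned
   8-bit values, computed without a wider intermediate sum. */
static inline uint8_t avg_down_u8(uint8_t a, uint8_t b) {
    return (uint8_t)((a & b) + ((a ^ b) >> 1));   /* floor((a + b) / 2) */
}
static inline uint8_t avg_up_u8(uint8_t a, uint8_t b) {
    return (uint8_t)((a | b) - ((a ^ b) >> 1));   /* ceil((a + b) / 2) */
}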
Various LLVM optimization passes are turning code like:
%cmp = icmp slt <8 x i32> %foo, %bar
%cmp32 = sext <8 x i1> %cmp to <8 x i32>
...
%cmp1 = trunc <8 x i32> %cmp32 to <8 x i1>
%result = select <8 x i1> %cmp1, ...
Into:
%cmp = icmp slt <8 x i32> %foo, %bar
%cmp32 = zext <8 x i1> %cmp to <8 x i32> ; note: zext
...
%cmp1 = icmp ne <8 x i32> %cmp32, zeroinitializer
%result = select <8 x i1> %cmp1, ...
This form isn't matched well by the LLVM code generators, which in
turn leads to fairly inefficient code. (i.e. it doesn't just emit a
vector compare and a blend instruction.)
Also, renamed VSelMovmskOptPass to InstructionSimplifyPass to better
describe its functionality.
This change adds a new 'sse4-8' target, where programCount is 16 and
the mask element size is 8 bits. (i.e. the most appropriate sizing of
the mask for SIMD computation with 8-bit datatypes.)
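For reference, the arithmetic behind that choice: 16 lanes x 8-bit mask
elements = 128 bits, so the mask occupies exactly one 128-bit SSE
register, matching the layout of 16 8-bit data elements.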
This forces all vector loads/stores to be done assuming that the given
pointer is aligned to the vector size, allowing the use of instructions
that are sometimes more efficient. (If the memory isn't actually
aligned, the program will fail!)
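For example, in terms of C intrinsics (an illustration, not ispc's
actual code generation), this is the difference between the aligned and
unaligned SSE load forms:

#include <xmmintrin.h>

/* The aligned form can be cheaper on some targets but faults if the
   pointer isn't 16-byte aligned; the unaligned form accepts any pointer. */
__m128 load_aligned(const float *p)   { return _mm_load_ps(p);  }
__m128 load_unaligned(const float *p) { return _mm_loadu_ps(p); }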
We were previously emitting 64-bit indexing for some gathers where
32-bit was actually fine, due to adds of constant vectors that hadn't
been folded into a single constant.
We now have two ways of handling gathers/scatters that have a common
base pointer and a vector of offsets. For targets with native
gather/scatter support, we just turn these into base +
{1/2/4/8}*offsets. For targets without, we turn them into base +
{1/2/4/8}*varying_offsets + const_offsets, where const_offsets is a
compile-time constant.
Infrastructure for issue #325.
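A rough sketch of the two addressing forms in C (the names and types
here are made up for illustration and aren't ispc's internal
representation):

#include <stddef.h>
#include <stdint.h>

/* Targets with native gather/scatter: one scaled offset per lane. */
void addrs_native(unsigned char *base, int scale /* 1, 2, 4, or 8 */,
                  const int32_t *offsets, unsigned char **addrs, int width) {
    for (int i = 0; i < width; ++i)
        addrs[i] = base + (ptrdiff_t)scale * offsets[i];
}

/* Other targets: offsets factored into a varying part plus a
   compile-time-constant part that is added unscaled. */
void addrs_factored(unsigned char *base, int scale,
                    const int32_t *varying_offsets,
                    const int32_t *const_offsets /* compile-time constant */,
                    unsigned char **addrs, int width) {
    for (int i = 0; i < width; ++i)
        addrs[i] = base + (ptrdiff_t)scale * varying_offsets[i] +
                   const_offsets[i];
}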
No functional change; just preparation for having a path that doesn't
factor the offsets into constant and varying parts, which will be better
for AVX2 and KNC.
The "base+offsets" variants of gather decompose the integer offsets into
compile-time constant and compile-time unknown elements. (The coalescing
optimization, then, depends on this decomposition being done well, with
as much as possible in the constant component.) We now make multiple
efforts to improve this decomposition as we run optimization passes; in
some cases we're able to move more over to the constant side than was
first possible.
This in particular fixes issue #276, a case where coalescing was expected
but didn't actually happen.
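A toy example of the decomposition (illustrative C, not ispc code): for
a 4-wide unit-stride access to 4-byte elements at a compile-time-unknown
base index, each lane's byte offset splits into a shared unknown part
plus the constant part {0, 4, 8, 12}, which is what the coalescing
optimization keys on:

#include <assert.h>
#include <stdint.h>

int main(void) {
    int32_t base_index = 100;                   /* unknown at compile time */
    const int32_t constant[4] = {0, 4, 8, 12};  /* compile-time constant part */
    for (int i = 0; i < 4; ++i) {
        int32_t offset  = 4 * (base_index + i); /* full byte offset, lane i */
        int32_t varying = 4 * base_index;       /* compile-time unknown part */
        assert(offset == varying + constant[i]);
    }
    return 0;
}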
Rather than having separate passes to do the following conversions,
when possible:
- General gather/scatter of a vector of pointers to gather/scatter of
a base pointer and integer offsets
- Gather/scatter to masked load/store, load+broadcast
- Masked load/store to regular load/store
Now all are done in a single ImproveMemoryOps pass. This change was in
particular to address some phase-ordering issues that showed up with
multidimensional array accesses: after determining that an outer
dimension had the same index value in all lanes, we previously weren't
able to take advantage of the uniformity of the resulting pointer.
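The multidimensional-array case looks roughly like this (illustrative
C, not actual compiler output): once the outer index is known to be the
same in every lane, the row pointer is uniform and the inner accesses
are consecutive, so a plain vector load can replace the gather.

/* a[index][programIndex], where index is the same across all lanes
   and width <= 8 in this sketch. */
void load_row(const float a[][8], int index, float *result, int width) {
    const float *row = a[index];      /* uniform pointer: same for every lane */
    for (int i = 0; i < width; ++i)
        result[i] = row[i];           /* consecutive accesses: vector load, not gather */
}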
Now that we never run with the mask all off, we no longer need to keep
this logic in a built-in function just so that the mask can be checked.
In the one place where it was used (turning gathers from the same
location into a load and broadcast), we now just emit the code for that
directly.
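The directly emitted code is equivalent to something like this sketch
(illustrative C, not the actual IR that gets generated):

/* All active lanes gather from the same address: one scalar load,
   then broadcast the value to every lane of the result. */
void gather_common_address(const float *ptr, float *result, int width) {
    float v = *ptr;                   /* single scalar load */
    for (int i = 0; i < width; ++i)
        result[i] = v;                /* broadcast */
}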
Previously, we'd bitcast e.g. a vector of floats to a vector of i32s and then
use the i32 variant of masked_load/masked_store/gather/scatter. Now, we have
separate float/double variants of each of those.
Change function suffixes to "_i32", etc., from "_32".
Improve load_and_broadcast macro in util.m4 to grab vector width from
WIDTH variable rather than taking it as a parameter.