We were previously emitting 64-bit indexing for some gathers where
32-bit was actually fine, due to some adds of constant vectors
that hadn't been simplified down to a single constant result.
We now have two ways of approaching gather/scatters with a common base
pointer and with offset vectors. For targets with native gather/scatter,
we just turn those into base + {1/2/4/8}*offsets. For targets without,
we turn those into base + {1/2/4/8}*varying_offsets + const_offsets,
where const_offsets is a compile-time constant.
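To make the two addressing forms concrete, here's a small standalone C++
sketch with made-up values (not the compiler's codegen) that computes the
per-lane addresses both ways for a hypothetical 4-wide gather of 32-bit
elements and checks that they agree:

    #include <cstdint>
    #include <cstdio>

    int main() {
        float data[64] = {};
        const char *base = reinterpret_cast<const char *>(data);
        // Per-lane element indices as the front-end might see them.
        int32_t offsets[4]         = { 3, 19, 35, 51 };
        // The same offsets split into a compile-time-unknown part and a
        // compile-time-constant byte offset.
        int32_t varying_offsets[4] = { 0, 16, 32, 48 };
        int32_t const_offsets[4]   = { 12, 12, 12, 12 };

        for (int i = 0; i < 4; ++i) {
            // Targets with native gather/scatter: base + 4*offsets
            const char *a = base + 4 * offsets[i];
            // Other targets: base + 4*varying_offsets + const_offsets
            const char *b = base + 4 * varying_offsets[i] + const_offsets[i];
            printf("lane %d: %s\n", i, (a == b) ? "same address" : "MISMATCH");
        }
        return 0;
    }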
Infrastructure for issue #325.
No functional change; just preparation for having a path that doesn't
factor the offsets into constant and varying parts, which will be better
for AVX2 and KNC.
The "base+offsets" variants of gather decompose the integer offsets into
compile-time constant and compile-time unknown elements. (The coalescing
optimization, then, depends on this decomposition being done well--having
as much as possible in the constant component.) We now make multiple
attempts to improve this decomposition as the optimization passes run; in
some cases we're able to move more over to the constant side than was
possible at first.
This in particular fixes issue #276, a case where coalescing was expected
but didn't actually happen.
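As a rough, invented illustration of what "moving more over to the constant
side" means (plain C++ with made-up values, not the compiler's own
representation):

    #include <array>
    #include <cstdio>

    using Vec4 = std::array<int, 4>;

    int main() {
        Vec4 varying = { 0, 16, 32, 48 };              // compile-time unknown
        Vec4 c0 = { 4, 4, 4, 4 }, c1 = { 8, 8, 8, 8 }; // compile-time constants

        for (int i = 0; i < 4; ++i) {
            int offset = (varying[i] + c0[i]) + c1[i];
            // A decomposition that only peels the outermost constant add gets
            // constant part c1 and varying part (varying + c0)...
            int const_before = c1[i];
            // ...but once c0 + c1 is folded to a single constant, the whole
            // constant amount can move to the constant component, which is
            // what the coalescing optimization wants to see.
            int const_after = c0[i] + c1[i];
            printf("lane %d: offset %d, constant part %d -> %d\n",
                   i, offset, const_before, const_after);
        }
        return 0;
    }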
Rather than having separate passes to do conversion, when possible, of:
- General gather/scatter of a vector of pointers to g/s of
a base pointer and integer offsets
- Gather/scatter to masked load/store, load+broadcast
- Masked load/store to regular load/store
Now all of them are done in a single ImproveMemoryOps pass. This change was
made in particular to address some phase-ordering issues that showed up with
multidimensional array access, where, after determining that an outer
dimension had the same index value across the program instances, we
previously weren't able to take advantage of the uniformity of the resulting
pointer.
Now that we never ever run with the mask all off, we no longer need to
put that logic in a built-in function just so that the mask can be
checked. In the one place where it was used (turning gathers from the
same location into a load and broadcast), we now just emit the code for
that directly.
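A plain C++ sketch of the transformation in scalar terms (invented names and
values; the real change is in the code the optimizer emits): when every lane
of a gather reads the same address, one scalar load plus a broadcast gives
the same result.

    #include <array>
    #include <cstdio>

    int main() {
        float value = 42.0f;
        const float *addrs[8];
        for (int i = 0; i < 8; ++i)
            addrs[i] = &value;              // all lanes gather the same location

        // gather: one load per lane
        std::array<float, 8> gathered;
        for (int i = 0; i < 8; ++i)
            gathered[i] = *addrs[i];

        // load + broadcast: one scalar load, replicated across the vector
        float scalar = *addrs[0];
        std::array<float, 8> broadcast;
        broadcast.fill(scalar);

        printf("%s\n", gathered == broadcast ? "equal" : "different");
        return 0;
    }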
Previously, we'd bitcast e.g. a vector of floats to a vector of i32s and then
use the i32 variant of masked_load/masked_store/gather/scatter. Now, we have
separate float/double variants of each of those.
Change the function name suffix from "_32" to "_i32", etc.
Improve the load_and_broadcast macro in util.m4 to grab the vector width
from the WIDTH variable rather than taking it as a parameter.
Performance and code quality of the performance suite are unchanged;
compilation times are improved by another 20% or so for simple
programs (e.g. rt.ispc). One very complex program compiles
about 2.4x faster now.
We now try harder to keep the names of instructions related to the
names of the variables they're originally derived from. This is useful
for making both the LLVM IR and the generated C++ code easier to
correlate back to the original ispc source code.
Issue #244.
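For illustration only (this is not ispc's code, and it's written against a
newer LLVM API than the 3.x versions in use at the time): the name hints
passed when creating instructions are what make the resulting IR correlate
back to source variables.

    #include <llvm/IR/IRBuilder.h>
    #include <llvm/IR/LLVMContext.h>
    #include <llvm/IR/Module.h>
    #include <llvm/Support/raw_ostream.h>

    int main() {
        llvm::LLVMContext ctx;
        llvm::Module m("demo", ctx);
        llvm::IRBuilder<> b(ctx);

        llvm::Type *i32 = b.getInt32Ty();
        llvm::FunctionType *ft = llvm::FunctionType::get(i32, { i32, i32 }, false);
        llvm::Function *f = llvm::Function::Create(
            ft, llvm::Function::ExternalLinkage, "scale_and_add", &m);

        // Name the arguments after the "source" variables...
        llvm::Function::arg_iterator ai = f->arg_begin();
        llvm::Value *count = &*ai++;  count->setName("count");
        llvm::Value *delta = &*ai++;  delta->setName("delta");

        b.SetInsertPoint(llvm::BasicBlock::Create(ctx, "entry", f));
        // ...and carry those names through to derived instructions, so the
        // printed IR reads "%count_scaled", "%count_plus_delta", etc.
        llvm::Value *scaled = b.CreateMul(count, b.getInt32(4), "count_scaled");
        llvm::Value *sum = b.CreateAdd(scaled, delta, "count_plus_delta");
        b.CreateRet(sum);

        m.print(llvm::outs(), nullptr);
        return 0;
    }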
When we're manually scalarizing the extraction of the first element
of a vector value, we need to be careful about handling constant values
and about where new instructions are inserted. The old code was
sloppy about this, which in turn led to invalid IR in some cases.
For example, the two bugs below were essentially due to generating
an extractelement inst from a zeroinitializer value and then inserting
it in the wrong basic block, such that a phi node that used that value was
malformed.
Fixes issues #240 and #229.
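A much-simplified sketch of the kind of care involved (invented name, written
against a newer LLVM API than the 3.x tree the commit targeted, and not the
actual ispc code): constants get folded directly instead of becoming
extractelement instructions, and any new instruction gets an explicit
insertion point.

    #include <llvm/ADT/Twine.h>
    #include <llvm/IR/Constants.h>
    #include <llvm/IR/Instructions.h>
    #include <llvm/Support/Casting.h>

    // Return a scalar holding lane 0 of the vector value 'v'.
    static llvm::Value *lExtractFirstLane(llvm::Value *v,
                                          llvm::Instruction *insertBefore) {
        // Constant vectors (including zeroinitializer) are folded directly;
        // no instruction, and hence no insertion point, is needed.
        if (llvm::Constant *c = llvm::dyn_cast<llvm::Constant>(v))
            return c->getAggregateElement(0u);

        // Otherwise emit an explicit extractelement at a point the caller
        // guarantees is dominated by the definition of 'v'.
        llvm::Value *zero =
            llvm::ConstantInt::get(llvm::Type::getInt32Ty(v->getContext()), 0);
        return llvm::ExtractElementInst::Create(v, zero,
                                                v->getName() + "_lane0",
                                                insertBefore);
    }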
Various optimization passes depend on turning a compile-time constant
mask into a bit vector; it turns out that in LLVM 3.1, constant vectors
of ints/floats are represented with llvm::ConstantDataVector, but
constant vectors of bools use llvm::ConstantVector (which is what LLVM
3.0 uses for all constant vectors). lGetMask() now always handles the
llvm::ConstantVector case as well, to cover this.
This improves generated C++ code by eliminating things like select
with an all on/off mask, turning movmask calls with constants into
constant values, etc.
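A hypothetical sketch of handling both representations (not the actual
lGetMask() source; the packed-uint64_t interface and the treatment of any
nonzero element as an "on" lane are assumptions here):

    #include <cstdint>
    #include <llvm/IR/Constants.h>
    #include <llvm/Support/Casting.h>

    // Pack the lanes of a compile-time-constant mask vector into bits of a
    // uint64_t; returns false if the value isn't a constant vector in either
    // representation.
    static bool lGetMaskBits(llvm::Value *mask, uint64_t *bits) {
        *bits = 0;
        if (llvm::ConstantDataVector *cdv =
                llvm::dyn_cast<llvm::ConstantDataVector>(mask)) {
            // LLVM 3.1 representation for constant int/float vectors.
            for (unsigned i = 0; i < cdv->getNumElements(); ++i)
                if (!cdv->getElementAsConstant(i)->isNullValue())
                    *bits |= (1ull << i);
            return true;
        }
        if (llvm::ConstantVector *cv =
                llvm::dyn_cast<llvm::ConstantVector>(mask)) {
            // LLVM 3.0 representation, still used for bool vectors in 3.1.
            for (unsigned i = 0; i < cv->getNumOperands(); ++i)
                if (!llvm::cast<llvm::Constant>(cv->getOperand(i))->isNullValue())
                    *bits |= (1ull << i);
            return true;
        }
        // A fuller version would also handle llvm::ConstantAggregateZero
        // (an all-zero, i.e. all-off, mask).
        return false;
    }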
When the mask was all off, we'd choose the incorrect operand!
(This bug was masked since this optimization wasn't triggering as
intended, due to other issues to be fixed in a forthcoming commit.)
With AOS data, we can often coalesce the accesses into gathers for the main
part of foreach loops, but fail only on the last iterations, where the mask
is not all on (since the coalescing code doesn't yet handle mixed masks). Before,
we'd report success with coalescing and then also report that gathers were needed
for the same accesses that were coalesced, which was a) confusing, and b)
didn't accurately represent what was going on for the majority of the loop
iterations.
Clean up the API so the caller doesn't have to pass in a vector for the
function to track PHI nodes (that's now done internally).
Handle casts in lValuesAreEqual().
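A much-reduced sketch of what handling casts there can look like (invented
body; it omits the PHI-node tracking the real function now does internally):

    #include <llvm/IR/InstrTypes.h>
    #include <llvm/Support/Casting.h>

    // Do the two values provably compute the same thing?  (Simplified.)
    static bool lValuesAreEqual(llvm::Value *a, llvm::Value *b) {
        if (a == b)
            return true;
        // Look through matching casts: two casts of the same kind to the same
        // type are equal if their source operands are equal.
        llvm::CastInst *ca = llvm::dyn_cast<llvm::CastInst>(a);
        llvm::CastInst *cb = llvm::dyn_cast<llvm::CastInst>(b);
        if (ca && cb && ca->getOpcode() == cb->getOpcode() &&
            ca->getDestTy() == cb->getDestTy())
            return lValuesAreEqual(ca->getOperand(0), cb->getOperand(0));
        return false;
    }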
For cases where it turns out that we just need the first element of
a vector (e.g. because we've determined that all of the values are
equal), it's often more efficient to only compute that one value
with scalar operations than to compute the whole vector's worth and
then just use one value. This function tries to rewrite a vector
computation to the scalar equivalent, if possible.
(Partial work-around to http://llvm.org/bugs/show_bug.cgi?id=11775.)
Note that sometimes this is the wrong thing to do--if we need the entire
vector value for other purposes, for example.
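A very reduced sketch of the idea (invented name, written against a newer
LLVM API, and handling only constants and binary operators; the real function
covers more cases):

    #include <llvm/ADT/Twine.h>
    #include <llvm/IR/Constants.h>
    #include <llvm/IR/InstrTypes.h>
    #include <llvm/Support/Casting.h>

    // Try to compute just lane 0 of the vector value 'v' with scalar ops;
    // returns nullptr if we don't know how, so the caller can fall back to
    // extracting the element from the full vector computation.
    static llvm::Value *lScalarizeLane0(llvm::Value *v,
                                        llvm::Instruction *insertBefore) {
        if (llvm::Constant *c = llvm::dyn_cast<llvm::Constant>(v))
            return c->getAggregateElement(0u);

        if (llvm::BinaryOperator *bop = llvm::dyn_cast<llvm::BinaryOperator>(v)) {
            llvm::Value *s0 = lScalarizeLane0(bop->getOperand(0), insertBefore);
            llvm::Value *s1 = lScalarizeLane0(bop->getOperand(1), insertBefore);
            if (s0 && s1)
                // Redo the same operation on the scalar lane-0 operands.
                return llvm::BinaryOperator::Create(bop->getOpcode(), s0, s1,
                                                    bop->getName() + "_lane0",
                                                    insertBefore);
        }
        return nullptr;
    }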
The main issue is that the coalescing optimizations end up generating a
number of smaller vector ops (e.g. 4-wide and 8-wide on the 16-wide
generic target), which the examples/intrinsics implementations don't
currently support.
This fixes a number of failing tests for now; it may be worth
generalizing the code in examples/intrinsics at some point, since the
coalescing optimizations are still desirable in general (e.g. when
generating LLVM IR output).
Issue #175.