aaron/ispc - ispc - git.frat.tech


Author	SHA1	Message	Date
Matt Pharr	12706cd37f	Debugging optimization pass updates. Don't run mem2reg with -O0 anymore, but do run the intrinsics opt pass, which allows some CFG simplification due to the mask being all on, etc.	2012-04-25 08:43:11 -10:00
Matt Pharr	32815e628d	Improve naming of llvm Instructions created. We now try harder to keep the names of instructions related to the initial names of variables they're derived from and so forth. This is useful for making both LLVM IR as well as generated C++ code easier to correlate back to the original ispc source code. Issue #244.	2012-04-19 16:36:46 -07:00
Matt Pharr	326c45fa17	Fix bugs in LLVMExtractFirstVectorElement(). When we're manually scalarizing the extraction of the first element of a vector value, we need to be careful about handling constant values and about where new instructions are inserted. The old code was sloppy about this, which in turn led to invalid IR in some cases. For example, the two bugs fixed here were essentially due to generating an extractelement instruction from a zeroinitializer value and then inserting it in the wrong basic block, such that a phi node that used that value was malformed. Fixes issues #240 and #229.	2012-04-19 09:45:04 -07:00
Matt Pharr	a2bb899a6b	Opt debug printing improvement. Now we just match the prefix of the provided function name of interest, so we don't have to worry about spelling out the full name.	2012-04-19 09:34:54 -07:00
Matt Pharr	9fedb1674e	Improve basic block dumping from optimization passes. Now done via a macro, which is cleaner. It's also now possible to specify a single function to watch, which is useful for debugging.	2012-04-18 15:46:18 -07:00
Matt Pharr	7c91b01125	Handle more forms of constant vectors in lGetMask(). Various optimization passes depend on turning a compile-time constant mask into a bit vector; it turns out that in LLVM 3.1, constant vectors of ints/floats are represented with llvm::ConstantDataVector, but constant vectors of bools use llvm::ConstantVector (which is what LLVM 3.0 uses for all constant vectors). Now lGetMask() always does the llvm::ConstantVector path, to cover this case. This improves generated C++ code by eliminating things like select with an all on/off mask, turning movmask calls with constants into constant values, etc.	2012-04-18 11:39:11 -07:00
Matt Pharr	c202e9e106	Add debugging printing code to optimization passes. Now all of the passes dump out the basic block before and after they do their thing when --debug is enabled.	2012-04-18 11:39:10 -07:00
Matt Pharr	645a8c9349	Fix serious bug in VSelMovmskOpt. When the mask was all off, we'd choose the incorrect operand! (This bug was masked since this optimization wasn't triggering as intended, due to other issues to be fixed in a forthcoming commit.)	2012-04-18 11:39:10 -07:00
Matt Pharr	b9d6ba2aa0	Always set target info, even when compiling to generic targets. This allows the SROA pass to eliminate a lot of allocas, loads, and stores, which helps performance a lot.	2012-04-17 15:10:30 -07:00
Matt Pharr	fefa86e0cf	Remove LLVM_TYPE_CONST #define / usage. Now with LLVM 3.0 and beyond, types aren't const.	2012-04-15 20:11:27 -07:00
Matt Pharr	098c4910de	Remove support for building with LLVM 2.9. A forthcoming change uses some features of LLVM 3.0's new type system, and it's not worth back-porting it so that everything also works with LLVM 2.9.	2012-04-15 20:08:51 -07:00
Matt Pharr	4f8cf019ca	Add pass to verify module before starting optimizations.	2012-04-05 08:49:39 -07:00
Matt Pharr	4c9ac7fcf1	Fix build with LLVM 2.9.	2012-04-05 08:22:40 -07:00
Jean-Luc Duprat	e9626a1d10	Added macro PRId64 to opt.cpp for compilation on Windows	2012-03-30 16:56:30 -07:00
Matt Pharr	7e954e4248	Don't issue gather/scatter warnings in the 'extra' bits of foreach loops. With AOS data, we can often coalesce the gathers' memory accesses in the main part of foreach loops and only fail on the last bits where the mask is not all on (since the coalescing code doesn't handle mixed masks yet). Before, we'd report success with coalescing and then also report that gathers were needed for the same accesses that were coalesced, which a) was confusing, and b) didn't accurately represent what was going on for the majority of the loop iterations.	2012-03-19 15:08:35 -07:00
Matt Pharr	57af0eb64f	Still do the gather/scatter -> load/store pass even if leaving 'pseudo' mem opts unchanged.	2012-03-19 12:04:38 -07:00
Matt Pharr	60aae16752	Move check for linear vector to LLVMVectorIsLinear() function.	2012-03-19 11:57:04 -07:00
Matt Pharr	e264d95019	LLVMVectorValuesAllEqual() improvements. Clean up the API, so the caller doesn't have to pass in a vector so the function can track PHI nodes (do that internally instead.) Handle casts in lValuesAreEqual().	2012-03-19 11:54:18 -07:00
Matt Pharr	0664f5a724	Add LLVMExtractVectorInts() function, use it in the opt code.	2012-03-19 11:48:38 -07:00
Matt Pharr	17c6a19527	Add LLVMExtractFirstVectorElement() function (and use it). For cases where it turns out that we just need the first element of a vector (e.g. because we've determined that all of the values are equal), it's often more efficient to only compute that one value with scalar operations than to compute the whole vector's worth and then just use one value. This function tries to rewrite a vector computation to the scalar equivalent, if possible. (Partial work-around to http://llvm.org/bugs/show_bug.cgi?id=11775.) Note that sometimes this is the wrong thing to do--if we need the entire vector value for other purposes, for example.	2012-03-19 11:48:33 -07:00
Matt Pharr	cbc8b8259b	Use LLVMIntAsType() in opt code instead of locally-defined equivalent.	2012-03-19 11:36:00 -07:00
Matt Pharr	1067a2e4be	Add LLVMShuffleVectors() and LLVMConcatVectors() functions. These were local functions in opt.cpp that are now public via the llvmutil.* files.	2012-03-19 11:34:52 -07:00
Matt Pharr	74a031a759	Small improvements to debug info printing in opt.cpp	2012-03-19 11:32:08 -07:00
Matt Pharr	9ec8e5a275	Fix compile warnings on Linux	2012-03-12 13:12:23 -07:00
Matt Pharr	8fdf84de04	Disable debugging printing code.	2012-03-05 09:58:09 -08:00
Matt Pharr	e013e0a374	Handle extract instructions in the lGetBasePtrAndOffsets() pattern matching code.	2012-03-05 09:58:09 -08:00
Matt Pharr	f7937f1e4b	Fix build with LLVM 2.9/3.0	2012-03-03 10:30:56 -08:00
Matt Pharr	95224f3f11	Improve detection of cases where 32-bit gather/scatter can be used. Previously, we weren't noticing that an <n x i64> zero vector could be represented as an <n x i32> without error.	2012-02-21 12:13:25 -08:00
Matt Pharr	a86b942730	Fix cases in coalesce opt where offsets would be truncated to 32 bits	2012-02-14 10:05:07 -08:00
Matt Pharr	cc86e4a7d2	Disable coalescing optimizations when using generic target. The main issue is that they end up generating a number of smaller vector ops (e.g. 4-wide and 8-wide on the 16-wide generic target), which the examples/intrinsics implementations don't currently support. This fixes a number of failing tests for now; it may be worth generalizing the stuff in examples/intrinsics at some point, since, as a general principle (e.g. when generating LLVM IR output), the coalescing optimizations are still desirable. Issue #175.	2012-02-13 16:52:01 -08:00
Matt Pharr	e864447e4a	Fix silly bug in vector scale extraction optimization. (Introduced in `f20a2d2ee`. How did this ever pass tests?)	2012-02-13 12:06:45 -08:00
Matt Pharr	73bf552cd6	Add support for coalescing memory accesses from gathers. There are two related optimizations that happen now. (These currently only apply for gathers where the mask is known to be all on, and to gathers that are accessing 32-bit sized elements, but both of these may be generalized in the future.) First, for any single gather, we are now more flexible in mapping it to individual memory operations. Previously, we would only either map it to a general gather (one scalar load per SIMD lane), or an unaligned vector load (if the program instances could be determined to be accessing a sequential set of locations in memory.) Now, we are able to break gathers into scalar, 2-wide (i.e. 64-bit), 4-wide, or 8-wide loads. Further, we now generate code that shuffles these loads around. Doing fewer, larger loads in this manner, when possible, can be more efficient. Second, we can coalesce memory accesses across multiple gathers. If we have a series of gathers without any memory writes in the middle, then we try to analyze their reads collectively and choose an efficient set of loads for them. Not only does this help if different gathers reuse values from the same location in memory, but it's specifically helpful when data with AOS layout is being accessed; in this case, we're often able to generate wide vector loads and appropriate shuffles automatically.	2012-02-10 13:10:39 -08:00
Matt Pharr	f20a2d2ee9	Generalize code to extract scales by 2/4/8 from addressing calculations. Now, if we have a scale by 16, say, we extract out the scalar scale of 8 and leave an explicit scale by 2.	2012-02-10 12:35:44 -08:00
Matt Pharr	0c25bc063c	Add lGEPInst() utility routine to opt.cpp. Deal with the messiness of LLVM API changes when creating these in a single place.	2012-02-10 12:32:15 -08:00
Matt Pharr	5b4673e8eb	Fix build with LLVM 2.9.	2012-02-07 08:37:13 -08:00
Matt Pharr	0432f97555	Fix build with LLVM 3.1 TOT	2012-01-31 14:10:07 -08:00
Matt Pharr	f73abb05a7	Fix bug in handling scatters where all instances go to the same location. Previously, we'd pick one lane and generate a regular store for its value. This was the wrong thing to do, since we also should have been checking that the mask was on (for the lane that was chosen). This bug didn't become evident until the scalar target was added, since many stores fall into this case with that target. Now, we just leave those as regular scatters. Fixes most of the failing tests for the scalar target listed in issue #167.	2012-01-31 11:06:14 -08:00
Matt Pharr	d71c49494f	Missed pass that should be skipped when pseudo memory ops are supposed to be left unchanged.	2012-01-31 11:02:23 -08:00
Matt Pharr	1eec27f890	Scalar target fixes. Don't issue warnings about all instances writing to the same location if there is only one program instance in the gang. Be sure to report that all values are equal in one-element vectors in LLVMVectorValuesAllEqual(). Issue #166.	2012-01-31 08:52:11 -08:00
Matt Pharr	b7f17d435f	Fix crash in gather/scatter optimization pass.	2012-01-27 14:44:35 -08:00
Matt Pharr	5893a9c49d	Remove incorrect assert	2012-01-27 09:14:45 -08:00
Matt Pharr	177e6312b4	Fix build with LLVM ToT (ConstantVector::getVectorElements() is gone now).	2012-01-27 09:07:58 -08:00
Matt Pharr	a5b7fca7e0	Extract constant offsets from gather/scatter base+offsets offset vectors. When we're able to turn a general gather/scatter into the "base + offsets" form, we now try to extract out any constant components of the offsets and then pass them as a separate parameter to the gather/scatter function implementation. We then in turn carefully emit code for the addressing calculation so that these constant offsets match LLVM's patterns to detect this case, such that we get the constant offsets directly encoded in the instruction's addressing calculation in many cases, saving arithmetic instructions to do these calculations. Improves performance of stencil by ~15%. Other workloads unchanged.	2012-01-24 14:41:15 -08:00
Matt Pharr	7be2c399b1	Rename various optimization passes to have more descriptive names. No functionality change.	2012-01-23 14:49:48 -08:00
Matt Pharr	d6337b3b22	Code cleanups in opt.cpp; no functional change	2012-01-23 14:36:32 -08:00
Matt Pharr	91ac3b9d7c	Back out WIP changes to opt.cpp that were inadvertently checked in.	2012-01-21 07:34:53 -08:00
Matt Pharr	d65bf2eb2f	Doxygen number bump and release notes for 1.1.3	2012-01-20 17:04:16 -08:00
Matt Pharr	4388338dad	Fix performance regression introduced in `be0c77d556`. Effectively, the patterns that detected whether the offsets of a gather or scatter in base+offsets form were actually a multiple of 2/4/8 were no longer working. This change not only fixes this, but also expands the set of patterns that are matched. For example, given offsets of the form 4v1 + 16v2, it identifies a scale of 4 and new offsets of v1 + 4*v2. This fix makes the volume renderer run 1.19x faster, and noise 1.54x faster.	2012-01-19 17:57:59 -08:00
Matt Pharr	68f6ea8def	For << and >> with C++, detect when all instances are shifting by the same amount. In this case, we now emit calls to potentially-specialized functions for the left/right shifts that take a single integer value for the shift amount. These in turn can be matched to the corresponding intrinsics for the SSE target. Issue #145.	2012-01-19 10:04:32 -07:00
Matt Pharr	3bf3ac7922	Be more conservative about using blending in place of masked store. More specifically, we do a proper masked store (rather than a load-blend-store) unless we can determine that we're accessing a stack-allocated "varying" variable. This fixes a number of nefarious bugs where, given code like: uniform float a[21]; foreach (i = 0 ... 21) a[i] = 0; we'd use a blend and in turn read past the end of a[] in the last iteration. Also made slight changes to inlining in aobench; these keep compile times to ~5s, versus ~45s without them (with this change). Fixes issue #160.	2012-01-17 23:42:22 -07:00
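
Commits `7c91b01125` and `645a8c9349` rely on two steps. The first turns a compile-time-constant mask into a bit vector. The second simplifies a select whose mask is all on or all off. The following is a minimal sketch of that logic in plain C++. It assumes the mask is already available as one `bool` per lane, with at most 64 lanes. `ConstMask`, `maskToBits()`, and `foldSelect()` are hypothetical names. They are not ispc's `lGetMask()` or the actual VSelMovmskOpt code, both of which operate on LLVM constants.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a compile-time-constant mask: one bool per lane.
using ConstMask = std::vector<bool>;

// Pack the mask into a bit vector: bit i is set iff lane i is on.
// Assumes at most 64 lanes.
uint64_t maskToBits(const ConstMask &mask) {
    assert(mask.size() <= 64);
    uint64_t bits = 0;
    for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i])
            bits |= uint64_t(1) << i;
    return bits;
}

// Fold select(mask, a, b) when the mask is uniform:
//   all on  -> every lane takes a (the "true" operand)
//   all off -> every lane takes b (the "false" operand)
// A mixed mask returns std::nullopt, meaning "keep the select".
template <typename T>
std::optional<T> foldSelect(const ConstMask &mask, const T &a, const T &b) {
    const uint64_t bits = maskToBits(mask);
    const uint64_t allOn = mask.size() == 64 ? ~uint64_t(0)
                                             : (uint64_t(1) << mask.size()) - 1;
    if (bits == allOn)
        return a;
    if (bits == 0)
        return b;
    return std::nullopt;
}
```

The all-off branch covers the case the `645a8c9349` fix concerns: with no lanes on, every lane must take the second (false) operand.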

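Commit `95224f3f11` notes that an `<n x i64>` zero vector can be represented as `<n x i32>` without error, which lets a 32-bit gather/scatter be used. Commit `a86b942730` fixes the opposite problem, where offsets were truncated to 32 bits. For offsets whose values are known at compile time, the underlying check can be sketched as follows. `fitsInInt32()` is a hypothetical helper, and the real pass reasons about LLVM IR values rather than a `std::vector`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: true if every 64-bit offset survives truncation to
// 32 bits unchanged, i.e. lies in [INT32_MIN, INT32_MAX]. Only then can a
// 32-bit gather/scatter replace the 64-bit one without changing addresses.
bool fitsInInt32(const std::vector<int64_t> &offsets) {
    for (int64_t v : offsets)
        if (v < std::numeric_limits<int32_t>::min() ||
            v > std::numeric_limits<int32_t>::max())
            return false;
    return true;
}
```

An all-zero vector, the case from `95224f3f11`, trivially passes this check.
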
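Commits `f20a2d2ee9` and `4388338dad` describe pulling a scale of 2, 4, or 8 out of offset calculations. These are the non-trivial scales x86 addressing modes can encode. The sketch below models the offsets as a sum of terms c_i * v_i and keeps only the constant coefficients. `ScaleSplit` and `extractScale()` are hypothetical names. The real code pattern-matches multiplies and shifts in LLVM IR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: the offset vector is a sum of terms c_i * v_i, and we
// keep only the constant coefficients c_i.
struct ScaleSplit {
    int64_t scale;                // 1, 2, 4, or 8: folded into the addressing mode
    std::vector<int64_t> coeffs;  // remaining explicit multipliers
};

// Pick the largest of 8, 4, 2 that divides every coefficient and divide it
// out; if none does, fall back to a scale of 1 and leave the terms alone.
ScaleSplit extractScale(const std::vector<int64_t> &coeffs) {
    for (int64_t s : {8, 4, 2}) {
        bool divides = true;
        for (int64_t c : coeffs)
            if (c % s != 0) {
                divides = false;
                break;
            }
        if (divides) {
            std::vector<int64_t> rest;
            for (int64_t c : coeffs)
                rest.push_back(c / s);
            return {s, rest};
        }
    }
    return {1, coeffs};
}
```

For the examples in the log, `extractScale({16})` gives a scale of 8 with an explicit multiplier of 2. `extractScale({4, 16})` gives a scale of 4 with new coefficients {1, 4}, i.e. v1 + 4*v2.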

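Commit `68f6ea8def` emits calls to specialized shift functions that take a single integer shift amount when all program instances shift by the same amount. The sketch below is a run-time analogue of that compile-time decision for an assumed 4-wide gang. `VInt`, `allEqual()`, `shlUniform()`, `shlVarying()`, and `shl()` are hypothetical names. A uniform amount is what lets the shift map onto an SSE shift-by-scalar instruction such as `pslld`. SSE itself has no per-lane variable shift; those arrived with AVX2's `vpsllvd`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kWidth = 4;  // assumed gang size for this sketch
using VInt = std::array<int32_t, kWidth>;

// True if every lane holds the same value. The compiler makes the analogous
// decision on LLVM IR (cf. LLVMVectorValuesAllEqual()), not on data.
bool allEqual(const VInt &v) {
    for (std::size_t i = 1; i < kWidth; ++i)
        if (v[i] != v[0])
            return false;
    return true;
}

// Specialized path: one shift amount shared by all lanes.
VInt shlUniform(const VInt &v, int32_t amount) {
    VInt r{};
    for (std::size_t i = 0; i < kWidth; ++i)
        r[i] = v[i] << amount;
    return r;
}

// General path: each lane shifts by its own amount.
VInt shlVarying(const VInt &v, const VInt &amount) {
    VInt r{};
    for (std::size_t i = 0; i < kWidth; ++i)
        r[i] = v[i] << amount[i];
    return r;
}

// Dispatch: use the specialized path when all amounts are equal.
VInt shl(const VInt &v, const VInt &amount) {
    return allEqual(amount) ? shlUniform(v, amount[0]) : shlVarying(v, amount);
}
```

Both paths produce the same values. The dispatch only matters because the uniform path is the one that can be lowered to a single vector instruction on SSE.
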
209 Commits
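
Commit `3bf3ac7922` explains why a load-blend-store cannot stand in for a masked store on arbitrary memory. The sketch below assumes a 4-wide gang; the actual width depends on the ispc target. It lists the indices each strategy touches in the last `foreach` iteration over `uniform float a[21]`. That iteration starts at index 20, so only its first lane is on. `maskedStoreTouches()` and `blendStoreTouches()` are hypothetical helpers. They return indices instead of accessing memory, to avoid the very out-of-bounds access being illustrated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kWidth = 4;  // assumed gang size for this sketch

// Indices written by a masked store in a foreach iteration starting at
// `base` over [0, count): only lanes whose index is in range are on.
std::vector<std::size_t> maskedStoreTouches(std::size_t base, std::size_t count) {
    std::vector<std::size_t> idx;
    for (std::size_t lane = 0; lane < kWidth; ++lane)
        if (base + lane < count)  // the mask is on for this lane
            idx.push_back(base + lane);
    return idx;
}

// Indices touched by load-blend-store: all kWidth lanes are loaded, blended
// with the new values, and stored back, whatever the mask says.
std::vector<std::size_t> blendStoreTouches(std::size_t base) {
    std::vector<std::size_t> idx;
    for (std::size_t lane = 0; lane < kWidth; ++lane)
        idx.push_back(base + lane);
    return idx;
}
```

The blend version also reads and writes indices 21 through 23, past the end of the 21-element array. The log notes that blending remains acceptable for stack-allocated varying variables, whose storage covers every lane.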