aaron/ispc - ispc - git.frat.tech


Author	SHA1	Message	Date
Matt Pharr	e864447e4a	Fix silly bug in vector scale extraction optimization. (Introduced in `f20a2d2ee`. How did this ever pass tests?)	2012-02-13 12:06:45 -08:00
Matt Pharr	73bf552cd6	Add support for coalescing memory accesses from gathers. There are two related optimizations that happen now. (These currently only apply for gathers where the mask is known to be all on, and to gathers that are accessing 32-bit sized elements, but both of these may be generalized in the future.) First, for any single gather, we are now more flexible in mapping it to individual memory operations. Previously, we would only either map it to a general gather (one scalar load per SIMD lane), or an unaligned vector load (if the program instances could be determined to be accessing a sequential set of locations in memory.) Now, we are able to break gathers into scalar, 2-wide (i.e. 64-bit), 4-wide, or 8-wide loads. Further, we now generate code that shuffles these loads around. Doing fewer, larger loads in this manner, when possible, can be more efficient. Second, we can coalesce memory accesses across multiple gathers. If we have a series of gathers without any memory writes in the middle, then we try to analyze their reads collectively and choose an efficient set of loads for them. Not only does this help if different gathers reuse values from the same location in memory, but it's specifically helpful when data with AOS layout is being accessed; in this case, we're often able to generate wide vector loads and appropriate shuffles automatically.	2012-02-10 13:10:39 -08:00
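Below is a minimal sketch of the load-splitting idea. It assumes that each lane's offset (in 32-bit elements) is known and that any run of consecutive elements may be loaded in one go. The sketch greedily covers the distinct offsets with 8-, 4-, 2-, or 1-wide loads. It then records, for each lane, which load and which element of that load supply the lane's value (the shuffle). The names `Load`, `LanePick`, and `coalesceGather` are hypothetical, and the sketch ignores the alignment and cost heuristics a real compiler pass would weigh.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical: one wide load of `width` consecutive 32-bit elements.
struct Load {
    int64_t start;
    int width;  // 1, 2, 4, or 8
};

// Hypothetical: where a lane's value lives after the loads (the shuffle source).
struct LanePick {
    int load;
    int element;
};

// Greedily cover the distinct offsets with the widest loads whose elements
// are all referenced by some lane, then build the per-lane shuffle.
std::vector<Load> coalesceGather(const std::vector<int64_t>& offsets,
                                 std::vector<LanePick>& picks) {
    std::set<int64_t> needed(offsets.begin(), offsets.end());
    std::vector<Load> loads;
    while (!needed.empty()) {
        int64_t start = *needed.begin();  // smallest offset not yet covered
        int width = 1;
        for (int w : {8, 4, 2}) {
            bool allNeeded = true;
            for (int k = 0; k < w; ++k)
                if (needed.count(start + k) == 0) { allNeeded = false; break; }
            if (allNeeded) { width = w; break; }
        }
        for (int k = 0; k < width; ++k) needed.erase(start + k);
        loads.push_back({start, width});
    }
    picks.clear();
    for (int64_t off : offsets) {
        for (std::size_t i = 0; i < loads.size(); ++i) {
            if (off >= loads[i].start && off < loads[i].start + loads[i].width) {
                picks.push_back({static_cast<int>(i), static_cast<int>(off - loads[i].start)});
                break;
            }
        }
    }
    return loads;
}
```

The cross-gather case works by passing the union of several gathers' offsets. For example, with the x and y fields of a 3-float AOS struct across two lanes, `{0, 3, 1, 4}` becomes two 2-wide loads instead of four scalar ones.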
Matt Pharr	f20a2d2ee9	Generalize code to extract scales by 2/4/8 from addressing calculations. Now, if we have a scale by 16, say, we extract out the scalar scale of 8 and leave an explicit scale by 2.	2012-02-10 12:35:44 -08:00
Matt Pharr	0c25bc063c	Add lGEPInst() utility routine to opt.cpp. Deal with the messiness of LLVM API changes when creating these in a single place.	2012-02-10 12:32:15 -08:00
Matt Pharr	5b4673e8eb	Fix build with LLVM 2.9.	2012-02-07 08:37:13 -08:00
Matt Pharr	0432f97555	Fix build with LLVM 3.1 TOT	2012-01-31 14:10:07 -08:00
Matt Pharr	f73abb05a7	Fix bug in handling scatters where all instances go to the same location. Previously, we'd pick one lane and generate a regular store for its value. This was the wrong thing to do, since we also should have been checking that the mask was on (for the lane that was chosen). This bug didn't become evident until the scalar target was added, since many stores fall into this case with that target. Now, we just leave those as regular scatters. Fixes most of the failing tests for the scalar target listed in issue #167.	2012-01-31 11:06:14 -08:00
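The sketch below shows why the mask matters when every lane of a scatter targets the same address. A plain store is valid only if at least one lane is active, and the stored value must come from an active lane. The sketch has the highest-numbered active lane win. That "last writer wins" choice, the gang width, and the name `scatterSameLocation` are assumptions for illustration. The commit itself takes the simpler route of leaving these cases as regular scatters.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWidth = 4;  // assumed gang width for the sketch

// Hypothetical lowering of a scatter in which every lane targets *dst.
// Unconditionally storing a fixed lane's value is the bug described above;
// this version consults the mask first.
template <typename T>
void scatterSameLocation(T* dst, const std::array<T, kWidth>& values,
                         const std::array<bool, kWidth>& mask) {
    for (std::size_t lane = kWidth; lane-- > 0;) {
        if (mask[lane]) {  // highest active lane supplies the value
            *dst = values[lane];
            return;
        }
    }
    // All lanes off: memory must not be touched at all.
}
```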
Matt Pharr	d71c49494f	Missed pass that should be skipped when pseudo memory ops are supposed to be left unchanged.	2012-01-31 11:02:23 -08:00
Matt Pharr	1eec27f890	Scalar target fixes. Don't issue warnings about all instances writing to the same location if there is only one program instance in the gang. Be sure to report that all values are equal in one-element vectors in LLVMVectorValuesAllEqual(). Issue #166.	2012-01-31 08:52:11 -08:00
Matt Pharr	b7f17d435f	Fix crash in gather/scatter optimization pass.	2012-01-27 14:44:35 -08:00
Matt Pharr	5893a9c49d	Remove incorrect assert	2012-01-27 09:14:45 -08:00
Matt Pharr	177e6312b4	Fix build with LLVM ToT (ConstantVector::getVectorElements() is gone now).	2012-01-27 09:07:58 -08:00
Matt Pharr	a5b7fca7e0	Extract constant offsets from gather/scatter base+offsets offset vectors. When we're able to turn a general gather/scatter into the "base + offsets" form, we now try to extract out any constant components of the offsets and then pass them as a separate parameter to the gather/scatter function implementation. We then in turn carefully emit code for the addressing calculation so that these constant offsets match LLVM's patterns to detect this case, such that we get the constant offsets directly encoded in the instruction's addressing calculation in many cases, saving arithmetic instructions to do these calculations. Improves performance of stencil by ~15%. Other workloads unchanged.	2012-01-24 14:41:15 -08:00
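Below is a minimal sketch of the constant-offset split. It assumes the offset vector has already been flattened into a list of addends, each either a compile-time constant splat or a varying vector. The constant addends are summed into a separate argument, so each lane's address becomes base + varying + constant, which matches the base + index + displacement shape of x86 addressing. The `Term` representation, `splitConstantOffsets`, and `laneAddress` are hypothetical and are not ispc's IR handling.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kWidth = 4;  // assumed gang width
using Vec = std::array<int64_t, kWidth>;

// Hypothetical addend of an offset expression: a constant splat or a varying vector.
struct Term {
    bool isConstant;
    int64_t constant;  // used when isConstant
    Vec varying;       // used otherwise
};

struct SplitOffsets {
    Vec varying{};         // sum of the varying addends
    int64_t constant = 0;  // sum of the constant addends, passed separately
};

SplitOffsets splitConstantOffsets(const std::vector<Term>& terms) {
    SplitOffsets out;
    for (const Term& t : terms) {
        if (t.isConstant) {
            out.constant += t.constant;
        } else {
            for (int i = 0; i < kWidth; ++i) out.varying[i] += t.varying[i];
        }
    }
    return out;
}

// Per-lane address: the constant can become the instruction's displacement.
int64_t laneAddress(int64_t base, const SplitOffsets& s, int lane) {
    return base + s.varying[lane] + s.constant;
}
```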
Matt Pharr	7be2c399b1	Rename various optimization passes to have more descriptive names. No functionality change.	2012-01-23 14:49:48 -08:00
Matt Pharr	d6337b3b22	Code cleanups in opt.cpp; no functional change	2012-01-23 14:36:32 -08:00
Matt Pharr	91ac3b9d7c	Back out WIP changes to opt.cpp that were inadvertently checked in.	2012-01-21 07:34:53 -08:00
Matt Pharr	d65bf2eb2f	Doxygen number bump and release notes for 1.1.3	2012-01-20 17:04:16 -08:00
Matt Pharr	4388338dad	Fix performance regression introduced in `be0c77d556`. Effectively, the patterns that detected when, given a gather or scatter in base+offsets form, the offsets were actually a multiple of 2/4/8 were no longer working. This change not only fixes this, but also expands the set of patterns that are matched. For example, given offsets of the form 4v1 + 16v2, it identifies a scale of 4 and new offsets of v1 + 4*v2. This fix makes the volume renderer run 1.19x faster, and noise 1.54x faster.	2012-01-19 17:57:59 -08:00
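Below is a minimal sketch of scale extraction. It assumes the offsets are a sum of terms c_k * v_k with known constant multipliers c_k. The largest of 8, 4, or 2 that divides every multiplier becomes the addressing scale, and each multiplier is divided by it. This reproduces both examples from the log: 16v gives scale 8 with 2v left over, and 4v1 + 16v2 gives scale 4 with v1 + 4v2. `extractScale` is a hypothetical helper, not the routine in opt.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result: scale folded into addressing, plus leftover multipliers.
struct ScaleSplit {
    int scale;                       // 1, 2, 4, or 8
    std::vector<int64_t> remaining;  // multipliers after dividing out `scale`
};

// Pick the largest x86-friendly scale (8, 4, 2) that divides every multiplier.
ScaleSplit extractScale(const std::vector<int64_t>& multipliers) {
    for (int s : {8, 4, 2}) {
        bool divides = !multipliers.empty();
        for (int64_t c : multipliers)
            if (c % s != 0) { divides = false; break; }
        if (divides) {
            std::vector<int64_t> rest;
            for (int64_t c : multipliers) rest.push_back(c / s);
            return {s, rest};
        }
    }
    return {1, multipliers};  // no free scale available
}
```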
Matt Pharr	68f6ea8def	For << and >> with C++, detect when all instances are shifting by the same amount. In this case, we now emit calls to potentially-specialized functions for the left/right shifts that take a single integer value for the shift amount. These in turn can be matched to the corresponding intrinsics for the SSE target. Issue #145.	2012-01-19 10:04:32 -07:00
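Below is a minimal sketch of the uniform-shift specialization for a 4-wide int32 vector. When every lane shifts by the same amount, a routine taking a single scalar count is called. On SSE, that is the shape of intrinsics such as `_mm_sll_epi32`, which shift all lanes by one count. In ispc the check happens at compile time when emitting C++. This sketch checks at run time only to stay self-contained, and its function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4i = std::array<int32_t, 4>;

// Hypothetical specialized path: one shift count (assumed 0..31) for all lanes.
Vec4i shlUniform(Vec4i a, int count) {
    for (auto& x : a) x = static_cast<int32_t>(static_cast<uint32_t>(x) << count);
    return a;
}

// Hypothetical general path: each lane has its own count (assumed 0..31).
Vec4i shlVarying(Vec4i a, const Vec4i& counts) {
    for (int i = 0; i < 4; ++i)
        a[i] = static_cast<int32_t>(static_cast<uint32_t>(a[i]) << counts[i]);
    return a;
}

// Dispatch on whether all lanes agree on the shift amount.
Vec4i shl(const Vec4i& a, const Vec4i& counts) {
    bool allSame = counts[1] == counts[0] && counts[2] == counts[0] && counts[3] == counts[0];
    return allSame ? shlUniform(a, counts[0]) : shlVarying(a, counts);
}
```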
Matt Pharr	3bf3ac7922	Be more conservative about using blending in place of masked store. More specifically, we do a proper masked store (rather than a load-blend-store) unless we can determine that we're accessing a stack-allocated "varying" variable. This fixes a number of nefarious bugs where, given code like `uniform float a[21]; foreach (i = 0 ... 21) a[i] = 0;`, we'd use a blend and in turn read past the end of a[] in the last iteration. Also made slight changes to inlining in aobench; with this change, they keep compiles to ~5s, versus ~45s without them. Fixes issue #160.	2012-01-17 23:42:22 -07:00
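The sketch below contrasts the two store strategies. It assumes an 8-wide gang and uses `std::vector::at()` so that out-of-bounds accesses throw instead of silently corrupting memory. The masked store touches only active lanes, while load-blend-store reads and rewrites all eight. With `uniform float a[21]`, the last `foreach` iteration covers elements 16 to 23 with only lanes 0 to 4 active, so the blend version touches a[21] to a[23]. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

constexpr std::size_t kWidth = 8;  // assumed gang width
using Mask = std::array<bool, kWidth>;
using VecF = std::array<float, kWidth>;

// Hypothetical proper masked store: only active lanes are written.
void maskedStore(std::vector<float>& mem, std::size_t start, const VecF& v, const Mask& m) {
    for (std::size_t i = 0; i < kWidth; ++i)
        if (m[i]) mem.at(start + i) = v[i];
}

// Hypothetical load-blend-store: reads and rewrites every lane, active or not.
// Only safe when all kWidth elements are valid memory (e.g. a stack-allocated varying).
void blendStore(std::vector<float>& mem, std::size_t start, const VecF& v, const Mask& m) {
    VecF old;
    for (std::size_t i = 0; i < kWidth; ++i) old[i] = mem.at(start + i);  // full-width load
    for (std::size_t i = 0; i < kWidth; ++i) mem.at(start + i) = m[i] ? v[i] : old[i];
}
```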
Matt Pharr	0f8eee9809	Fix cases in optimization code to not inadvertently match calls to func ptrs. If we call a function pointer, CallInst::getCalledFunction() returns NULL; we need to be careful about this case when we're matching various function calls in optimization passes. (Fixes a crash.)	2012-01-12 10:33:06 -08:00
Pierre-Antoine Lacaze	da9200fcee	Fix alloca use on mingw.	2012-01-09 10:19:09 +01:00
Matt Pharr	be0c77d556	Detect more gather/scatter cases that are actually base+offsets. We now recognize patterns like (ptr + offset1 + offset2) as being cases we can handle with the base_offsets variants of the gather/scatter functions. (This can come up with multidimensional array indexing, for example.) Issue #150.	2012-01-08 14:06:44 -08:00
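Below is a minimal sketch of the flattening. It uses a tiny hypothetical expression tree (`Expr`) in place of LLVM IR. A chain of additions is flattened, the single pointer-typed leaf becomes the base, and every integer leaf goes into the offsets. The base is chosen by type rather than by operand position. The earlier `2a6e3e5fea` fix also requires this, because `vec(4) + ptr2int(p)` must not report 4 as the base.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical expression node: a pointer leaf, an integer leaf, or an addition.
struct Expr {
    enum Kind { Pointer, Integer, Add } kind;
    long value = 0;                  // pointer address or integer value for leaves
    std::shared_ptr<Expr> lhs, rhs;  // operands for Add
};

struct BaseOffsets {
    long base;                  // the one pointer leaf
    std::vector<long> offsets;  // all integer leaves, in tree order
};

// Collect leaves of an addition chain; fail if more than one pointer appears.
static bool collect(const Expr& e, std::optional<long>& base, std::vector<long>& offs) {
    switch (e.kind) {
    case Expr::Pointer:
        if (base) return false;  // two pointers: not base+offsets form
        base = e.value;
        return true;
    case Expr::Integer:
        offs.push_back(e.value);
        return true;
    case Expr::Add:
        return collect(*e.lhs, base, offs) && collect(*e.rhs, base, offs);
    }
    return false;
}

std::optional<BaseOffsets> decompose(const Expr& e) {
    std::optional<long> base;
    std::vector<long> offs;
    if (!collect(e, base, offs) || !base) return std::nullopt;  // need exactly one pointer
    return BaseOffsets{*base, offs};
}
```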
Matt Pharr	ff6971fb15	Use Assert() rather than assert()	2012-01-08 14:06:44 -08:00
Matt Pharr	71317e6aa6	Fix bug in gather/scatter optimization passes. When flattening chains of insertelement instructions, we didn't handle the case where the initial insertelement was to a constant vector (with one value set and the other values undef). Also generalized the "do all of the instances access the same location" check to handle the case where some of them are accessing undef locations; these are ignored in this check, as they should correspond to the mask being off for that lane anyway. Fixes issue #149.	2012-01-06 09:19:18 -08:00
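Below is a minimal sketch of the generalized "same location" check. It assumes each lane's address is either a known value or undef, modeled here with `std::nullopt`. Undef lanes are skipped because they correspond to lanes whose mask is off. A vector with a single defined lane, such as the one-element vectors of the scalar target mentioned in `1eec27f890`, counts as all-equal. `allLanesEqual` is hypothetical and is not the actual `LLVMVectorValuesAllEqual()` signature.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical check: true if every defined lane holds the same value.
// Undef lanes (std::nullopt) are ignored; a vector with no defined lanes
// is treated as trivially equal in this sketch.
bool allLanesEqual(const std::vector<std::optional<int64_t>>& lanes) {
    std::optional<int64_t> first;
    for (const auto& lane : lanes) {
        if (!lane) continue;  // undef: mask is off for this lane
        if (!first) first = lane;
        else if (*lane != *first) return false;
    }
    return true;
}
```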
Matt Pharr	8938e14442	Add support for emitting ~generic vectorized C++ code. The compiler now supports an --emit-c++ option, which generates generic vector C++ code. To actually compile this code, the user must provide C++ code that implements a variety of types and operations (e.g. adding two floating-point vector values together, comparing them, etc). There are two examples of this required code in examples/intrinsics: generic-16.h is a "generic" 16-wide implementation that does everything required with scalar math; it's useful for demonstrating the requirements of the implementation. Then, sse4.h shows a simple implementation of an SSE4 target that maps the emitted function calls to SSE intrinsics. When using these example implementations with the ispc test suite, all but one or two tests pass with gcc and clang on Linux and OSX. There are currently ~10 failures with icc on Linux, and ~50 failures with MSVC 2010. (To be fixed in coming days.) Performance varies: when running the examples through the sse4.h target, some have the same performance as when compiled with --target=sse4 from ispc directly (options), while noise is 12% slower, rt is 26% slower, and aobench is 2.2x slower. The details of this haven't yet been carefully investigated, but will be in coming days as well. Issue #92.	2012-01-04 12:59:03 -08:00
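The sketch below shows the kind of user-supplied code this requires, in the spirit of the scalar-math generic-16.h. It defines a 16-wide float vector type, a mask type, and element-wise add and compare written as plain loops. The type and function names (`vec16_f`, `vec16_i1`, `vec16_add`, `vec16_less`) are hypothetical and do not match what the emitted C++ actually calls. examples/intrinsics/generic-16.h has the real names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16-wide types; the emitted code expects its own names.
struct vec16_f { float v[16]; };
struct vec16_i1 { uint16_t bits; };  // one mask bit per lane

// Element-wise add, done lane by lane with scalar math.
inline vec16_f vec16_add(const vec16_f& a, const vec16_f& b) {
    vec16_f r;
    for (int i = 0; i < 16; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Element-wise a < b, producing a 16-bit lane mask.
inline vec16_i1 vec16_less(const vec16_f& a, const vec16_f& b) {
    vec16_i1 r{0};
    for (int i = 0; i < 16; ++i)
        if (a.v[i] < b.v[i]) r.bits |= static_cast<uint16_t>(1u << i);
    return r;
}
```

A target-specific header such as sse4.h would keep the same interface but implement each function with intrinsics instead of loops.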
Matt Pharr	dea13979e0	Fix bug in lIs248Splat() in opt.cpp	2012-01-04 11:55:02 -08:00
Matt Pharr	052d34bf5b	Various cleanups to optimization code. Stop using the PassManagerBuilder but add all of the passes directly in code here. This currently leads to no different behavior, but was useful for experimenting with disabling the SROA pass when compiling to generic targets.	2012-01-04 11:54:44 -08:00
Matt Pharr	d4c5e82896	Add VSelMovMsk optimization pass. Various peephole improvements to vector select instructions.	2012-01-04 11:52:27 -08:00
Matt Pharr	562d61caff	Added masked load optimization pass. This pass handles the "all on" and "all off" mask cases appropriately. Also renamed load_masked stuff in built-ins to masked_load for consistency with masked_store.	2012-01-04 11:51:26 -08:00
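Below is a minimal sketch of the two special cases. It assumes an 8-wide gang with the mask given as a bitmask. An all-on mask becomes an ordinary full-width load, and an all-off mask performs no memory access at all; the result lanes are unspecified, and this sketch zeroes them. Mixed masks fall back to per-lane loads. In ispc these decisions are made at compile time when the mask is statically known. The run-time dispatch and the name `maskedLoad` are illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int kWidth = 8;  // assumed gang width
using VecI = std::array<int32_t, kWidth>;

// Hypothetical masked load of kWidth int32 values starting at p.
VecI maskedLoad(const int32_t* p, uint32_t mask) {
    VecI r{};  // inactive lanes are left as zero in this sketch
    const uint32_t allOn = (1u << kWidth) - 1;
    if (mask == allOn) {
        std::memcpy(r.data(), p, kWidth * sizeof(int32_t));  // plain full-width load
    } else if (mask != 0) {
        for (int i = 0; i < kWidth; ++i)
            if (mask & (1u << i)) r[i] = p[i];  // per-lane loads
    }
    // mask == 0: no memory is touched, so p may even be invalid.
    return r;
}
```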
Matt Pharr	1d9201fe3d	Add "generic" 4, 8, and 16-wide targets. When used, these targets end up with calls to undefined functions for all of the various special vector stuff ispc needs to compile ispc programs (masked store, gather, min/max, sqrt, etc.). These targets are not yet useful for anything, but are a step toward having an option to emit C++ code with calls out to intrinsics. Reorganized the directory structure a bit and put the LLVM bitcode used to define target-specific stuff (as well as some generic built-ins stuff) into a builtins/ directory. Note that for building on Windows, it's now necessary to set an LLVM_VERSION environment variable (with values like LLVM_2_9, LLVM_3_0, LLVM_3_1svn, etc.)	2011-12-19 13:46:50 -08:00
Matt Pharr	6dbb15027a	Take advantage of x86's free "scale by 2, 4, or 8" in addressing calculations. When loading from an address that's computed by adding two registers together, x86 can scale one of them by 2, 4, or 8, for free as part of the addressing calculation. This change makes the code generated for gather and scatter use this. For the cases where gather/scatter is based on a base pointer and an integer offset vector, the GSImprovementsPass looks to see if the integer offsets are being computed as 2/4/8 times some other value. If so, it extracts the 2x/4x/8x part and leaves the rest there as the offsets. The {gather,scatter}_base_offsets_* functions take an i32 scale factor, which is passed to them, and then they carefully generate IR so that it hits LLVM's pattern matching for these scales. This is a particular win on AVX, since it saves us two 4-wide integer multiplies. Noise runs 14% faster with this. Issue #132.	2011-12-16 15:55:44 -08:00
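Below is a minimal sketch of a gather in base-plus-offsets form. It assumes a 4-wide gang and 32-bit offsets. Each active lane loads from base + scale * offsets[i], where scale is 1, 2, 4, or 8, so the multiply can fold into x86's `[base + index*scale]` addressing instead of needing a vector multiply. `gatherBaseOffsets` is a hypothetical stand-in for the `{gather,scatter}_base_offsets_*` built-ins, whose real signatures differ.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int kWidth = 4;  // assumed gang width
using Offsets = std::array<int32_t, kWidth>;

// Hypothetical gather of 32-bit floats from base + scale * offsets[i] per active lane.
std::array<float, kWidth> gatherBaseOffsets(const unsigned char* base, int32_t scale,
                                            const Offsets& offsets, uint32_t mask) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    std::array<float, kWidth> r{};
    for (int i = 0; i < kWidth; ++i) {
        if (mask & (1u << i)) {
            // scale * offsets[i] is the term x86 can encode as index*scale.
            std::memcpy(&r[i], base + static_cast<int64_t>(scale) * offsets[i], sizeof(float));
        }
    }
    return r;
}
```

In the usage below, element offsets `{1, 3, 5, 7}` with scale 4 (the size of a float) read `table[1]`, `table[3]`, `table[5]`, and `table[7]`.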
Matt Pharr	e82a720223	Fix various warnings / build issues on Windows	2011-12-15 12:06:38 -08:00
Matt Pharr	8d1b77b235	Have assertion macro and FATAL() text ask user to file a bug, provide URL to do so. Switch to Assert() from assert() to make it clear it's not the C stdlib one we're using any more.	2011-12-15 11:11:16 -08:00
Matt Pharr	46bfef3fce	Add option to turn off codegen improvements when mask 'all on' is statically known.	2011-12-11 16:16:36 -08:00
Matt Pharr	e2b6ed3db8	Fix build for LLVM 2.9 and 3.1svn	2011-12-06 08:08:41 -08:00
Matt Pharr	b48775a549	Handle global arrays better in varying pointer analysis. Specifically, indexing into global arrays sometimes comes in as a big llvm::ConstantVector, so we need to handle traversing those as well when we do the corresponding checks in GatherScatterFlattenOpt so that we still detect cases where we can convert them into the base pointer + offsets form that's used in later analysis.	2011-11-30 12:29:49 -08:00
Matt Pharr	e52104ff55	Pointer fixes/improvements. Allow <, <=, >, >= comparisons of pointers. Allow explicit type-casting of pointers to and from integers. Fix bug in handling expressions of the form "int + ptr" ("ptr + int" was fine). Fix a bug in TypeCastExpr where varying -> uniform typecasts would be allowed (leading to a crash later).	2011-11-29 13:22:36 -08:00
Matt Pharr	2a6e3e5fea	Fix bug in ptr+offset decomposition in GatherScatterFlattenOpt. Given IR that encoded computation like "vec(4) + ptr2int(some pointer)", we'd report that "int2ptr(4)" was the base pointer and the ptr2int value was the offset. This in turn could lead to incorrect code from LLVM, since we'd end up with GEP instructions where the first operand was int2ptr(4) and the offset was the original pointer value. This in turn was sometimes leading to incorrect code and thence a failure on the tests/gs-double-improve-multidim.ispc test since LLVM's memory read/write analysis assumes that nothing after the first operand of a GEP is actually a pointer.	2011-11-28 15:00:41 -08:00
Matt Pharr	975db80ef6	Add support for pointers to the language. Pointers can be either uniform or varying, and behave correspondingly. For example, "uniform float * varying" is a varying pointer to uniform float data in memory, and "float * uniform" is a uniform pointer to varying data in memory. Like other types, pointers are varying by default. Pointer-based expressions, & and *, sizeof, ->, pointer arithmetic, and the array/pointer duality all behave as in C. Array arguments to functions are converted to pointers, also like C. There is a built-in NULL for a null pointer value; conversion from compile-time constant 0 values to NULL still needs to be implemented. Other changes: - Syntax for references has been updated to be C++ style; a useful warning is now issued if the "reference" keyword is used. - It is now illegal to pass a varying lvalue as a reference parameter to a function; references are essentially uniform pointers. This case had previously been handled via special case call by value return code. That path has been removed, now that varying pointers are available to handle this use case (and much more). - Some stdlib routines have been updated to take pointers as arguments where appropriate (e.g. prefetch and the atomics). A number of others still need attention. - All of the examples have been updated. - Many new tests. TODO: documentation	2011-11-27 13:09:59 -08:00
Matt Pharr	068ea3e4c4	Better SourcePos reporting for gathers/scatters	2011-11-21 10:26:53 -08:00
Matt Pharr	f8eb100c60	Use llvm TargetData to find object sizes, offsets. Previously, to compute the size of objects and the offsets of struct elements within structs, we were using the trick of using getelementptr with a NULL base pointer and then casting the result to an int32/64. However, since we actually know the target we're compiling for at compile time, we can use corresponding methods from TargetData to get these values directly. This mostly cleans up code, but may make some of the gather/scatter lowering to loads/stores optimizations work better in the presence of structures.	2011-11-06 19:31:19 -08:00
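For comparison, here is a minimal plain-C++ sketch. The "GEP from a NULL base" trick is the IR analogue of computing `&((S*)0)->field`, which is undefined behavior in C++. The well-defined C++ tools for these questions are `sizeof` and `offsetof`, much as TargetData answers them directly inside LLVM. The struct `Particle` is a hypothetical example, and its exact offsets depend on the target ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct; its layout is decided by the target ABI.
struct Particle {
    int32_t id;
    double position;
    float mass;
};

// Well-defined, compile-time ways to ask for layout facts.
constexpr std::size_t kParticleSize = sizeof(Particle);
constexpr std::size_t kPositionOffset = offsetof(Particle, position);
constexpr std::size_t kMassOffset = offsetof(Particle, mass);
```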
Matt Pharr	cabe358c0a	Work around change to linker behavior in LLVM 3.1. Now, the Linker::LinkModules() call doesn't link in any functions marked as 'internal', which is problematic, since we'd like to have just about all of the builtins marked as internal so that they are eliminated after they've been inlined when they are in fact used. This change removes all of the internal qualifiers in the builtins and adds a lSetInternalFunctions() routine to builtins.cpp that sets this property on the functions that need it after they've been linked in by LinkModules().	2011-11-05 16:57:26 -07:00
Matt Pharr	43a2d510bf	Incorporate per-lane offsets for varying data in the front-end. Previously, it was only in the GatherScatterFlattenOpt optimization pass that we added the per-lane offsets when we were indexing into varying data. (Specifically, the case of float foo[]; int index; foo[index], where foo is an array of varying elements rather than uniform elements.) Now, this is done in the front-end as we're first emitting code. In addition to the basic ugliness of doing this in an optimization pass, it was also error-prone to do it there, since we no longer have access to all of the type information that's around in the front-end. No functionality or performance change.	2011-11-03 13:15:07 -07:00
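Below is a minimal sketch of the per-lane offset arithmetic. It assumes a gang of width 4 and that a varying float occupies 4 consecutive floats in memory, one per lane. For `float foo[]` with varying elements and a varying `index`, lane i reads element `index[i]` within its own slot, so its float offset is `index[i] * 4 + i`. The `+ i` term is the per-lane offset this commit moves into the front-end. The layout assumption and the helper name are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kWidth = 4;  // assumed gang width
using VecI = std::array<int32_t, kWidth>;

// Hypothetical helper: float offsets for foo[index] where foo's elements are
// varying floats. Element e starts at float offset e * kWidth, and lane i
// reads the i-th float within that element.
VecI varyingArrayOffsets(const VecI& index) {
    VecI offs{};
    for (int lane = 0; lane < kWidth; ++lane)
        offs[lane] = index[lane] * kWidth + lane;  // "+ lane" is the per-lane offset
    return offs;
}
```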
Matt Pharr	6084d6aeaf	Added disable-handle-pseudo-memory-ops option.	2011-10-31 08:29:13 -07:00
Matt Pharr	d224252b5d	Fix bug where multiplying varying array offset by zero would cause crash in optimization passes.	2011-10-31 08:28:51 -07:00
Matt Pharr	8b719e4c4e	Fix warnings reported by doxygen	2011-10-20 11:49:54 -07:00
Matt Pharr	074cbc2716	Fix #ifdefs to catch LLVM 3.1svn now as well	2011-10-19 14:01:19 -07:00
Matt Pharr	19087e4761	When casting pointers to ints, choose int32/64 based on target pointer size. Issue #97.	2011-10-17 06:57:04 -04:00
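Here is a minimal C++ analogue that picks a 32- or 64-bit integer type from `sizeof(void*)`. ispc makes the equivalent choice from the *target's* pointer size, whereas this sketch can only see the host's. `PtrInt` and `pointerToInt` are hypothetical names; standard C++ also offers `std::uintptr_t` for this purpose.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical alias: integer type as wide as a pointer on this (host) target.
using PtrInt = std::conditional<sizeof(void*) == 8, int64_t, int32_t>::type;

static_assert(sizeof(PtrInt) == sizeof(void*), "assumes 32- or 64-bit pointers");

PtrInt pointerToInt(const void* p) {
    return static_cast<PtrInt>(reinterpret_cast<std::uintptr_t>(p));
}
```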
Matt Pharr	c21e704a5c	Fix LLVM 2.9 build. Issue #114	2011-10-15 06:48:20 -07:00


179 Commits