aaron/ispc - ispc - git.frat.tech

aaron/ispc

Author	SHA1	Message	Date
Matt Pharr	71317e6aa6	Fix bug in gather/scatter optimization passes. When flattening chains of insertelement instructions, we didn't handle the case where the initial insertelement was to a constant vector (with one value set and the other values undef). Also generalized the "do all of the instances access the same location" check to handle the case where some of them are accessing undef locations; these are ignored in this check, as they should correspond to the mask being off for that lane anyway. Fixes issue #149.	2012-01-06 09:19:18 -08:00
Matt Pharr	8938e14442	Add support for emitting ~generic vectorized C++ code. The compiler now supports an --emit-c++ option, which generates generic vector C++ code. To actually compile this code, the user must provide C++ code that implements a variety of types and operations (e.g. adding two floating-point vector values together, comparing them, etc). There are two examples of this required code in examples/intrinsics: generic-16.h is a "generic" 16-wide implementation that does all required with scalar math; it's useful for demonstrating the requirements of the implementation. Then, sse4.h shows a simple implementation of a SSE4 target that maps the emitted function calls to SSE intrinsics. When using these example implementations with the ispc test suite, all but one or two tests pass with gcc and clang on Linux and OSX. There are currently ~10 failures with icc on Linux, and ~50 failures with MSVC 2010. (To be fixed in coming days.) Performance varies: when running the examples through the sse4.h target, some have the same performance as when compiled with --target=sse4 from ispc directly (options), while noise is 12% slower, rt is 26% slower, and aobench is 2.2x slower. The details of this haven't yet been carefully investigated, but will be in coming days as well. Issue #92.	2012-01-04 12:59:03 -08:00
Matt Pharr	dea13979e0	Fix bug in lIs248Splat() in opt.cpp	2012-01-04 11:55:02 -08:00
Matt Pharr	052d34bf5b	Various cleanups to optimization code. Stop using the PassManagerBuilder but add all of the passes directly in code here. This currently leads to no different behavior, but was useful with experimenting with disabling the SROA pass when compiling to generic targets.	2012-01-04 11:54:44 -08:00
Matt Pharr	d4c5e82896	Add VSelMovMsk optimization pass. Various peephole improvements to vector select instructions.	2012-01-04 11:52:27 -08:00
Matt Pharr	562d61caff	Added masked load optimization pass. This pass handles the "all on" and "all off" mask cases appropriately. Also renamed load_masked stuff in built-ins to masked_load for consistency with masked_store.	2012-01-04 11:51:26 -08:00
Matt Pharr	1d9201fe3d	Add "generic" 4, 8, and 16-wide targets. When used, these targets end up with calls to undefined functions for all of the various special vector stuff ispc needs to compile ispc programs (masked store, gather, min/max, sqrt, etc.). These targets are not yet useful for anything, but are a step toward having an option to C++ code with calls out to intrinsics. Reorganized the directory structure a bit and put the LLVM bitcode used to define target-specific stuff (as well as some generic built-ins stuff) into a builtins/ directory. Note that for building on Windows, it's now necessary to set a LLVM_VERSION environment variable (with values like LLVM_2_9, LLVM_3_0, LLVM_3_1svn, etc.)	2011-12-19 13:46:50 -08:00
Matt Pharr	6dbb15027a	Take advantage of x86's free "scale by 2, 4, or 8" in addressing calculations When loading from an address that's computed by adding two registers together, x86 can scale one of them by 2, 4, or 8, for free as part of the addressing calculation. This change makes the code generated for gather and scatter use this. For the cases where gather/scatter is based on a base pointer and an integer offset vector, the GSImprovementsPass looks to see if the integer offsets are being computed as 2/4/8 times some other value. If so, it extracts the 2x/4x/8x part and leaves the rest there as the the offsets. The {gather,scatter}_base_offsets_* functions take an i32 scale factor, which is passed to them, and then they carefully generate IR so that it hits LLVM's pattern matching for these scales. This is particular win on AVX, since it saves us two 4-wide integer multiplies. Noise runs 14% faster with this. Issue #132.	2011-12-16 15:55:44 -08:00
Matt Pharr	e82a720223	Fix various warnings / build issues on Windows	2011-12-15 12:06:38 -08:00
Matt Pharr	8d1b77b235	Have assertion macro and FATAL() text ask user to file a bug, provide URL to do so. Switch to Assert() from assert() to make it clear it's not the C stdlib one we're using any more.	2011-12-15 11:11:16 -08:00
Matt Pharr	46bfef3fce	Add option to turn off codegen improvements when mask 'all on' is statically known.	2011-12-11 16:16:36 -08:00
Matt Pharr	e2b6ed3db8	Fix built for LLVM2.9 and 3.1svn	2011-12-06 08:08:41 -08:00
Matt Pharr	b48775a549	Handle global arrays better in varying pointer analysis. Specifically, indexing into global arrays sometimes comes in as a big llvm::ConstantVector, so we need to handle traversing those as well when we do the corresponding checks in GatherScatterFlattenOpt so that we still detect cases where we can convert them into the base pointer + offsets form that's used in later analysis.	2011-11-30 12:29:49 -08:00
Matt Pharr	e52104ff55	Pointer fixes/improvements. Allow <, <=, >, >= comparisons of pointers Allow explicit type-casting of pointers to and from integers Fix bug in handling expressions of the form "int + ptr" ("ptr + int" was fine). Fix a bug in TypeCastExpr where varying -> uniform typecasts would be allowed (leading to a crash later)	2011-11-29 13:22:36 -08:00
Matt Pharr	2a6e3e5fea	Fix bug in ptr+offset decomposition in GatherScatterFlattenOpt Given IR that encoded computation like "vec(4) + ptr2int(some pointer)", we'd report that "int2ptr(4)" was the base pointer and the ptr2int value was the offset. This in turn could lead to incorrect code from LLVM, since we'd end up with GEP instructions where the first operand was int2ptr(4) and the offset was the original pointer value. This in turn was sometimes leading to incorrect code and thence a failure on the tests/gs-double-improve-multidim.ispc test since LLVM's memory read/write analysis assumes that nothing after the first operand of a GEP is actually a pointer.	2011-11-28 15:00:41 -08:00
Matt Pharr	975db80ef6	Add support for pointers to the language. Pointers can be either uniform or varying, and behave correspondingly. e.g.: "uniform float * varying" is a varying pointer to uniform float data in memory, and "float * uniform" is a uniform pointer to varying data in memory. Like other types, pointers are varying by default. Pointer-based expressions, & and *, sizeof, ->, pointer arithmetic, and the array/pointer duality all bahave as in C. Array arguments to functions are converted to pointers, also like C. There is a built-in NULL for a null pointer value; conversion from compile-time constant 0 values to NULL still needs to be implemented. Other changes: - Syntax for references has been updated to be C++ style; a useful warning is now issued if the "reference" keyword is used. - It is now illegal to pass a varying lvalue as a reference parameter to a function; references are essentially uniform pointers. This case had previously been handled via special case call by value return code. That path has been removed, now that varying pointers are available to handle this use case (and much more). - Some stdlib routines have been updated to take pointers as arguments where appropriate (e.g. prefetch and the atomics). A number of others still need attention. - All of the examples have been updated - Many new tests TODO: documentation	2011-11-27 13:09:59 -08:00
Matt Pharr	068ea3e4c4	Better SourcePos reporting for gathers/scatters	2011-11-21 10:26:53 -08:00
Matt Pharr	f8eb100c60	Use llvm TargetData to find object sizes, offsets. Previously, to compute the size of objects and the offsets of struct elements within structs, we were using the trick of using getelementpointer with a NULL base pointer and then casting the result to an int32/64. However, since we actually know the target we're compiling for at compile time, we can use corresponding methods from TargetData to get these values directly. This mostly cleans up code, but may make some of the gather/scatter lowering to loads/stores optimizations work better in the presence of structures.	2011-11-06 19:31:19 -08:00
Matt Pharr	cabe358c0a	Workaround change to linker behavior in LLVM 3.1 Now, the Linker::LinkModules() call doesn't link in any functions marked as 'internal', which is problematic, since we'd like to have just about all of the builtins marked as internal so that they are eliminated after they've been inlined when they are in fact used. This change removes all of the internal qualifiers in the builtins and adds a lSetInternalFunctions() routine to builtins.cpp that sets this property on the functions that need it after they've been linked in by LinkModules().	2011-11-05 16:57:26 -07:00
Matt Pharr	43a2d510bf	Incorporate per-lane offsets for varying data in the front-end. Previously, it was only in the GatherScatterFlattenOpt optimization pass that we added the per-lane offsets when we were indexing into varying data. (Specifically, the case of float foo[]; int index; foo[index], where foo is an array of varying elements rather than uniform elements.) Now, this is done in the front-end as we're first emitting code. In addition to the basic ugliness of doing this in an optimization pass, it was also error-prone to do it there, since we no longer have access to all of the type information that's around in the front-end. No functionality or performance change.	2011-11-03 13:15:07 -07:00
Matt Pharr	6084d6aeaf	Added disable-handle-pseudo-memory-ops option.	2011-10-31 08:29:13 -07:00
Matt Pharr	d224252b5d	Fix bug where multiplying varying array offset by zero would cause crash in optimization passes.	2011-10-31 08:28:51 -07:00
Matt Pharr	8b719e4c4e	Fix warnings reported by doxygen	2011-10-20 11:49:54 -07:00
Matt Pharr	074cbc2716	Fix #ifdefs to catch LLVM 3.1svn now as well	2011-10-19 14:01:19 -07:00
Matt Pharr	19087e4761	When casting pointers to ints, choose int32/64 based on target pointer size. Issue #97.	2011-10-17 06:57:04 -04:00
Matt Pharr	c21e704a5c	Fix LLVM 2.9 build. Issue #114	2011-10-15 06:48:20 -07:00
Matt Pharr	9f2aa8d92a	Handle ConstantExpressions when computing address+offset vectors for scatter/gather. In particular, this fixes issue #81, where a global variable access was leading to ConstantExpressions showing up in this code, which it wasn't previously expecting.	2011-10-14 11:20:08 -07:00
Matt Pharr	2460fa5c83	Improve gather/scatter optimization passes to handle loops better. Specifically, now we can work through phi nodes in the IR to detect cases where an index value is actually the same across lanes or is linear across the lanes. For example, this is a loop that used to require gathers but is now turned into vector loads: for (int i = programIndex; i < 16; i += programCount) sum += a[i]; Fixes issue #107.	2011-10-13 17:01:25 -07:00
Matt Pharr	1198520029	Improve gather->vector load optimization to detect <linear sequence>-<uniform> case. Previously, we didn't handle subtraction ops when deciphering offsets in order to try to change gathers t evictor loads.	2011-10-11 13:24:40 -07:00
Matt Pharr	ec5e627e56	Mark internal stdlib functions as "internal" linkage, not "private". This fixes print() statements on OSX. (http://llvm.org/bugs/show_bug.cgi?id=11080)	2011-10-06 13:32:20 -07:00
Matt Pharr	ff2a43ac19	Run the CFG simplification pass even when optimization is disabled. This fixes an issue with undefined SVML symbols with code that called transcendental functions in the stdandard library, even when the SVML math library hadn't been selected.	2011-10-06 09:20:50 -07:00
Matt Pharr	2df9da2524	Be careful to not inadvertently match NULL functions in optimization passes.	2011-10-01 08:34:11 -07:00
Matt Pharr	6d39d5fc3e	Small cleanups. Add __num_cores() to the list of symbols to remove from the module at the end. Fix declarations of mask type for 64-bit atomics to silence warnings.	2011-09-28 16:26:35 -07:00
Matt Pharr	2405dae8e6	Use malloc() to get space for task arguments when compiling to AVX. This is to work around the LLVM bug/limitation discused in LLVM bug 10841 (http://llvm.org/bugs/show_bug.cgi?id=10841).	2011-09-17 13:38:51 -07:00
Matt Pharr	3607f3e045	Remove support for building with LLVM 2.8. Fixes issue #66 . Both 2.9 and top-of-tree generate substantially better code than LLVM 2.8 did, so it's not worth fixing the 2.8 build.	2011-09-17 13:18:59 -07:00
Matt Pharr	e2a88d491f	Mark the internal __fast_masked_vload function as static	2011-09-13 15:43:48 -07:00
Matt Pharr	30f9dcd4f5	Unroll loops by default, add --opt=disable-loop-unroll to disable. Issue #78.	2011-09-13 15:37:18 -07:00
Matt Pharr	dd153d3c5c	Handle more instruction types when flattening offset vectors. Generalize the lScalarizeVector() utility routine (used in determining when we can change gathers/scatters into vector loads/stores, respectively) to handle vector shuffles and vector loads. This fixes issue #79, which provided a case where a gather was being performed even though a vector load was possible.	2011-09-13 09:43:56 -07:00
Matt Pharr	6375ed9224	AVX: Fix bug with misdeclaration of blend intrinsic. This was preventing the "convert an all-on blend to one of the operand values" optimization from kicking on in AVX.	2011-09-12 06:42:38 -07:00
Matt Pharr	785d8a29d3	Run mem2reg pass even when doing -O0 compiles	2011-09-09 09:24:43 -07:00
Matt Pharr	c86128e8ee	AVX: go back to using blend (vs. masked store) when possible. All of the masked store calls were inhibiting putting values into registers, which in turn led to a lot of unnecessary stack traffic. This approach seems to give better code in the end.	2011-09-07 11:26:49 -07:00
Matt Pharr	eb7913f1dd	AVX: fix alignment when changing masked load to regular load. Also added some debugging/tracing stuff (commented out). Commented out iffy assert that was hitting for avx stuff.	2011-09-01 15:45:49 -07:00
Matt Pharr	f65a20c700	AVX bugfix: when replacing 'all on' masked store with a store, the rvalue is operand 2, not operand 1 (which is the mask!)	2011-08-31 18:06:29 -07:00
Matt Pharr	ad9e66650d	AVX bugfix with alignment for store instructions. When replacing 'all on' masked store with regular store, set alignment to be the vector element alignment, not the alignment for a whole vector. (i.e. 4 or 8 byte alignment, not 32 or 64).	2011-08-29 16:58:48 -07:00
Matt Pharr	4ab982bc16	Various AVX fixes (found by inspection). Emit calls to masked_store, not masked_store_blend, when handling masked stores emitted by the frontend. Fix bug in binary8to16 macro in builtins.m4 Fix bug in 16-wide version of __reduce_add_float Remove blend function implementations for masked_store_blend for AVX; just forward those on to the corresponding real masked store functions.	2011-08-26 12:58:02 -07:00
Matt Pharr	fe54f1ad8e	Fixes to build with latest LLVM ToT	2011-08-18 08:34:49 +01:00
Matt Pharr	922dbdec06	Fixes to build with LLVM top-of-tree	2011-07-26 10:57:49 +01:00
Matt Pharr	16be1d313e	AVX updates / improvements. Add optimization patterns to detect and simplify masked loads and stores with the mask all on / all off. Enable AVX for LLVM 3.0 builds (still generally hits bugs / unimplemented stuff on the LLVM side, but it's getting there).	2011-07-25 07:41:37 +01:00
Matt Pharr	96d40327d0	Fix issue #72 : 64 gathers/scatters led to undefined symbols	2011-07-21 14:44:55 +01:00
Matt Pharr	bba7211654	Add support for int8/int16 types. Addresses issues #9 and #42 .	2011-07-21 06:57:40 +01:00

1 2

55 Commits