Commit Graph

277 Commits

Author SHA1 Message Date
Matt Pharr
9921b8e530 Predicated 'if' statement performance improvements.
Go back to running both sides of 'if' statements with masking and without
branching if we can determine that the code is relatively simple (as per
the simple cost model), and is safe to run even if the mask is 'all off'.
This gives a bit of a performance improvement for some of the examples
(most notably, the ray tracer), and is the code that one wants generated
in this case anyhow.
2011-09-19 09:54:09 -07:00
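A minimal sketch of the predication pattern described above, in C with 4-wide SSE intrinsics (names and shapes illustrative, not ispc's actual codegen): both sides of the 'if' execute unconditionally, and a per-lane select under the condition mask picks each lane's result.

    #include <xmmintrin.h>

    /* result = (x > 0) ? x * 2 : x - 1, with no branch: evaluate both
       sides, then select per lane under the comparison mask. */
    static __m128 predicated_if(__m128 x) {
        __m128 mask     = _mm_cmpgt_ps(x, _mm_setzero_ps());
        __m128 then_val = _mm_mul_ps(x, _mm_set1_ps(2.0f));  /* 'if' side   */
        __m128 else_val = _mm_sub_ps(x, _mm_set1_ps(1.0f));  /* 'else' side */
        /* (mask & then_val) | (~mask & else_val) */
        return _mm_or_ps(_mm_and_ps(mask, then_val),
                         _mm_andnot_ps(mask, else_val));
    }

This is only safe when, as the commit notes, the statements cannot fault for lanes whose mask is off.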
Matt Pharr
9052d4b10b Linux build fixes 2011-09-17 13:42:46 -07:00
Matt Pharr
2405dae8e6 Use malloc() to get space for task arguments when compiling to AVX.
This works around the LLVM bug/limitation discussed in LLVM bug
10841 (http://llvm.org/bugs/show_bug.cgi?id=10841).
2011-09-17 13:38:51 -07:00
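In rough terms, the workaround above swaps a stack allocation of the task-argument block for a heap allocation, so that 32-byte-aligned AVX values stored into it don't depend on LLVM's dynamic stack realignment (a hedged sketch; TaskArgs and the launch path are hypothetical):

    #include <stdlib.h>

    typedef struct { float *output; int count; } TaskArgs;  /* hypothetical */

    static void launch_task(float *output, int count) {
        /* malloc() instead of an alloca for the argument block. */
        TaskArgs *args = (TaskArgs *)malloc(sizeof(*args));
        args->output = output;
        args->count  = count;
        /* ... enqueue the task with 'args'; freed when the task completes ... */
    }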
Matt Pharr
3607f3e045 Remove support for building with LLVM 2.8. Fixes issue #66.
Both 2.9 and top-of-tree generate substantially better code than
LLVM 2.8 did, so it's not worth fixing the 2.8 build.
2011-09-17 13:18:59 -07:00
Matt Pharr
de84acfa5d On OSX with LLVM 2.9, always generate position-independent code.
Fixes Issue #99.
2011-09-17 13:03:51 -07:00
Matt Pharr
a501ab1aa6 Fix parenthesization bugs in cost estimates.
Also added the debugging print that helped find these issues.
Revert inlining some functions in examples
2011-09-16 19:07:07 -07:00
Matt Pharr
cdc850f98c Inline some functions in examples 2011-09-16 17:02:21 -07:00
Matt Pharr
ca87579f23 Add a very simple cost model to estimate runtime cost of running code.
This is currently only used to decide whether it's worth doing an
"are all lanes running" check at the start of functions--for small
functions, it's not worth the overhead.

The cost is estimated relatively early in compilation (e.g. before
we know if an array access is a scatter/gather or not, before
constant folding, etc.), so there are many known shortcomings.
2011-09-16 15:09:17 -07:00
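A sketch of what such an early estimate might look like (node kinds, weights, and the threshold are all invented for illustration): sum rough per-statement costs and emit the entry check only when the body is expensive enough to amortize it.

    /* Hypothetical AST statement with a rough per-construct weight. */
    enum NodeKind { NODE_SIMPLE_OP, NODE_MEMORY_ACCESS, NODE_CALL };

    struct Stmt {
        enum NodeKind kind;
        struct Stmt *next;        /* next statement in the function body */
    };

    static int node_cost(enum NodeKind k) {
        switch (k) {
        case NODE_SIMPLE_OP:     return 1;
        case NODE_MEMORY_ACCESS: return 4;   /* might be a gather/scatter */
        default:                 return 16;  /* calls are expensive */
        }
    }

    /* Emit the "are all lanes running" entry check only when the body
       is costly enough that skipping it can pay for the check. */
    static int worth_all_on_check(const struct Stmt *body) {
        int cost = 0;
        for (; body; body = body->next)
            cost += node_cost(body->kind);
        return cost > 32;                    /* illustrative threshold */
    }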
Matt Pharr
38fc13d1ab Remove now unused function. 2011-09-16 14:21:13 -07:00
Matt Pharr
cf9d9f717e Logic simplification to 'mixed true/false' case for coherent ifs.
Use the approach from 173632f446 here as
well.
2011-09-16 14:10:55 -07:00
Matt Pharr
173632f446 Generate more efficient code for regular varying 'if' statements.
For the case where we have a regular (i.e. non-'cif') 'if' statement,
the generated code simply checks whether any program instance
is running before executing the corresponding statements.  This is a
lighter-weight check than IfStmt::emitMaskMixed() was performing.
2011-09-16 12:03:42 -07:00
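The lighter-weight check described above boils down to one movmsk plus a scalar test; in 4-wide SSE terms (illustrative C, not the compiler's emitted IR):

    #include <xmmintrin.h>

    static void varying_if(__m128 mask /* lane mask from the condition */) {
        /* Skip the guarded statements entirely when no lane is on. */
        if (_mm_movemask_ps(mask) != 0) {
            /* ... run the 'true' statements under 'mask' ... */
        }
    }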
Matt Pharr
1dedd88132 Improve implementation of 'are both masks equal' check for AVX.
Previously, we did a vector equal compare and then a movmsk, the
result of which we checked to see if it was on for all lanes.
Because masks are vectors of i32s, under AVX, the vector equal
compare required two 4-wide SSE compares and some shuffling.
Now, we do a movmsk of both masks first and then a scalar
equality comparison of those two values, which seems to generate
overall better code.
2011-09-15 06:25:02 -07:00
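Sketch of the before/after in AVX intrinsics (illustrative; the i32 mask vectors are shown reinterpreted as __m256 for movmsk): the old path compared the mask vectors element-wise, which AVX1 must split into two 4-wide integer compares, while the new path collapses each mask to a scalar bitmask first.

    #include <immintrin.h>

    /* Old: vector equal-compare of the masks, movmsk, test all-on.  */
    /* New: movmsk each mask, then a single scalar equality compare. */
    static int masks_equal(__m256 mask1, __m256 mask2) {
        return _mm256_movemask_ps(mask1) == _mm256_movemask_ps(mask2);
    }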
Matt Pharr
0848c2cc19 Actually make all 'if' statements check for 'all off' mask.
Contrary to claims in 0c2048385, that checkin didn't include the changes
to not run if/else blocks if none of the program instances wanted to be
running them.  This checkin fixes that and thus actually fixes issue #74.
2011-09-13 19:48:04 -07:00
Matt Pharr
e2a88d491f Mark the internal __fast_masked_vload function as static 2011-09-13 15:43:48 -07:00
Matt Pharr
30f9dcd4f5 Unroll loops by default, add --opt=disable-loop-unroll to disable.
Issue #78.
2011-09-13 15:37:18 -07:00
Matt Pharr
0c344b6755 Fix Linux build of mandelbrot_tasks example 2011-09-13 15:17:30 -07:00
Matt Pharr
6734021520 Issue warning when compile-time constant out-of-bounds array index is used.
Issue #98.
Also fixes two examples that had bugs of this type that this warning
  uncovered!
2011-09-13 14:42:20 -07:00
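The warning fires for code like the following, where the index is a compile-time constant that is provably past the end of the array (C-style illustration):

    static void f(void) {
        float a[8];
        /* Index 8 is one past the end of an 8-element array; because the
           index is a compile-time constant, this is diagnosable statically. */
        a[8] = 0.0f;   /* warning: out-of-bounds array index */
    }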
Matt Pharr
dd153d3c5c Handle more instruction types when flattening offset vectors.
Generalize the lScalarizeVector() utility routine (used in determining
when we can change gathers/scatters into vector loads/stores, respectively)
to handle vector shuffles and vector loads.  This fixes issue #79, which
provided a case where a gather was being performed even though a vector
load was possible.
2011-09-13 09:43:56 -07:00
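Conceptually (illustrative C), the payoff is that a gather whose flattened offsets turn out to be base+0..base+3 is just a contiguous vector load:

    #include <emmintrin.h>

    /* Gather of a[base+0], a[base+1], a[base+2], a[base+3] collapses
       into a single unaligned vector load. */
    static __m128i gather_consecutive(const int *a, int base) {
        return _mm_loadu_si128((const __m128i *)(a + base));
    }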
Matt Pharr
9ca7541d52 Remove check for any program instances running before function calls.
Given the change in 0c20483853, this is no longer necessary, since
we know that one instance will always be running if we're executing a
given block of code.
2011-09-13 06:26:16 -07:00
Matt Pharr
0c20483853 Make all "if" statements "coherent" ifs. Workaround for issue #74.
Using blend to do masked stores is unsafe if all of the lanes are off:
it may read from or write to invalid memory.  For now, this workaround
transforms all 'if' statements into coherent 'if's, ensuring that an
instruction only runs if at least one program instance wants to be running
it.

One nice thing about this change is that a number of implementations of
various builtins can be simplified, since they no longer need to confirm
that at least one program instance is running.

It might be nice to re-enable regular if statements in a future checkin,
but we'd want to make sure they don't have any masked loads or blended
masked stores in their statement lists.  There isn't a performance
impact for any of the examples with this change, so it's unclear if
this is important.

Note that this only impacts 'if' statements with a varying condition.
2011-09-12 16:25:08 -07:00
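The hazard motivating the workaround above: a blend-based masked store does a full-width load and store, so it touches memory even when every lane is off (illustrative SSE4.1 C):

    #include <smmintrin.h>

    static void masked_store_blend(float *ptr, __m128 val, __m128 mask) {
        __m128 old    = _mm_loadu_ps(ptr);              /* reads memory ...  */
        __m128 merged = _mm_blendv_ps(old, val, mask);  /* per-lane select   */
        _mm_storeu_ps(ptr, merged);                     /* ... and writes it */
        /* With an all-off mask, 'ptr' may be a pointer no lane ever meant
           to touch; the load/store above can still fault on it. */
    }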
Matt Pharr
9d4ff1bc06 Fix alignment in usage message 2011-09-12 15:06:41 -07:00
Matt Pharr
83f22f1939 Add experimental --fast-masked-vload flag for SSE. 2011-09-12 12:29:33 -07:00
Matt Pharr
6375ed9224 AVX: Fix bug with misdeclaration of blend intrinsic.
This was preventing the "convert an all-on blend to one of the
  operand values" optimization from kicking in for AVX.
2011-09-12 06:42:38 -07:00
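The optimization in question is the simple identity that a blend whose mask is known all-on just yields its 'new' operand (illustrative):

    #include <xmmintrin.h>

    /* blendv(old, new, mask) with every mask lane on is simply 'new',
       so the blend instruction can be dropped outright. */
    static __m128 all_on_blend(__m128 old_val, __m128 new_val) {
        (void)old_val;   /* an all-on blend keeps no old lanes */
        return new_val;
    }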
Matt Pharr
cf23cf9ef4 Fix typo in user guide. Issue #96 2011-09-12 05:24:32 -07:00
Matt Pharr
1147b53dcd Add #define with target vector width in emitted headers 2011-09-09 09:33:56 -07:00
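Hypothetically, the emitted header gains a line along these lines (the actual macro name isn't given in the commit message):

    /* Emitted into the generated header; macro name illustrative. */
    #define PROGRAM_COUNT 8   /* e.g. 8 for an 8-wide AVX target */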
Matt Pharr
4cf831a651 When --fast-math is enabled, tell LLVM about it, too. 2011-09-09 09:32:59 -07:00
Matt Pharr
785d8a29d3 Run mem2reg pass even when doing -O0 compiles 2011-09-09 09:24:43 -07:00
Matt Pharr
46d2bad231 Fix malformed program crash 2011-09-09 09:24:43 -07:00
Matt Pharr
32da8e11b4 Fix crash with varying global vector types when emitting header file. 2011-09-09 09:16:59 -07:00
Matt Pharr
5dedb6f836 Add --scale command line argument to mandelbrot and rt examples.
This applies a floating-point scale factor to the image resolution;
it's useful for experiments with many-core systems where the 
base image resolution may not give enough work for good load-balancing
with tasks.
2011-09-07 20:07:51 -07:00
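The scale factor just multiplies the base resolution (a hedged sketch; names illustrative):

    /* Apply --scale so many-core runs have enough tasks to load-balance. */
    static unsigned int scaled(unsigned int base, float scale) {
        return (unsigned int)(base * scale);
    }
    /* width = scaled(base_width, scale); height = scaled(base_height, scale); */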
Matt Pharr
2ea6d249d5 Fix mapping to 8 and 16 program instances in the AO bench example.
With this, we now compute a correct image with AVX.
2011-09-07 11:34:24 -07:00
Matt Pharr
c86128e8ee AVX: go back to using blend (vs. masked store) when possible.
All of the masked store calls were inhibiting putting values into
registers, which in turn led to a lot of unnecessary stack traffic.
This approach seems to give better code in the end.
2011-09-07 11:26:49 -07:00
Matt Pharr
375f1cb8e8 Make octaves and octaves loop uniform in noise example 2011-09-07 10:34:23 -07:00
Matt Pharr
3ca7b6b078 Remove MCJIT stuff from ispc_test (fix Linux build) 2011-09-07 09:44:27 -07:00
Matt Pharr
effe901890 Add task-parallel version of aobench 2011-09-07 05:43:21 -07:00
Matt Pharr
4f451bd041 More AVX fixes
Fix RNG state initialization for 16-wide targets
Fix a number of bugs in reduce_add builtin implementations for AVX.
Fix some tests that had incorrect expected results for the 16-wide
  case.
2011-09-06 15:53:11 -07:00
Matt Pharr
c76ef7b174 Add command-line option to specify position-independent codegen 2011-09-06 11:12:43 -07:00
Matt Pharr
743d82e935 Various documentation updates. 2011-09-06 09:51:02 -07:00
Matt Pharr
18546e9c6d Add option to disable optimizations to test running script 2011-09-04 18:09:00 -07:00
Matt Pharr
f24ab16b91 Release notes, doxygen update for 1.0.7 release. v1.0.7 2011-09-03 07:33:39 -07:00
Matt Pharr
766b34683c Fix Windows build 2011-09-03 07:23:16 -07:00
Matt Pharr
b5bfa43e92 Fix error with float suffixes 2011-09-02 13:09:25 -07:00
Matt Pharr
99221f7d17 Fix a few places in examples where the C reference implementation had a double-precision
fp constant undesirably causing computation to be done in double precision.

Makes C scalar versions of the options pricing models, rt, and aobench 3-5% faster.
Makes scalar version of noise about 15% faster.
Others are unchanged.
2011-09-01 16:31:22 -07:00
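The pattern being fixed: an unsuffixed floating-point literal is a double in C, so mixing it with float operands silently promotes the whole expression (illustrative):

    /* 0.5 is a double constant: x is promoted, the multiply runs in
       double precision, and the result converts back to float. */
    static float half_slow(float x) { return x * 0.5; }

    /* 0.5f keeps the whole computation in single precision. */
    static float half_fast(float x) { return x * 0.5f; }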
Matt Pharr
eb7913f1dd AVX: fix alignment when changing masked load to regular load.
Also added some debugging/tracing stuff (commented out).
Commented out an iffy assert that was being hit for AVX code.
2011-09-01 15:45:49 -07:00
Matt Pharr
08cad7a665 AVX bugfixes 2011-09-01 14:23:10 -07:00
Matt Pharr
9cd92facbd Fix test: was incorrectly failing for 8-wide targets 2011-09-01 05:03:49 -07:00
Matt Pharr
85063f493c Revert attempt to be clever about which LLVM libraries to link in--just
link all of them.  (This was causing build problems for some folks.)
2011-09-01 05:02:44 -07:00
Matt Pharr
f65a20c700 AVX bugfix: when replacing 'all on' masked store with a store,
the rvalue is operand 2, not operand 1 (which is the mask!)
2011-08-31 18:06:29 -07:00
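For orientation, the operand order matches the Intel maskstore intrinsic, where the value being stored comes last; the bug above was grabbing the mask instead of the value (illustrative C):

    #include <immintrin.h>

    static void store_all_on(float *ptr, __m256i mask, __m256 val) {
        /* _mm256_maskstore_ps(addr, mask, value): operand 0 is the
           pointer, operand 1 the mask, operand 2 the rvalue stored. */
        _mm256_maskstore_ps(ptr, mask, val);
        /* With an all-on mask this is equivalent to a plain store of
           'val' (operand 2), never of the mask (operand 1). */
    }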
Matt Pharr
e144724979 Improve performance of global atomics, taking advantage of associativity.
For associative atomic ops (add, and, or, xor), we can take advantage of
their associativity to do just a single hardware atomic instruction, 
rather than one for each of the running program instances (as the previous
implementation did).

The basic approach is to locally compute a reduction across the active
program instances with the given op and to then issue a single HW atomic
with that reduced value as the operand.  We then take the old value that
was stored in the location that is returned from the HW atomic op and
use that to compute the values to return to each of the program instances
(conceptually representing the cumulative effect of each of the preceding
program instances having performed their atomic operation).

Issue #56.
2011-08-31 05:35:01 -07:00
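A scalar sketch of the scheme for atomic add (4-wide, C11 atomics; names illustrative): reduce across the active lanes, issue one hardware atomic, then reconstruct each lane's return value as if the lanes had executed one at a time.

    #include <stdatomic.h>

    static void atomic_add_uniformized(atomic_int *target, const int value[4],
                                       const int active[4], int old_out[4]) {
        /* 1. Local reduction across the active program instances. */
        int sum = 0;
        for (int i = 0; i < 4; ++i)
            if (active[i]) sum += value[i];

        /* 2. A single hardware atomic instead of one per active lane. */
        int old = atomic_fetch_add(target, sum);

        /* 3. Each lane's "old value" is the fetched value plus the
           contributions of the preceding active lanes (an exclusive
           prefix sum), as if the atomics had run sequentially. */
        int running = old;
        for (int i = 0; i < 4; ++i)
            if (active[i]) { old_out[i] = running; running += value[i]; }
    }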
Matt Pharr
96a297c747 Small improvements to help output 2011-08-30 14:48:22 -07:00