Now, the Linker::LinkModules() call doesn't link in any functions
marked as 'internal', which is problematic, since we'd like just about
all of the builtins to be marked internal so that they can be eliminated
once they've been inlined at the places where they're actually used.
This change removes all of the internal qualifiers in the builtins
and adds a lSetInternalFunctions() routine to builtins.cpp that
sets this property on the functions that need it after they've
been linked in by LinkModules().
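For illustration, a minimal sketch of that routine (assuming current LLVM
headers; the names listed here are placeholders rather than the real set of
builtins):

    // Sketch only: after LinkModules() has pulled the builtins into the user
    // module, mark each builtin as internal so unused ones get eliminated.
    #include <llvm/IR/Function.h>
    #include <llvm/IR/Module.h>

    static void lSetInternalFunctions(llvm::Module *module) {
        // Placeholder subset of the builtin names; the real list is much longer.
        static const char *names[] = { "__masked_store_32", "__reduce_add_float" };
        for (const char *name : names) {
            if (llvm::Function *f = module->getFunction(name))
                f->setLinkage(llvm::GlobalValue::InternalLinkage);
        }
    }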
Both uniform and varying function pointers are supported; when a function
is called through a varying function pointer, each unique function pointer
value across the running program instances is called once for the set of
active program instances that want to call it.
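To make that calling convention concrete, here is a scalar C++ model of the
behavior (an illustration of the semantics, not the code ispc actually emits;
the names are hypothetical):

    #include <cstdint>

    typedef void (*FnPtr)(uint64_t activeMask);

    // Each program instance (lane) carries its own function pointer; the call
    // executes once per unique pointer value, passing the mask of the lanes
    // that hold that value.
    void callThroughVaryingPointer(FnPtr fnPtrs[], int width, uint64_t execMask) {
        uint64_t remaining = execMask;
        while (remaining != 0) {
            // Find the first not-yet-handled active lane...
            int lead = 0;
            while ((remaining & (1ull << lead)) == 0)
                ++lead;
            // ...and gather all active lanes holding the same pointer value.
            uint64_t sameMask = 0;
            for (int i = 0; i < width; ++i)
                if ((remaining & (1ull << i)) && fnPtrs[i] == fnPtrs[lead])
                    sameMask |= (1ull << i);
            fnPtrs[lead](sameMask);   // one call covers all of those lanes
            remaining &= ~sameMask;
        }
    }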
Within each function that launches tasks, we can now easily track which
tasks that function launched, so that the sync at the end of the function
only syncs on the tasks launched by that function (not on all tasks
launched by all functions).
Implementing this led to a rework of the task system API that ispc generates
code to call; the example task systems in examples/tasksys.cpp have been
updated to conform to this API. (The updated API is also documented in
the ispc user's guide.)
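As a rough illustration of the shape of such an interface, here is a
hypothetical, serial C++ sketch of a handle-per-function task system (names
and signatures are illustrative only; see the user's guide for the actual API):

    #include <utility>
    #include <vector>

    // Hypothetical handle type: one group per launching function, so a sync
    // only covers the tasks launched through this particular handle.
    struct TaskGroup {
        std::vector<std::pair<void (*)(void *), void *>> tasks;
    };

    void *taskGroupCreate() { return new TaskGroup; }

    void taskLaunch(void *handle, void (*func)(void *), void *data) {
        static_cast<TaskGroup *>(handle)->tasks.push_back(std::make_pair(func, data));
    }

    // Runs (in a real system: waits on) only the tasks launched via this
    // handle, not every task launched anywhere in the program.
    void taskSync(void *handle) {
        TaskGroup *tg = static_cast<TaskGroup *>(handle);
        for (size_t i = 0; i < tg->tasks.size(); ++i)
            tg->tasks[i].first(tg->tasks[i].second);
        delete tg;
    }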
As part of this, "launch[n]" syntax was added to launch a number of tasks
in a single launch statement, rather than requiring a loop over 'n' to
launch n tasks.
This commit thus fixes issue #84 (enhancement to launch multiple tasks from
a single launch statement) as well as issue #105 (recursive task launches
were broken).
The intent is that the code in stdlib.ispc that calls out to the built-ins
should match argument types exactly (using explicit casts as needed), for
maximal clarity and safety.
Using blend to do masked stores is unsafe if all of the lanes are off:
it may read from or write to invalid memory. For now, this workaround
transforms all 'if' statements into coherent 'if's, ensuring that an
instruction only runs if at least one program instance wants to be
running it.
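A scalar C++ model of the difference (illustrative only, not the actual
builtins):

    #include <cstdint>

    // Blend-style masked store: a full read-modify-write of the destination,
    // so it dereferences ptr even when mask == 0. If ptr is invalid (or points
    // just past the end of an allocation) and no lanes are active, this can fault.
    void masked_store_blend(int32_t *ptr, const int32_t *val, uint32_t mask, int width) {
        for (int i = 0; i < width; ++i) {
            int32_t old = ptr[i];                        // read regardless of the mask
            ptr[i] = (mask & (1u << i)) ? val[i] : old;  // write regardless of the mask
        }
    }

    // Safe masked store: only the addresses of the active lanes are touched.
    void masked_store(int32_t *ptr, const int32_t *val, uint32_t mask, int width) {
        for (int i = 0; i < width; ++i)
            if (mask & (1u << i))
                ptr[i] = val[i];
    }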
One nice thing about this change is that a number of implementations of
various builtins can be simplified, since they no longer need to confirm
that at least one program instance is running.
It might be nice to re-enable regular if statements in a future checkin,
but we'd want to make sure they don't have any masked loads or blended
masked stores in their statement lists. There isn't a performance
impact for any of the examples with this change, so it's unclear if
this is important.
Note that this only impacts 'if' statements with a varying condition.
All of the masked store calls were preventing values from being kept in
registers, which in turn led to a lot of unnecessary stack traffic. This
approach seems to give better code in the end.
For associative atomic ops (add, and, or, xor), we can take advantage of
their associativity to do just a single hardware atomic instruction,
rather than one for each of the running program instances (as the previous
implementation did.)
The basic approach is to locally compute a reduction across the active
program instances with the given op and then issue a single HW atomic
with that reduced value as the operand. We then take the old value
returned by the HW atomic op (the value previously stored at that
location) and use it to compute the value to return to each program
instance, conceptually representing the cumulative effect of the
preceding program instances having performed their atomic operations
one at a time.
Issue #56.
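A scalar C++ model of the approach for atomic add (illustrative; the
generated code operates on vectors and masks):

    #include <atomic>
    #include <cstdint>

    // One hardware atomic for all active lanes instead of one per lane.
    void atomicAddPerLane(std::atomic<int32_t> &loc, const int32_t operand[],
                          int32_t result[], uint32_t mask, int width) {
        // Reduce the active lanes' operands to a single value.
        int32_t sum = 0;
        for (int i = 0; i < width; ++i)
            if (mask & (1u << i))
                sum += operand[i];

        // Single HW atomic; 'old' is the value stored at the location beforehand.
        int32_t old = loc.fetch_add(sum);

        // Reconstruct each lane's return value as if the lanes had performed
        // their atomics one after another, in lane order.
        int32_t prefix = 0;
        for (int i = 0; i < width; ++i) {
            if (mask & (1u << i)) {
                result[i] = old + prefix;
                prefix += operand[i];
            }
        }
    }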
- Emit calls to masked_store, not masked_store_blend, when handling
masked stores emitted by the frontend.
- Fix bug in binary8to16 macro in builtins.m4.
- Fix bug in 16-wide version of __reduce_add_float.
- Remove blend function implementations for masked_store_blend for
AVX; just forward those on to the corresponding real masked store
functions.
Compute a "local" min/max across the active program instances and
then do a single atomic memory op.
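The same idea for min, in a scalar C++ model (max is symmetric;
illustrative only):

    #include <algorithm>
    #include <atomic>
    #include <cstdint>

    void atomicMinAllLanes(std::atomic<int32_t> &loc, const int32_t operand[],
                           uint32_t mask, int width) {
        // Local reduction across the active program instances.
        int32_t localMin = INT32_MAX;
        for (int i = 0; i < width; ++i)
            if (mask & (1u << i))
                localMin = std::min(localMin, operand[i]);

        // Single atomic memory op, expressed here as a compare-exchange loop;
        // compare_exchange_weak refreshes 'old' on failure.
        int32_t old = loc.load();
        while (old > localMin && !loc.compare_exchange_weak(old, localMin))
            ;
    }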
Added a few tests to exercise global min/max atomics (which were
previously untested!)
Just pulling out the elements and doing a set of scalar equality tests
is the best approach for those (nearly 2x better than the rotate and
vector equality check that we use for 32-bit stuff).
- Renamed stdlib-sse.ll to builtins-sse.ll (etc.) to better indicate that
the code in those files has a role beyond just implementing the standard
library.
- Moved declarations of the various __pseudo_* functions from being done with
LLVM API calls in builtins.cpp to straight declarations in LLVM assembly
language in builtins.m4. (Much less code to do it this way, and it's clearer
what's going on.)