aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
Matt Pharr	eb7913f1dd	AVX: fix alignment when changing masked load to regular load. Also added some debugging/tracing stuff (commented out). Commented out iffy assert that was hitting for avx stuff.	2011-09-01 15:45:49 -07:00
Matt Pharr	08cad7a665	AVX bugfixes	2011-09-01 14:23:10 -07:00
Matt Pharr	9cd92facbd	Fix test: was incorrectly failing for 8-wide targets	2011-09-01 05:03:49 -07:00
Matt Pharr	85063f493c	Revert attempt to be clever about which LLVM libraries to link in--just link all of them. (This was causing build problems for some folks.)	2011-09-01 05:02:44 -07:00
Matt Pharr	f65a20c700	AVX bugfix: when replacing 'all on' masked store with a store, the rvalue is operand 2, not operand 1 (which is the mask!)	2011-08-31 18:06:29 -07:00
Matt Pharr	e144724979	Improve performance of global atomics, taking advantage of associativity. For associative atomic ops (add, and, or, xor), we can take advantage of their associativity to do just a single hardware atomic instruction, rather than one for each of the running program instances (as the previous implementation did.) The basic approach is to locally compute a reduction across the active program instances with the given op and to then issue a single HW atomic with that reduced value as the operand. We then take the old value that was stored in the location that is returned from the HW atomic op and use that to compute the values to return to each of the program instances (conceptually representing the cumulative effect of each of the preceding program instances having performed their atomic operation.) Issue #56.	2011-08-31 05:35:01 -07:00
Matt Pharr	96a297c747	Small improvements to help output	2011-08-30 14:48:22 -07:00
Matt Pharr	67e00b97c6	Fix incorrect assertions in ConstExpr constructors	2011-08-30 11:08:53 -07:00
Matt Pharr	a94cabc692	Modify stencil example to do separate runs with and without task parallelism.	2011-08-30 05:08:21 -07:00
Matt Pharr	ad9e66650d	AVX bugfix with alignment for store instructions. When replacing 'all on' masked store with regular store, set alignment to be the vector element alignment, not the alignment for a whole vector. (i.e. 4 or 8 byte alignment, not 32 or 64).	2011-08-29 16:58:48 -07:00
Matt Pharr	6de494cfdb	Fix AVX bug introduced in `4ab982bc16`	2011-08-29 16:50:59 -07:00
Matt Pharr	58e34ba4ae	Add new test-driver script, run_tests.py. Old run_tests.sh still lives (for now). Changes include: - Tests are run in parallel across all of the available CPU cores - Option to create a statically-linked executable for each test (rather than using the LLVM JIT). This is in particular useful for AVX, which doesn't have good JIT support yet. - Static executables also make it possible to test x86, not just x86-64, codegen. - Fixed a number of tests in failing_tests, which were actually failing due to the fact that the expected function signature of tests had changed.	2011-08-29 14:15:09 -07:00
Matt Pharr	33feeffe5d	Update timing header so it works with C code	2011-08-29 11:23:43 -07:00
Matt Pharr	d0db46aac5	Use logical shift right op for shifts of unsigned ints. Fixes issue #88.	2011-08-29 10:32:26 -07:00
Matt Pharr	da76396c75	Fix typo in SSE2 attributes string.	2011-08-27 08:59:25 -07:00
Matt Pharr	bbf3fb6307	Disable popcnt on SSE4 targets--should only enable if system CPU supports it	2011-08-27 04:09:55 -07:00
Matt Pharr	4ab982bc16	Various AVX fixes (found by inspection). Emit calls to masked_store, not masked_store_blend, when handling masked stores emitted by the frontend. Fix bug in binary8to16 macro in builtins.m4. Fix bug in 16-wide version of __reduce_add_float. Remove blend function implementations for masked_store_blend for AVX; just forward those on to the corresponding real masked store functions.	2011-08-26 12:58:02 -07:00
Matt Pharr	34301e09f5	Fix incorrect comment in builtins definitions files. (And all of the places it was cut and pasted to. :-( ).	2011-08-26 10:44:46 -07:00
Matt Pharr	84e586e767	Commit correct atomics tests	2011-08-26 10:43:30 -07:00
Matt Pharr	72a2f5d2f4	Make SSE2 __popcnt_int64 return i64 to be consistent with other targets	2011-08-26 10:42:12 -07:00
Matt Pharr	606cbab0d4	Performance improvements for global min/max atomics. Issue #57. Compute a "local" min/max across the active program instances and then do a single atomic memory op. Added a few tests to exercise global min/max atomics (which were previously untested!)	2011-08-26 10:35:24 -07:00
Matt Pharr	54ec56c81d	Clean up and centralize LLVM target initialization	2011-08-26 10:15:33 -07:00
Matt Pharr	a322398c62	When emitting header files, put 'extern' declarations of globals used in ispc code outside of the ispc namespace. Fixes issue #64.	2011-08-26 10:03:06 -07:00
Matt Pharr	f22b3a25bd	Update command-line processing and usage string now that we have a preprocessor on Windows. We had been prohibiting Windows users from providing #definitions on the command line, which is the wrong thing to do ever since we switched to using the clang preprocessor.	2011-08-26 09:58:08 -07:00
Matt Pharr	b67498766e	Big rewrite / improvement of target handling. If no CPU is specified, use the host CPU type, not just a default of "nehalem". Provide better features strings to the LLVM target machinery. -> Thus ensuring that LLVM doesn't generate SSE>2 instructions for the SSE2 target (Fixes issue #82). -> Slight code improvements from using cmovs in generated code now Use the llvm popcnt intrinsic for the SSE2 target now (it now generates code that doesn't call the popcnt instruction now that we properly tell LLVM which instructions are and aren't available for SSE2.)	2011-08-26 09:54:45 -07:00
Matt Pharr	c340ff3893	Fixes to build with LLVM ToT	2011-08-25 08:53:56 +01:00
Matt Pharr	b0f59777d4	Silly bug: don't pass NULL to the print() stmt when we want a llvm::Value * that has the value NULL. (This was causing crashes with print() statements with no additional values to be printed.)	2011-08-25 07:48:13 +01:00
Matt Pharr	e14208f489	Update to call DIBuilder::finalize() with LLVM 3.0	2011-08-24 22:28:20 +01:00
Matt Pharr	7756265503	Add double-pumped AVX target (i.e., run 16-wide). Not yet tested.	2011-08-20 11:28:22 +01:00
Matt Pharr	f841b775c3	Small bugfixes in AVX builtins	2011-08-20 09:09:55 +01:00
Matt Pharr	8c921544a0	fix broken test	2011-08-18 20:40:50 +01:00
Matt Pharr	fe54f1ad8e	Fixes to build with latest LLVM ToT	2011-08-18 08:34:49 +01:00
Matt Pharr	74c2c8ae07	Linux build fixes	2011-08-17 07:08:44 -07:00
Matt Pharr	87ec7aa10d	release notes, housekeeping for 1.0.6 release (tag: v1.0.6)	2011-08-17 14:55:21 +01:00
Matt Pharr	206c851146	Various improvements to example task systems in examples/. - Only have a single copy of all of the tasks_*.cpp sample implementations, stored in examples/. - Reduce dynamic storage allocation and locking in task launch code paths. - Don't have a hard limit of the number of tasks that can be launched on Windows (fix issue #85).	2011-08-17 14:31:45 +01:00
Matt Pharr	60bdf1ef8a	Modify rt example to also do a set of runs with tasks + SPMD together.	2011-08-17 13:14:32 +01:00
Matt Pharr	d7662b3eb9	Use reduce_equal() in volume rendering example to avoid some gathers. Modified this example to use reduce_equal() to see if all of the program instances want to load the 8 sample values around the same voxel. When this is the case, we can just do 8 scalar loads, rather than needing to do a fully general gather. Once this check fails, it isn't done again, since it's not likely to start succeeding in the future. This gives a ~10% speedup with the low-res data set, and basically no performance difference with the high-res one. (It makes sense that the lower-resolution the voxel sampling, the longer all of the rays will stay in the same set of voxels.)	2011-08-17 12:37:07 +01:00
Matt Pharr	ecaa57c7c6	Add volume rendering example. (~2.3x speedup from SIMD vs serial code.)	2011-08-17 12:05:37 +01:00
Matt Pharr	fce183c244	Merge branch 'master' of github.com:ispc/ispc	2011-08-17 10:32:49 +01:00
Matt Pharr	7a92f8b3f9	Add MSVC build support for stencil example	2011-08-17 02:28:49 -07:00
Matt Pharr	96af08e789	Print notices about image files being written	2011-08-16 06:31:26 +01:00
Matt Pharr	cb29c10660	Fix tests on Windows: need arch=x86 since ispc_test.exe is a 32-bit app	2011-08-15 08:25:08 -07:00
Matt Pharr	04c93043d6	Target handling fixes. Set the Module's target appropriately when it's first created. Compile separate 32 and 64 bit versions of the builtins-c bitcode and load the appropriate one based on the target we're compiling for.	2011-08-15 16:03:50 +01:00
Matt Pharr	46037c7a11	Merge branch 'master' of github.com:ispc/ispc	2011-08-15 12:44:38 +01:00
Matt Pharr	c570108026	Fix linux build of stencil example	2011-08-15 04:44:17 -07:00
Matt Pharr	230a0fadea	Attempt to generate debug info for task parameters.	2011-08-15 12:31:56 +01:00
Matt Pharr	87cf05e0d2	Improve performance of 64-bit reduce_equal implementations. Just pulling out the elements and doing a set of scalar equality tests is the best approach for those (nearly 2x better than the rotate and vector equality check that we use for 32-bit stuff).	2011-08-14 07:39:05 +01:00
Matt Pharr	ff608eef71	Change reduce_equal to return false if no instances are executing	2011-08-14 07:11:45 +01:00
Matt Pharr	f868a63064	Add support for scan operations across program instances (add, and, or).	2011-08-13 20:11:41 +01:00
Matt Pharr	c74116aa24	Fix crasher with malformed program	2011-08-12 07:47:17 +01:00
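
Note on commit e144724979 (global atomics): the message describes reducing across the active program instances, issuing a single hardware atomic, and then reconstructing each instance's return value from the old value. The sketch below is only an illustration of that idea for atomic add, simulated in plain Python rather than taken from the ispc sources. The function names, the one-element list standing in for the memory location, and the choice to return 0 for inactive instances are all assumptions made for this example.

```python
from typing import List, Tuple


def atomic_add_per_instance(
    memory: List[int], values: List[int], mask: List[bool]
) -> Tuple[List[int], int]:
    """Baseline: one simulated hardware atomic per active program instance."""
    results = [0] * len(values)  # inactive instances get 0 (assumption)
    hw_ops = 0
    for i, (v, on) in enumerate(zip(values, mask)):
        if on:
            results[i] = memory[0]  # old value seen by this instance
            memory[0] += v          # simulated atomic add
            hw_ops += 1
    return results, hw_ops


def atomic_add_reduced(
    memory: List[int], values: List[int], mask: List[bool]
) -> Tuple[List[int], int]:
    """Reduce locally, issue a single simulated atomic, rebuild return values."""
    # Local reduction over the active instances only.
    total = sum(v for v, on in zip(values, mask) if on)

    # The single simulated hardware atomic: fetch the old value, add the total.
    old = memory[0]
    memory[0] += total

    # Each active instance sees the old value plus the contributions of the
    # active instances that precede it (an exclusive prefix sum).
    results = [0] * len(values)  # inactive instances get 0 (assumption)
    running = old
    for i, (v, on) in enumerate(zip(values, mask)):
        if on:
            results[i] = running
            running += v
    return results, 1


if __name__ == "__main__":
    values = [1, 2, 3, 4]
    mask = [True, False, True, True]
    mem_a, mem_b = [10], [10]
    print(atomic_add_per_instance(mem_a, values, mask), mem_a)
    print(atomic_add_reduced(mem_b, values, mask), mem_b)
```

Both functions leave the same final value in memory and hand back the same per-instance results; the difference is the number of simulated atomics (one per active instance versus one in total). The same shape applies to the other associative ops named in the commit (and, or, xor) by swapping the operator and its identity.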

334 Commits