aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
Matt Pharr	9d4ff1bc06	Fix alignment in usage message	2011-09-12 15:06:41 -07:00
Matt Pharr	83f22f1939	Add experimental --fast-masked-vload flag for SSE.	2011-09-12 12:29:33 -07:00
Matt Pharr	6375ed9224	AVX: Fix bug with misdeclaration of blend intrinsic. This was preventing the "convert an all-on blend to one of the operand values" optimization from kicking in on AVX.	2011-09-12 06:42:38 -07:00
Matt Pharr	cf23cf9ef4	Fix typo in user guide. Issue #96	2011-09-12 05:24:32 -07:00
Matt Pharr	1147b53dcd	Add #define with target vector width in emitted headers	2011-09-09 09:33:56 -07:00
Matt Pharr	4cf831a651	When --fast-math is enabled, tell LLVM about it, too.	2011-09-09 09:32:59 -07:00
Matt Pharr	785d8a29d3	Run mem2reg pass even when doing -O0 compiles	2011-09-09 09:24:43 -07:00
Matt Pharr	46d2bad231	Fix malformed program crash	2011-09-09 09:24:43 -07:00
Matt Pharr	32da8e11b4	Fix crash with varying global vector types when emitting header file.	2011-09-09 09:16:59 -07:00
Matt Pharr	5dedb6f836	Add --scale command line argument to mandelbrot and rt examples. This applies a floating-point scale factor to the image resolution; it's useful for experiments with many-core systems where the base image resolution may not give enough work for good load-balancing with tasks.	2011-09-07 20:07:51 -07:00
Matt Pharr	2ea6d249d5	Fix mapping to 8, 16 program instances in AO bench example. With this, we now compute a correct image with AVX.	2011-09-07 11:34:24 -07:00
Matt Pharr	c86128e8ee	AVX: go back to using blend (vs. masked store) when possible. All of the masked store calls were inhibiting putting values into registers, which in turn led to a lot of unnecessary stack traffic. This approach seems to give better code in the end.	2011-09-07 11:26:49 -07:00
Matt Pharr	375f1cb8e8	Make octaves and octaves loop uniform in noise example	2011-09-07 10:34:23 -07:00
Matt Pharr	3ca7b6b078	Remove MCJIT stuff from ispc_test (fix Linux build)	2011-09-07 09:44:27 -07:00
Matt Pharr	effe901890	Add task-parallel version of aobench	2011-09-07 05:43:21 -07:00
Matt Pharr	4f451bd041	More AVX fixes. Fix RNG state initialization for 16-wide targets. Fix a number of bugs in reduce_add builtin implementations for AVX. Fix some tests that had incorrect expected results for the 16-wide case.	2011-09-06 15:53:11 -07:00
Matt Pharr	c76ef7b174	Add command-line option to specify position-independent codegen	2011-09-06 11:12:43 -07:00
Matt Pharr	743d82e935	Various documentation updates.	2011-09-06 09:51:02 -07:00
Matt Pharr	18546e9c6d	Add option to disable optimizations to test running script	2011-09-04 18:09:00 -07:00
Matt Pharr	f24ab16b91	Release notes, doxygen update for 1.0.7 release. (tag: v1.0.7)	2011-09-03 07:33:39 -07:00
Matt Pharr	766b34683c	Fix Windows build	2011-09-03 07:23:16 -07:00
Matt Pharr	b5bfa43e92	Fix error with float suffixes	2011-09-02 13:09:25 -07:00
Matt Pharr	99221f7d17	Fix a few places in examples where the C reference implementation had a double-precision fp constant, undesirably causing computation to be done in double precision. Makes C scalar versions of the options pricing models, rt, and aobench 3-5% faster. Makes scalar version of noise about 15% faster. Others are unchanged.	2011-09-01 16:31:22 -07:00
Matt Pharr	eb7913f1dd	AVX: fix alignment when changing masked load to regular load. Also added some debugging/tracing stuff (commented out). Commented out iffy assert that was hitting for avx stuff.	2011-09-01 15:45:49 -07:00
Matt Pharr	08cad7a665	AVX bugfixes	2011-09-01 14:23:10 -07:00
Matt Pharr	9cd92facbd	Fix test: was incorrectly failing for 8-wide targets	2011-09-01 05:03:49 -07:00
Matt Pharr	85063f493c	Revert attempt to be clever about which LLVM libraries to link in--just link all of them. (This was causing build problems for some folks.)	2011-09-01 05:02:44 -07:00
Matt Pharr	f65a20c700	AVX bugfix: when replacing 'all on' masked store with a store, the rvalue is operand 2, not operand 1 (which is the mask!)	2011-08-31 18:06:29 -07:00
Matt Pharr	e144724979	Improve performance of global atomics, taking advantage of associativity. For associative atomic ops (add, and, or, xor), we can take advantage of their associativity to do just a single hardware atomic instruction, rather than one for each of the running program instances (as the previous implementation did.) The basic approach is to locally compute a reduction across the active program instances with the given op and to then issue a single HW atomic with that reduced value as the operand. We then take the old value that was stored in the location that is returned from the HW atomic op and use that to compute the values to return to each of the program instances (conceptually representing the cumulative effect of each of the preceding program instances having performed their atomic operation.) Issue #56.	2011-08-31 05:35:01 -07:00
Matt Pharr	96a297c747	Small improvements to help output	2011-08-30 14:48:22 -07:00
Matt Pharr	67e00b97c6	Fix incorrect assertions in ConstExpr constructors	2011-08-30 11:08:53 -07:00
Matt Pharr	a94cabc692	Modify stencil example to do separate runs with and without task parallelism.	2011-08-30 05:08:21 -07:00
Matt Pharr	ad9e66650d	AVX bugfix with alignment for store instructions. When replacing 'all on' masked store with regular store, set alignment to be the vector element alignment, not the alignment for a whole vector. (i.e. 4 or 8 byte alignment, not 32 or 64).	2011-08-29 16:58:48 -07:00
Matt Pharr	6de494cfdb	Fix AVX bug introduced in `4ab982bc16`	2011-08-29 16:50:59 -07:00
Matt Pharr	58e34ba4ae	Add new test-driver script, run_tests.py. Old run_tests.sh still lives (for now). Changes include: tests are run in parallel across all of the available CPU cores; there is an option to create a statically-linked executable for each test (rather than using the LLVM JIT), which is particularly useful for AVX, which doesn't have good JIT support yet; static executables also make it possible to test x86, not just x86-64, codegen; fixed a number of tests in failing_tests, which were actually failing because the expected function signature of tests had changed.	2011-08-29 14:15:09 -07:00
Matt Pharr	33feeffe5d	Update timing header so it works with C code	2011-08-29 11:23:43 -07:00
Matt Pharr	d0db46aac5	Use logical shift right op for shifts of unsigned ints. Fixes issue #88.	2011-08-29 10:32:26 -07:00
Matt Pharr	da76396c75	Fix typo in SSE2 attributes string.	2011-08-27 08:59:25 -07:00
Matt Pharr	bbf3fb6307	Disable popcnt on SSE4 targets--should only enable if system CPU supports it	2011-08-27 04:09:55 -07:00
Matt Pharr	4ab982bc16	Various AVX fixes (found by inspection). Emit calls to masked_store, not masked_store_blend, when handling masked stores emitted by the frontend. Fix bug in binary8to16 macro in builtins.m4. Fix bug in 16-wide version of __reduce_add_float. Remove blend function implementations for masked_store_blend for AVX; just forward those on to the corresponding real masked store functions.	2011-08-26 12:58:02 -07:00
Matt Pharr	34301e09f5	Fix incorrect comment in builtins definitions files. (And all of the places it was cut and pasted to. :-( ).	2011-08-26 10:44:46 -07:00
Matt Pharr	84e586e767	Commit correct atomics tests	2011-08-26 10:43:30 -07:00
Matt Pharr	72a2f5d2f4	Make SSE2 __popcnt_int64 return i64 to be consistent with other targets	2011-08-26 10:42:12 -07:00
Matt Pharr	606cbab0d4	Performance improvements for global min/max atomics. Issue #57. Compute a "local" min/max across the active program instances and then do a single atomic memory op. Added a few tests to exercise global min/max atomics (which were previously untested!)	2011-08-26 10:35:24 -07:00
Matt Pharr	54ec56c81d	Clean up and centralize LLVM target initialization	2011-08-26 10:15:33 -07:00
Matt Pharr	a322398c62	When emitting header files, put 'extern' declarations of globals used in ispc code outside of the ispc namespace. Fixes issue #64.	2011-08-26 10:03:06 -07:00
Matt Pharr	f22b3a25bd	Update command-line processing and usage string now that we have a preprocessor on Windows. We had been prohibiting Windows users from providing #definitions on the command line, which is the wrong thing to do ever since we switched to using the clang preprocessor.	2011-08-26 09:58:08 -07:00
Matt Pharr	b67498766e	Big rewrite / improvement of target handling. If no CPU is specified, use the host CPU type, not just a default of "nehalem". Provide better features strings to the LLVM target machinery; this ensures that LLVM doesn't generate SSE>2 instructions for the SSE2 target (fixes issue #82) and gives slight code improvements from using cmovs in generated code. Use the LLVM popcnt intrinsic for the SSE2 target (it generates code that doesn't call the popcnt instruction now that we properly tell LLVM which instructions are and aren't available for SSE2).	2011-08-26 09:54:45 -07:00
Matt Pharr	c340ff3893	Fixes to build with LLVM ToT	2011-08-25 08:53:56 +01:00
Matt Pharr	b0f59777d4	Silly bug: don't pass NULL to the print() stmt when we want a llvm::Value * that has the value NULL. (This was causing crashes with print() statements with no additional values to be printed.)	2011-08-25 07:48:13 +01:00
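
The message for commit e144724979 describes doing a local reduction across the active program instances, issuing a single hardware atomic, and then reconstructing each instance's return value from the old value. The following is a rough Python model of that idea, not the ispc implementation: the `Cell` class, the function names, and the use of a Python list as a "gang" with a boolean mask are all assumptions made for illustration, and the simulated `fetch_op` merely stands in for a hardware atomic.

```python
from itertools import accumulate
import operator


class Cell:
    """Hypothetical stand-in for a memory location with a fetch-and-op atomic."""

    def __init__(self, value):
        self.value = value
        self.atomic_ops = 0  # counts simulated hardware atomics issued

    def fetch_op(self, op, operand):
        # Apply op to the stored value and return the value held before it.
        old = self.value
        self.value = op(old, operand)
        self.atomic_ops += 1
        return old


def gang_atomic_naive(cell, op, values, mask):
    """One atomic per active program instance, in instance order."""
    results = [None] * len(values)
    for i, (v, is_active) in enumerate(zip(values, mask)):
        if is_active:
            results[i] = cell.fetch_op(op, v)
    return results


def gang_atomic_reduced(cell, op, values, mask):
    """Reduce locally, issue one atomic, then rebuild per-instance results."""
    active = [i for i, is_active in enumerate(mask) if is_active]
    results = [None] * len(values)
    if not active:
        return results
    # Inclusive prefix reduction over the active instances' operands.
    prefix = list(accumulate((values[i] for i in active), op))
    # A single atomic carries the fully reduced operand.
    old = cell.fetch_op(op, prefix[-1])
    # The first active instance sees the old value; each later one sees the
    # old value combined with the operands of the preceding active instances.
    results[active[0]] = old
    for k in range(1, len(active)):
        results[active[k]] = op(old, prefix[k - 1])
    return results


if __name__ == "__main__":
    values = [1, 2, 3, 4]
    mask = [True, False, True, True]
    a, b = Cell(10), Cell(10)
    print(gang_atomic_naive(a, operator.add, values, mask), a.value, a.atomic_ops)
    print(gang_atomic_reduced(b, operator.add, values, mask), b.value, b.atomic_ops)
```

Because the operator is associative, both functions leave the same final value in the cell and hand back the same per-instance results; the reduced version simply issues one simulated atomic instead of one per active instance. The same model works with `operator.and_`, `operator.or_`, and `operator.xor` on integers.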

1657 Commits