aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
Dmitry Babokin	3f2217646e	Merge pull request #562 from mmp/arm New target naming scheme, new targets (SSE4-i8x16 and SSE4-i16x8), plus some cleanup and improvements.	2013-08-22 08:33:25 -07:00
Matt Pharr	611477e214	Revert change to lEmitVaryingSelect(). Using vector select versus a store and masked load for varying vector selects seems to give worse code. This may be related to http://llvm.org/bugs/show_bug.cgi?id=16941.	2013-08-22 07:50:25 -07:00
Dmitry Babokin	9bb5c314cd	Merge pull request #565 from dbabokin/run_tests run_tests.py fix and new switch.	2013-08-22 01:48:22 -07:00
Dmitry Babokin	f31a31478b	Moving time calculation earlier	2013-08-22 12:41:57 +04:00
Dmitry Babokin	5fb30939be	Fix for #564 , using wrong ispc in run_tests.py	2013-08-21 19:46:18 +04:00
Dmitry Babokin	60b413a9cb	Adding --non-interactive switch to run_tests.py	2013-08-21 19:25:30 +04:00
Matt Pharr	502f8fd76b	Reduce debug spew on failing idiv.ispc tests	2013-08-20 09:22:09 -07:00
Matt Pharr	2b2905b567	Fix (preexisting) bugs in generic-32/64.h with type of "__any", etc. This should be a bool, not a one-wide vector of bools. The equivalent fix was previously made in generic-16.h, but not made here. (Note that many tests are still failing with these targets, but at least they compile properly now.)	2013-08-20 09:05:50 -07:00
Matt Pharr	e7f067d70c	Fix handling of __clock() builtin for "generic" targets.	2013-08-20 09:04:52 -07:00
Matt Pharr	d976da7559	Speed up idiv test (don't test int32 as thoroughly)	2013-08-20 08:49:51 -07:00
Dmitry Babokin	84dbd66d10	Merge pull request #563 from jbrodman/debugopt Separate -O and -g	2013-08-15 13:10:13 -07:00
james.brodman	6be3c24ee5	Separate -O and -g	2013-08-15 15:24:46 -04:00
Matt Pharr	42f31aed69	Another attempt at fixing the Windows build (added sse4-8/sse4-16 targets).	2013-08-14 11:02:45 -07:00
Matt Pharr	ed017c42f1	Fix ispc.vcxproj for Windows builds	2013-08-11 07:47:20 -07:00
Matt Pharr	4766467271	Revert ispc.vcxproj to version from top-of-tree.	2013-08-10 11:23:39 -07:00
Matt Pharr	ea8591a85a	Fix build with LLVM top-of-tree (link libcurses)	2013-08-10 11:22:43 -07:00
Matt Pharr	7ab4c5391c	Fix build with LLVM 3.2 and generic-4 / examples/sse4.h target.	2013-08-09 19:56:43 -07:00
Matt Pharr	0c5742b6f8	Implement new naming scheme for --target. Now targets are named like "<isa>-i<mask size>x<gang size>", e.g. "sse4-i8x16", or "avx2-i32x16". The old target names are still supported.	2013-08-08 19:23:44 -07:00
Matt Pharr	1d76f74b16	Fix compiler warnings	2013-08-07 12:53:39 -07:00
Matt Pharr	5e5d42b918	Fix build with LLVM 3.1	2013-08-06 17:55:37 -07:00
Matt Pharr	cd9afe946c	Merge branch 'master' into arm Conflicts: Makefile builtins.cpp ispc.cpp ispc.h ispc.vcxproj opt.cpp	2013-08-06 17:39:21 -07:00
Matt Pharr	1276ea9844	Revert "Remove support for building with LLVM 3.1" This reverts commit `d3c567503b`. Conflicts: opt.cpp	2013-08-06 17:00:35 -07:00
jbrodman	0755e4f8ff	Merge pull request #561 from dbabokin/neon_condition Fix for Windows build and making NEON target optional	2013-08-06 13:45:30 -07:00
Matt Pharr	ccdbddd388	Add peephole optimization to match int8/int16 averages. Match the following patterns in IR, turning them into target-specific intrinsics (e.g. PAVGB on x86) when possible: (unsigned int8)(((unsigned int16)a + (unsigned int16)b + 1)/2); (unsigned int8)(((unsigned int16)a + (unsigned int16)b)/2); (unsigned int16)(((unsigned int32)a + (unsigned int32)b + 1)/2); (unsigned int16)(((unsigned int32)a + (unsigned int32)b)/2); (int8)(((int16)a + (int16)b + 1)/2); (int8)(((int16)a + (int16)b)/2); (int16)(((int32)a + (int32)b + 1)/2); (int16)(((int32)a + (int32)b)/2)	2013-08-06 08:59:46 -07:00
Matt Pharr	5b20b06bd9	Add avg_{up,down}_int{8,16} routines to stdlib. These compute the average of two given values, rounding up and down, respectively, if the result isn't exact. When possible, these are mapped to target-specific intrinsics (PAVG[BW] on IA and VH[R]ADD[US] on NEON). A subsequent commit will add pattern-matching to generate calls to these intrinsics when the corresponding patterns are detected in the IR.	2013-08-06 08:41:12 -07:00
Dmitry Babokin	dff7735af9	Fix for Windows build and making NEON target optional	2013-08-02 19:24:34 -07:00
Dmitry Babokin	fb34fc5a85	Merge pull request #559 from ifilippov/debug_phases Supporting dumping, switching off and debug printing of optimization phases.	2013-08-01 14:55:07 -07:00
Dmitry Babokin	43423c276f	Merge pull request #560 from ifilippov/perf Supporting perf.py on Mac OS	2013-08-01 13:20:01 -07:00
jbrodman	5ffc3a8f4c	Merge pull request #558 from dbabokin/win_examples Fix for examples to make them work on Windows properly	2013-08-01 08:02:42 -07:00
Ilia Filippov	3c06924a02	Supporting perf.py on Mac OS	2013-08-01 12:47:37 +04:00
Ilia Filippov	a174a90f86	Supporting dumping, switching off and debug printing of optimization phases	2013-08-01 11:37:52 +04:00
Matt Pharr	4f48d3258a	Documentation updates for NEON	2013-07-31 20:06:04 -07:00
Matt Pharr	d9c38b5c1f	Remove support for using SVML for math lib routines. This path was poorly maintained and wasn't actually available on most targets.	2013-07-31 06:56:48 -07:00
Matt Pharr	d3c567503b	Remove support for building with LLVM 3.1	2013-07-31 06:46:45 -07:00
Matt Pharr	d7562d3836	Merge branch 'master' into arm	2013-07-31 06:38:17 -07:00
Dmitry Babokin	220f0b0b40	Renaming mandelbrot_tasks files to be different from mandelbrot	2013-07-30 19:53:12 -07:00
Matt Pharr	48ff03112f	Remove __pause from stdlib_core() in utils.m4. It wasn't ever being used, and was breaking compilation on ARM.	2013-07-30 08:44:22 -07:00
Matt Pharr	ab3b633733	Add 8-bit and 16-bit specialized NEON targets. Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask elements, respectively, and thus should generate the best code when used for computation with datatypes of those sizes.	2013-07-30 08:44:16 -07:00
Dmitry Babokin	fa93cb7d0b	InterlockedAdd -> InterlockedExchangeAdd for better portability (InterlockedAdd is not always supported)	2013-07-29 22:46:36 -07:00
egaburov	153fbc3d7d	some changes	2013-07-29 11:05:05 +02:00
egaburov	307abc8db7	adding text folder to test ptx generator	2013-07-28 15:57:10 +02:00
egaburov	af61c9bae3	working on target-nvptx64... need to add nvptx64	2013-07-28 15:50:08 +02:00
egaburov	67b549a937	Added nvptx64 target. Things to do: 1. builtins/target-nvptx64.ll to write, now it is just a copy of target-generic-1.ll 2. add __global__ & __device__ scope 3. make code work for a single cuda thread 4. use tasks to work as a block grid and programIndex as laneIdx, programCount as warpSize 5. ... and more...	2013-07-28 14:31:43 +02:00
Matt Pharr	b6df447b55	Add reduce_add() for int8 and int16 types. This maps to specialized instructions (e.g. PSADBW) when available.	2013-07-25 09:46:01 -07:00
Matt Pharr	2d063925a1	Explicitly call the PBLENDVB intrinsic for i8 blending with sse4-8. This is slightly cleaner than trunc-ing the i8 mask to i1 and using a vector select. (And is probably more safe in terms of good code.)	2013-07-25 09:46:01 -07:00
Matt Pharr	bba84f247c	Improved optimization of vector select instructions. Various LLVM optimization passes are turning code like: %cmp = icmp lt <8 x i32> %foo, %bar %cmp32 = sext <8 x i1> %cmp to <8 x i32> . . . %cmp1 = trunc <8 x i32> %cmp32 to <8 x i1> %result = select <8 x i1> %cmp1, . . . Into: %cmp = icmp lt <8 x i32> %foo, %bar %cmp32 = zext <8 x i1> %cmp to <8 x i32> # note: zext . . . %cmp1 = icmp ne <8 x i32> %cmp32, zeroinitializer %result = select <8 x i1> %cmp1, … Which in turn isn't matched well by the LLVM code generators, which in turn leads to fairly inefficient code. (i.e. it doesn't just emit a vector compare and blend instruction.) Also, renamed VSelMovmskOptPass to InstructionSimplifyPass to better describe its functionality.	2013-07-25 09:46:01 -07:00
Matt Pharr	780b0dfe47	Add SSE4-16 target. Along the lines of sse4-8, this is an 8-wide target for SSE4, using 16-bit elements for the mask. It's thus (in principle) the best target for SIMD computation with 16-bit datatypes.	2013-07-25 09:46:01 -07:00
Matt Pharr	04d61afa23	Fix bug in lEmitVaryingSelect() for targets with i1 mask types. Commit `53414f12e6` introduced a bug where lEmitVaryingSelect() would try to truncate a vector of i1s to a vector of i1s, which in turn made LLVM's IR analyzer unhappy.	2013-07-25 09:45:20 -07:00
Dmitry Babokin	663ebf7857	Merge pull request #551 from mmp/constfold Improvements to constant folding.	2013-07-24 10:27:04 -07:00
Matt Pharr	53414f12e6	Add SSE4 target optimized for computation with 8-bit datatypes. This change adds a new 'sse4-8' target, where programCount is 16 and the mask element size is 8-bits. (i.e. the most appropriate sizing of the mask for SIMD computation with 8-bit datatypes.)	2013-07-23 17:30:32 -07:00
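
Commit 0c5742b6f8 above describes the new `--target` naming scheme as "<isa>-i<mask size>x<gang size>", with "sse4-i8x16" and "avx2-i32x16" as examples. The following is a minimal Python sketch of how such a name could be split into its three parts. It is not ispc's own parser: the `Target` type, the `parse_target` function, and the regular expression are hypothetical and exist only for illustration, and the sketch deliberately rejects the old-style names (such as "sse4-8") that ispc still accepts.

```python
import re
from typing import NamedTuple


class Target(NamedTuple):
    """Hypothetical container for the three parts of a new-style target name."""
    isa: str        # instruction set, e.g. "sse4" or "avx2"
    mask_bits: int  # size of each mask element in bits
    gang_size: int  # number of program instances in a gang


# Assumed pattern: lowercase ISA name, then "-i<mask size>x<gang size>".
_TARGET_RE = re.compile(r"^(?P<isa>[a-z0-9]+)-i(?P<mask>\d+)x(?P<gang>\d+)$")


def parse_target(name: str) -> Target:
    """Split a name like "sse4-i8x16" into ISA, mask size, and gang size."""
    match = _TARGET_RE.match(name)
    if match is None:
        # Old-style names such as "sse4-8" are not handled by this sketch.
        raise ValueError(f"not a new-style target name: {name!r}")
    return Target(match["isa"], int(match["mask"]), int(match["gang"]))


if __name__ == "__main__":
    print(parse_target("sse4-i8x16"))
    print(parse_target("avx2-i32x16"))
```

Under this reading, "sse4-i8x16" corresponds to the sse4-8 target from commit 53414f12e6 (8-bit mask elements, programCount of 16).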

1740 Commits
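
The averaging commits listed above (5b20b06bd9 and ccdbddd388) rely on widening the operands before adding, as in `(unsigned int8)(((unsigned int16)a + (unsigned int16)b + 1)/2)`. The Python sketch below illustrates, for the unsigned 8-bit case only, why the widening matters: the sum of two 8-bit values can need nine bits. The function names are hypothetical stand-ins and are not the ispc stdlib routines themselves; Python integers do not overflow, so the 8-bit wraparound is simulated with a mask.

```python
def _check_uint8(value: int) -> None:
    """Reject values outside the unsigned 8-bit range."""
    if not 0 <= value <= 0xFF:
        raise ValueError(f"{value} does not fit in an unsigned int8")


def avg_up_uint8(a: int, b: int) -> int:
    """Average rounding up: the sum is formed at full width, then halved."""
    _check_uint8(a)
    _check_uint8(b)
    return (a + b + 1) // 2


def avg_down_uint8(a: int, b: int) -> int:
    """Average rounding down: the sum is formed at full width, then halved."""
    _check_uint8(a)
    _check_uint8(b)
    return (a + b) // 2


def naive_avg_uint8(a: int, b: int) -> int:
    """What happens without widening: the sum wraps to 8 bits before halving."""
    _check_uint8(a)
    _check_uint8(b)
    return ((a + b) & 0xFF) // 2


if __name__ == "__main__":
    print(avg_up_uint8(1, 2), avg_down_uint8(1, 2))
    print(avg_up_uint8(255, 255), naive_avg_uint8(255, 255))
```

The signed variants in the commit message use C-style division, which truncates toward zero, so they are not reproduced here with Python's floor division.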