These compute the average of two given values, rounding up and down,
respectively, if the result isn't exact. When possible, these are
mapped to target-specific intrinsics (PAVG[BW] on IA and VH[R]ADD[US]
on NEON.)
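As a rough scalar sketch of the intended semantics (the helper names and the
uint8 element type here are only illustrative, not the actual builtins), the
two variants differ only in whether an odd sum rounds the result up or down:

    #include <cstdint>

    // Average rounding down: truncating shift of the widened sum.
    static inline uint8_t avg_down_u8(uint8_t a, uint8_t b) {
        return (uint8_t)((uint16_t(a) + uint16_t(b)) >> 1);
    }

    // Average rounding up: add one before shifting so an odd sum rounds up.
    static inline uint8_t avg_up_u8(uint8_t a, uint8_t b) {
        return (uint8_t)((uint16_t(a) + uint16_t(b) + 1) >> 1);
    }

    // The sum is formed in a 16-bit temporary so it can't overflow.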
A subsequent commit will add pattern-matching to generate calls to
these intrinsics when the corresponding patterns are detected in the
IR.
Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask
elements, respectively, and thus should generate the best code when used
for computation with datatypes of those sizes.
1. builtins/target-nvptx64.ll still needs to be written; for now it is just a copy of target-generic-1.ll
2. add __global__ & __device__ scope
3. make the code work for a single CUDA thread
4. use tasks to work as a block grid, with programIndex as laneIdx and programCount as warpSize
5. ... and more...
Various LLVM optimization passes are turning code like:
%cmp = icmp slt <8 x i32> %foo, %bar
%cmp32 = sext <8 x i1> %cmp to <8 x i32>
. . .
%cmp1 = trunc <8 x i32> %cmp32 to <8 x i1>
%result = select <8 x i1> %cmp1, . . .
Into:
%cmp = icmp slt <8 x i32> %foo, %bar
%cmp32 = zext <8 x i1> %cmp to <8 x i32> ; note: zext
. . .
%cmp1 = icmp ne <8 x i32> %cmp32, zeroinitializer
%result = select <8 x i1> %cmp1, …
The resulting IR isn't matched well by the LLVM code generators, which
leads to fairly inefficient code. (i.e. they don't just emit a vector
compare and a blend instruction.)
Also, renamed VSelMovmskOptPass to InstructionSimplifyPass to better
describe its functionality.
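A hypothetical sketch of the kind of simplification such a pass can apply
(illustrative code using LLVM's PatternMatch helpers, not the actual pass):
when a select's condition is an "icmp ne" against zero of a zext'ed i1 mask,
the zext and compare are redundant and the original mask can be used directly.

    #include "llvm/IR/Instructions.h"
    #include "llvm/IR/PatternMatch.h"

    using namespace llvm;
    using namespace llvm::PatternMatch;

    // Hypothetical sketch: if the select condition is
    //   icmp ne (zext <N x i1> %mask to <N x i32>), zeroinitializer
    // then %mask can be used as the select's condition directly.
    static bool simplifyMaskedSelect(SelectInst *SI) {
        Value *Mask = nullptr;
        ICmpInst::Predicate Pred;
        if (match(SI->getCondition(),
                  m_ICmp(Pred, m_ZExt(m_Value(Mask)), m_Zero())) &&
            Pred == ICmpInst::ICMP_NE &&
            Mask->getType()->getScalarType()->isIntegerTy(1)) {
            SI->setCondition(Mask);
            return true;
        }
        return false;
    }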
Along the lines of sse4-8, this is an 8-wide target for SSE4, using
16-bit elements for the mask. It's thus (in principle) the best
target for SIMD computation with 16-bit datatypes.
Commit 53414f12e6 introduced a bug where lEmitVaryingSelect() would
try to truncate a vector of i1s to a vector of i1s, which in turn
made LLVM's IR analyzer unhappy.
This change adds a new 'sse4-8' target, where programCount is 16 and
the mask element size is 8 bits. (i.e. the most appropriate sizing of
the mask for SIMD computation with 8-bit datatypes.)
On a target with a 16-bit mask (for example), we now choose the type
of an integer literal like "1024" to be an int16. Previously, we used an int32,
which is a worse fit and leads to less efficient code than an int16
on a 16-bit mask target. (However, we'd still give an integer literal
1000000 the type int32, even in a 16-bit target.)
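A hedged sketch of the selection rule described above (the helper name and enum
are hypothetical, not the actual implementation): prefer the integer type that
matches the mask element width, and fall back to a wider type when the literal
doesn't fit.

    #include <cstdint>

    enum class IntType { Int8, Int16, Int32, Int64 };

    // Pick the type of an integer literal given the target's mask element
    // width in bits; prefer the mask-sized type when the value fits.
    static IntType chooseLiteralType(int64_t value, int maskBits) {
        if (maskBits == 8 && value >= INT8_MIN && value <= INT8_MAX)
            return IntType::Int8;
        if (maskBits <= 16 && value >= INT16_MIN && value <= INT16_MAX)
            return IntType::Int16;  // e.g. 1024 on a 16-bit mask target
        if (value >= INT32_MIN && value <= INT32_MAX)
            return IntType::Int32;  // e.g. 1000000, even on a 16-bit target
        return IntType::Int64;
    }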
Updated the tests to still pass with 8- and 16-bit targets, given this
change.
There were a number of places throughout the system that assumed that the
execution mask would only have either 32-bit or 1-bit elements. This
commit makes it possible to have a target with an 8- or 16-bit mask.
For varying int8/16/32 types, divides by small constants can be
implemented efficiently through multiplies and shifts with integer
types of twice the bit-width; this commit adds this optimization.
(Implementation is based on Halide.)
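As a concrete illustration of the trick for one case (a hedged sketch, not the
compiler's actual code): unsigned 8-bit division by 3 can be done with a
multiply and a shift whose intermediate value fits in a 16-bit type, which is
what makes it a good fit for widening SIMD multiplies.

    #include <cassert>
    #include <cstdint>

    // 171/512 is slightly above 1/3, and the error is small enough that
    // truncation gives exactly x / 3 for every 8-bit input.
    static inline uint8_t div3_u8(uint8_t x) {
        uint16_t prod = (uint16_t)(x * 171);  // fits: at most 255*171 = 43605
        return (uint8_t)(prod >> 9);
    }

    int main() {
        for (int x = 0; x <= 255; ++x)
            assert(div3_u8((uint8_t)x) == x / 3);
        return 0;
    }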
Fixes:
- Don't issue a warning when the shift is by the same amount in all
vector lanes.
- Do issue a warning when it's a compile-time constant but the values
are different in different lanes.
Previously, we warned iff the shift amount wasn't a compile-time constant.
We can now do constant folding with all basic datatypes (the previous
implementation handled int32 well, but had limited, if any, coverage
for other datatypes.)
Reduced a bit of repeated code in the constant folding implementation
through template helper functions.
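The shape of those helpers is roughly the following (a hypothetical sketch, not
the actual code): one function template handles element-wise folding for any
element type and any binary operation, so the per-datatype repetition collapses
into a single definition.

    #include <cstddef>
    #include <functional>
    #include <vector>

    // Fold a binary operation over two constant vectors of any element type.
    template <typename T, typename Op>
    static std::vector<T> lFoldBinary(const std::vector<T> &a,
                                      const std::vector<T> &b, Op op) {
        std::vector<T> result(a.size());
        for (size_t i = 0; i < a.size(); ++i)
            result[i] = op(a[i], b[i]);
        return result;
    }

    // The same helper then covers int8/int16/int32/int64/float/double, e.g.:
    //   lFoldBinary<int16_t>(a, b, std::plus<int16_t>());
    //   lFoldBinary<double>(a, b, std::multiplies<double>());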