Commit Graph

179 Commits

Author SHA1 Message Date
Evghenii
7d0aa7a336 added shift 2014-01-22 20:43:53 +01:00
Evghenii
bc99897fbb +fixed some example, found some bugs, and bugs in ptxas/cuda 2014-01-21 14:51:27 +01:00
Evghenii
63d3ac6679 Merge branch 'master' into nvptx 2014-01-20 13:47:24 +01:00
Ilia Filippov
5fa8bd3c78 changes for support LLVM trunk 2014-01-15 14:17:35 +04:00
Evghenii
9389b6e3ef basic optimization path fails 2014-01-10 06:34:44 +01:00
evghenii
9053eed4b4 added basic optimization pass that promotes uniform into varying variables (not array) for nvptx target 2014-01-10 06:32:57 +01:00
Evghenii
0a66f17897 experimental support for non-constant [non-static] uniform arrays mapped to addrspace(3) 2014-01-08 11:06:14 +01:00
Ilia Filippov
4ef38e1615 Adding some optimization passes between two Alias Analysis passes 2013-12-27 19:22:19 +04:00
Dmitry Babokin
5cfd773ec9 Adding Alias Analysis phases 2013-12-26 10:54:05 +04:00
Dmitry Babokin
949984db18 Don't do sext+and optimization for generic targets 2013-12-23 16:31:33 +04:00
Ilia Filippov
b5dc78b06e adding support of shl instruction in lExtractConstantOffset optimization 2013-12-16 16:01:14 +04:00
Dmitry Babokin
2d2d14744b Fixing --opt=force-aligned-memory for LLVM 3.3+ 2013-12-04 19:00:02 +04:00
Dmitry Babokin
be813ea0a2 Select optimization for LLVM 3.3 2013-11-28 21:43:05 +04:00
Ilia Filippov
3fd9d5a025 support of LLVM 3.5 2013-11-21 19:09:43 +04:00
james.brodman
0f7050d3aa More stds compliant. VS doesn't like non constant length local arrays. 2013-10-31 19:51:13 -04:00
james.brodman
85eb4cf0d6 Fix logic that looks for shift builtins. 2013-10-29 14:02:32 -04:00
james.brodman
8ee3178166 Add Performance Warning 2013-10-28 16:51:02 -04:00
james.brodman
09a6e37154 Source cleanup. 2013-10-28 16:37:33 -04:00
james.brodman
1b8e745ffe remove condition. Don't use gcc 4.7 for tests. 2013-10-28 16:36:59 -04:00
james.brodman
9ba7b96825 Make the new optimization play nicely with the other.s 2013-10-28 16:14:31 -04:00
james.brodman
d2b89e0e37 Tweak generic target. 2013-10-23 18:01:01 -04:00
james.brodman
4d289b16c2 Redesign after being hit with the KISS bat. 2013-10-23 14:25:43 -04:00
james.brodman
899f85ce9c Initial Support for new stdlib shift operator 2013-10-22 18:06:54 -04:00
Matt Pharr
7ab4c5391c Fix build with LLVM 3.2 and generic-4 / examples/sse4.h target. 2013-08-09 19:56:43 -07:00
Matt Pharr
1d76f74b16 Fix compiler warnings 2013-08-07 12:53:39 -07:00
Matt Pharr
5e5d42b918 Fix build with LLVM 3.1 2013-08-06 17:55:37 -07:00
Matt Pharr
cd9afe946c Merge branch 'master' into arm
Conflicts:
	Makefile
	builtins.cpp
	ispc.cpp
	ispc.h
	ispc.vcxproj
	opt.cpp
2013-08-06 17:39:21 -07:00
Matt Pharr
1276ea9844 Revert "Remove support for building with LLVM 3.1"
This reverts commit d3c567503b.

Conflicts:
	opt.cpp
2013-08-06 17:00:35 -07:00
Matt Pharr
ccdbddd388 Add peephole optimization to match int8/int16 averages.
Match the following patterns in IR, turning them into target-specific
intrinsics (e.g. PAVGB on x86) when possible.

(unsigned int8)(((unsigned int16)a + (unsigned int16)b + 1)/2)
(unsigned int8)(((unsigned int16)a + (unsigned int16)b)/2)
(unsigned int16)(((unsigned int32)a + (unsigned int32)b + 1)/2)
(unsigned int16)(((unsigned int32)a + (unsigned int32)b)/2)
(int8)(((int16)a + (int16)b + 1)/2)
(int8)(((int16)a + (int16)b)/2)
(int16)(((int32)a + (int32)b + 1)/2)
(int16)(((int32)a + (int32)b)/2)
2013-08-06 08:59:46 -07:00
Matt Pharr
5b20b06bd9 Add avg_{up,down}_int{8,16} routines to stdlib
These compute the average of two given values, rounding up and down,
respectively, if the result isn't exact.  When possible, these are
mapped to target-specific intrinsics (PADD[BW] on IA and VH[R]ADD[US]
on NEON.)

A subsequent commit will add pattern-matching to generate calls to
these intrinsincs when the corresponding patterns are detected in the
IR.)
2013-08-06 08:41:12 -07:00
Ilia Filippov
a174a90f86 Supporting dumping, switching off and debug printing of optimization phases 2013-08-01 11:37:52 +04:00
Matt Pharr
d3c567503b Remove support for building with LLVM 3.1 2013-07-31 06:46:45 -07:00
Matt Pharr
bba84f247c Improved optimization of vector select instructions.
Various LLVM optimization passes are turning code like:

%cmp = icmp lt <8 x i32> %foo, %bar
%cmp32 = sext <8 x i1> %cmp to <8 x i32>
. . .
%cmp1 = trunc <8 x i32> %cmp32 to <8 x i1>
%result = select <8 x i1> %cmp1, . . .

Into:

%cmp = icmp lt <8 x i32> %foo, %bar
%cmp32 = zext <8 x i1> %cmp to <8 x i32>   # note: zext
. . .
%cmp1 = icmp ne <8 x i32> %cmp32, zeroinitializer
%result = select <8 x i1> %cmp1, …

Which in turn isn't matched well by the LLVM code generators, which
in turn leads to fairly inefficient code.  (i.e. it doesn't just emit
a vector compare and blend instruction.)

Also, renamed VSelMovmskOptPass to InstructionSimplifyPass to better
describe its functionality.
2013-07-25 09:46:01 -07:00
Matt Pharr
53414f12e6 Add SSE4 target optimized for computation with 8-bit datatypes.
This change adds a new 'sse4-8' target, where programCount is 16 and
the mask element size is 8-bits.  (i.e. the most appropriate sizing of
the mask for SIMD computation with 8-bit datatypes.)
2013-07-23 17:30:32 -07:00
Dmitry Babokin
fdcec5a219 Tracking LLVM trunk: removing llvm::createSimplifyLibCallsPass() call 2013-06-24 10:08:06 +04:00
Ilia Filippov
d92f9df17c changes in function LLVMFlattenInsertChain 2013-06-14 15:21:45 +04:00
Dmitry Babokin
eb2e5f378c Comment fixes 2013-04-18 15:36:35 +04:00
Dmitry Babokin
cb650d6100 One more opportunity to do better broadcast 2013-04-17 20:56:32 +04:00
Dmitry Babokin
5898532605 Broadcast implementation as InsertElement+Shuffle and related improvements 2013-04-10 02:18:24 +04:00
james.brodman
0a3822f2e5 Fix to make sure we're generating 32-bit gather/scatter when force32bitaddressing is set. 2013-04-05 16:21:05 -04:00
Dmitry Babokin
0af2a13349 DataLayout is changed to be managed from single place. v4-128-128 is added to generic DataLayout 2013-03-23 14:38:51 +04:00
Dmitry Babokin
0f86255279 Target class redesign: data moved to private. Also empty target-feature attribute is not added anymore (generic targets). 2013-03-23 14:28:05 +04:00
Dmitry Babokin
3f8a678c5a Editorial change: fixing trailing white spaces and tabs 2013-03-18 16:17:55 +04:00
james.brodman
3aaf2ef2d4 ToT Fixes / M4 macro fix 2013-01-14 14:55:10 -05:00
Matt Pharr
0bf1320a32 Remove support for building with LLVM 3.0 2013-01-06 12:27:53 -08:00
Matt Pharr
81dbd504aa Small fixes to eliminate compiler warnings when using clang 2013-01-06 12:10:54 -08:00
Matt Pharr
63dd7d9859 Fix build to work with LLVM top-of-tree again 2013-01-06 12:02:08 -08:00
ptu1
810784da1f Set the ScalarReplAggregate maximum structure size based on target vector width. 2012-11-13 12:35:45 -08:00
Matt Pharr
172a189c6f Fix build with LLVM top-of-tree 2012-10-17 11:11:50 -07:00
Matt Pharr
be2108260e Add --opt=force-aligned-memory option.
This forces all vector loads/stores to be done assuming that the given
pointer is aligned to the vector size, thus allowing the use of sometimes
more-efficient instructions.  (If it isn't the case that the memory is
aligned, the program will fail!).
2012-09-14 13:49:45 -07:00