Commit Graph

159 Commits

Author SHA1 Message Date
james.brodman
d2b89e0e37 Tweak generic target. 2013-10-23 18:01:01 -04:00
james.brodman
4d289b16c2 Redesign after being hit with the KISS bat. 2013-10-23 14:25:43 -04:00
james.brodman
899f85ce9c Initial Support for new stdlib shift operator 2013-10-22 18:06:54 -04:00
Matt Pharr
7ab4c5391c Fix build with LLVM 3.2 and generic-4 / examples/sse4.h target. 2013-08-09 19:56:43 -07:00
Matt Pharr
1d76f74b16 Fix compiler warnings 2013-08-07 12:53:39 -07:00
Matt Pharr
5e5d42b918 Fix build with LLVM 3.1 2013-08-06 17:55:37 -07:00
Matt Pharr
cd9afe946c Merge branch 'master' into arm
Conflicts:
	Makefile
	builtins.cpp
	ispc.cpp
	ispc.h
	ispc.vcxproj
	opt.cpp
2013-08-06 17:39:21 -07:00
Matt Pharr
1276ea9844 Revert "Remove support for building with LLVM 3.1"
This reverts commit d3c567503b.

Conflicts:
	opt.cpp
2013-08-06 17:00:35 -07:00
Matt Pharr
ccdbddd388 Add peephole optimization to match int8/int16 averages.
Match the following patterns in IR, turning them into target-specific
intrinsics (e.g. PAVGB on x86) when possible.

(unsigned int8)(((unsigned int16)a + (unsigned int16)b + 1)/2)
(unsigned int8)(((unsigned int16)a + (unsigned int16)b)/2)
(unsigned int16)(((unsigned int32)a + (unsigned int32)b + 1)/2)
(unsigned int16)(((unsigned int32)a + (unsigned int32)b)/2)
(int8)(((int16)a + (int16)b + 1)/2)
(int8)(((int16)a + (int16)b)/2)
(int16)(((int32)a + (int32)b + 1)/2)
(int16)(((int32)a + (int32)b)/2)
2013-08-06 08:59:46 -07:00
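
A hedged sketch of what such a peephole might look like, written against LLVM's C++ PatternMatch helpers (the function name recognizeAvgU8 is made up, and the exact IR shape depends on what earlier passes produce; the unsigned divide by 2 is assumed to have been lowered to a logical shift):

    #include "llvm/IR/Value.h"
    #include "llvm/IR/PatternMatch.h"
    using namespace llvm;
    using namespace llvm::PatternMatch;

    // Match the rounding-up unsigned int8 form:
    //   %res = trunc (lshr (add (add (zext %a), (zext %b)), 1), 1)
    // and report %a and %b so a target intrinsic (e.g. PAVGB) can be emitted.
    static bool recognizeAvgU8(Value *inst, Value *&a, Value *&b) {
        return match(inst,
                     m_Trunc(m_LShr(m_Add(m_Add(m_ZExt(m_Value(a)),
                                                m_ZExt(m_Value(b))),
                                          m_One()),
                                    m_One())));
    }
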
Matt Pharr
5b20b06bd9 Add avg_{up,down}_int{8,16} routines to stdlib
These compute the average of two given values, rounding up and down,
respectively, if the result isn't exact.  When possible, these are
mapped to target-specific intrinsics (PAVG[BW] on IA and VH[R]ADD[US]
on NEON).

A subsequent commit will add pattern-matching to generate calls to
these intrinsics when the corresponding patterns are detected in the
IR.
2013-08-06 08:41:12 -07:00
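
For reference, a scalar C++ sketch of the semantics described above, for the unsigned 8-bit case only (names and types are illustrative, not the actual stdlib signatures):

    #include <cstdint>

    // Widening to 16 bits keeps the intermediate sum from overflowing;
    // avg_up rounds a half up, avg_down truncates.
    static inline uint8_t avg_up_u8(uint8_t a, uint8_t b) {
        return (uint8_t)(((uint16_t)a + (uint16_t)b + 1) / 2);
    }
    static inline uint8_t avg_down_u8(uint8_t a, uint8_t b) {
        return (uint8_t)(((uint16_t)a + (uint16_t)b) / 2);
    }
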
Ilia Filippov
a174a90f86 Supporting dumping, switching off and debug printing of optimization phases 2013-08-01 11:37:52 +04:00
Matt Pharr
d3c567503b Remove support for building with LLVM 3.1 2013-07-31 06:46:45 -07:00
Matt Pharr
bba84f247c Improved optimization of vector select instructions.
Various LLVM optimization passes are turning code like:

%cmp = icmp lt <8 x i32> %foo, %bar
%cmp32 = sext <8 x i1> %cmp to <8 x i32>
. . .
%cmp1 = trunc <8 x i32> %cmp32 to <8 x i1>
%result = select <8 x i1> %cmp1, . . .

Into:

%cmp = icmp lt <8 x i32> %foo, %bar
%cmp32 = zext <8 x i1> %cmp to <8 x i32>   # note: zext
. . .
%cmp1 = icmp ne <8 x i32> %cmp32, zeroinitializer
%result = select <8 x i1> %cmp1, . . .

This form isn't matched well by the LLVM code generators, which in turn
leads to fairly inefficient code (i.e., they don't just emit a vector
compare and blend instruction).

Also, renamed VSelMovmskOptPass to InstructionSimplifyPass to better
describe its functionality.
2013-07-25 09:46:01 -07:00
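
A hedged sketch of the rewrite involved, not the actual InstructionSimplifyPass (it uses LLVM's PatternMatch helpers, and simplifySelectCondition is an illustrative name): recover the original i1 mask from the zext/icmp-ne form so the select condition is the vector compare result again.

    #include "llvm/IR/Instructions.h"
    #include "llvm/IR/PatternMatch.h"
    using namespace llvm;
    using namespace llvm::PatternMatch;

    static bool simplifySelectCondition(SelectInst *sel) {
        ICmpInst::Predicate pred;
        Value *mask = nullptr;
        // Match: select (icmp ne (zext <N x i1> %mask), zeroinitializer), ...
        if (match(sel->getCondition(),
                  m_ICmp(pred, m_ZExt(m_Value(mask)), m_Zero())) &&
            pred == ICmpInst::ICMP_NE &&
            mask->getType()->getScalarType()->isIntegerTy(1)) {
            sel->setCondition(mask);  // select directly on the i1 vector
            return true;
        }
        return false;
    }
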
Matt Pharr
53414f12e6 Add SSE4 target optimized for computation with 8-bit datatypes.
This change adds a new 'sse4-8' target, where programCount is 16 and
the mask element size is 8-bits.  (i.e. the most appropriate sizing of
the mask for SIMD computation with 8-bit datatypes.)
2013-07-23 17:30:32 -07:00
Dmitry Babokin
fdcec5a219 Tracking LLVM trunk: removing llvm::createSimplifyLibCallsPass() call 2013-06-24 10:08:06 +04:00
Ilia Filippov
d92f9df17c changes in function LLVMFlattenInsertChain 2013-06-14 15:21:45 +04:00
Dmitry Babokin
eb2e5f378c Comment fixes 2013-04-18 15:36:35 +04:00
Dmitry Babokin
cb650d6100 One more opportunity to do better broadcast 2013-04-17 20:56:32 +04:00
Dmitry Babokin
5898532605 Broadcast implementation as InsertElement+Shuffle and related improvements 2013-04-10 02:18:24 +04:00
james.brodman
0a3822f2e5 Fix to make sure we're generating 32-bit gather/scatter when force32bitaddressing is set. 2013-04-05 16:21:05 -04:00
Dmitry Babokin
0af2a13349 DataLayout is changed to be managed from single place. v4-128-128 is added to generic DataLayout 2013-03-23 14:38:51 +04:00
Dmitry Babokin
0f86255279 Target class redesign: data moved to private. Also empty target-feature attribute is not added anymore (generic targets). 2013-03-23 14:28:05 +04:00
Dmitry Babokin
3f8a678c5a Editorial change: fixing trailing white spaces and tabs 2013-03-18 16:17:55 +04:00
james.brodman
3aaf2ef2d4 ToT Fixes / M4 macro fix 2013-01-14 14:55:10 -05:00
Matt Pharr
0bf1320a32 Remove support for building with LLVM 3.0 2013-01-06 12:27:53 -08:00
Matt Pharr
81dbd504aa Small fixes to eliminate compiler warnings when using clang 2013-01-06 12:10:54 -08:00
Matt Pharr
63dd7d9859 Fix build to work with LLVM top-of-tree again 2013-01-06 12:02:08 -08:00
ptu1
810784da1f Set the ScalarReplAggregate maximum structure size based on target vector width. 2012-11-13 12:35:45 -08:00
Matt Pharr
172a189c6f Fix build with LLVM top-of-tree 2012-10-17 11:11:50 -07:00
Matt Pharr
be2108260e Add --opt=force-aligned-memory option.
This forces all vector loads/stores to be done assuming that the given
pointer is aligned to the vector size, thus allowing the use of sometimes
more-efficient instructions.  (If the memory isn't actually aligned, the
program will fail.)
2012-09-14 13:49:45 -07:00
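
For illustration, a hedged C++ sketch of the difference the flag exploits on x86 SSE, assuming a 4-wide float vector (not ispc code):

    #include <immintrin.h>

    // The aligned form maps to movaps and faults if p is not 16-byte aligned;
    // the unaligned form (movups) is always safe but can be slower.
    __m128 load4(const float *p, bool assume_aligned) {
        return assume_aligned ? _mm_load_ps(p) : _mm_loadu_ps(p);
    }
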
Matt Pharr
daf5aa8e8b Run inst combine before memory optimizations.
We were previously emitting 64-bit indexing for some gathers where
32-bit was actually fine, due to adds of constant vectors
that hadn't yet been folded down to a constant result.
2012-07-13 12:14:53 -07:00
Matt Pharr
10b79fb41b Add support for non-factored variants of gather/scatter functions.
We now have two ways of approaching gather/scatters with a common base
pointer and with offset vectors.  For targets with native gather/scatter,
we just turn those into base + {1/2/4/8}*offsets.  For targets without,
we turn those into base + {1/2/4/8}*varying_offsets + const_offsets,
where const_offsets is a compile-time constant.

Infrastructure for issue #325.
2012-07-11 14:29:42 -07:00
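
A hedged scalar sketch of the two addressing forms described above, for a gather of 32-bit elements (names are illustrative only):

    #include <cstdint>

    void gather_i32(const uint8_t *base, const int32_t *varying_offsets,
                    const int32_t *const_offsets,  // compile-time constants
                    int32_t *result, int lanes) {
        for (int i = 0; i < lanes; ++i) {
            // native-gather targets:  addr = base + 4 * offsets[i]
            // other targets:          addr = base + 4 * varying_offsets[i] + const_offsets[i]
            const uint8_t *addr = base + 4 * varying_offsets[i] + const_offsets[i];
            result[i] = *(const int32_t *)addr;
        }
    }
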
Matt Pharr
ec0280be11 Rename gather/scatter_base_offsets functions to *factored_base_offsets*.
No functional change; just preparation for having a path that doesn't
factor the offsets into constant and varying parts, which will be better
for AVX2 and KNC.
2012-07-11 14:16:39 -07:00
Matt Pharr
f38770bf2a Fix build with LLVM ToT 2012-06-28 07:36:10 -07:00
Matt Pharr
ada66b5313 Make more attempts to pull out constant offsets for gather/scatter.
The "base+offsets" variants of gather decompose the integer offsets into
compile-time constant and compile-time unknown elements.  (The coalescing
optimization, then, depends on this decomposition being done well--having
as much as possible in the constant component.)  We now make multiple
efforts to improve this decomposition as we run optimization passes; in
some cases we're able to move more over to the constant side than was
first possible.

This in particular fixes issue #276, a case where coalescing was expected
but didn't actually happen.
2012-06-12 16:21:14 -07:00
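
A hedged example of the decomposition (not from the ispc sources): for a gather like a[index[i]].y with struct S { float x, y; }, each lane's byte offset splits into a varying part and a constant part, and the more that lands in the constant part, the better coalescing can do.

    #include <cstddef>
    #include <cstdint>

    struct S { float x, y; };

    // offset = sizeof(S) * index   (compile-time unknown, varying per lane)
    //        + offsetof(S, y)      (compile-time constant)
    int64_t lane_offset(int32_t index) {
        return (int64_t)sizeof(S) * index + offsetof(S, y);
    }
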
Matt Pharr
96450e17a3 Do all memory op improvements in a single optimization pass.
Rather than having separate passes to do conversion, when possible, of:

- General gather/scatter of a vector of pointers to g/s of
  a base pointer and integer offsets
- Gather/scatter to masked load/store, load+broadcast
- Masked load/store to regular load/store

Now all are done in a single ImproveMemoryOps pass.  This change was in
particular to address some phase ordering issues that showed up with
multidimensional array access wherein after determining that an outer
dimension had the same index value, we previously weren't able to take
advantage of the uniformity of the resulting pointer.
2012-06-12 13:56:17 -07:00
Matt Pharr
d6c6f95373 Do all replacements of __pseudo* memory ops in a single optimization pass.
Collected the old PseudoGSToGSPass and PseudoMaskedStorePass into a single
pass, ReplacePseudoMemoryOpsPass, which handles both of their tasks.
2012-06-12 13:10:03 -07:00
Matt Pharr
19b46be20d Remove load_and_broadcast from built-ins.
Now that we never run with the mask all off, we no longer need this
logic to live in a built-in function where the mask can be checked.  In
the one place where it was used (turning gathers from the same location
into a load and broadcast), we now just emit the code for that
directly.
2012-06-12 12:30:57 -07:00
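
A hedged scalar sketch of the load-and-broadcast rewrite described above (illustrative C++, not the emitted IR):

    #include <cstdint>

    // When every active lane gathers from the same address, one scalar load
    // followed by a broadcast replaces the gather.
    void load_and_broadcast_i32(const int32_t *ptr, int32_t *result, int lanes) {
        int32_t v = *ptr;               // single scalar load
        for (int i = 0; i < lanes; ++i)
            result[i] = v;              // splat across all program instances
    }
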
Matt Pharr
28a821df7d Improve wording of gather/scatter performance warnings. 2012-06-08 13:32:57 -07:00
Matt Pharr
89a2566e01 Add separate variants of memory built-ins for floats and doubles.
Previously, we'd bitcast e.g. a vector of floats to a vector of i32s and then
use the i32 variant of masked_load/masked_store/gather/scatter.  Now, we have
separate float/double variants of each of those.
2012-06-07 14:47:16 -07:00
Matt Pharr
1ac3e03171 Gather/scatter function improvements in builtins.
More naming consistency: _i32 rather than i32, now.

Also improved the m4 macros to generate these sequences to not require as
many parameters.
2012-06-07 14:19:23 -07:00
Matt Pharr
b86d40091a Improve naming of masked load/store instructions in builtins.
Now, use _i32 suffixes, rather than _32, etc.  Also cleaned up the m4
macro to generate these functions, using WIDTH to get the target width,
etc.
2012-06-07 13:58:31 -07:00
Matt Pharr
91d22d150f Update load_and_broadcast built-in
Change function suffix to "_i32", etc, from "_32"

Improve load_and_broadcast macro in util.m4 to grab vector width from 
WIDTH variable rather than taking it as a parameter.
2012-06-07 13:33:17 -07:00
Matt Pharr
90db01d038 Represent MOVMSK'ed masks with int64s rather than int32s.
This allows us to scale up to 64-wide execution.
2012-05-25 11:57:23 -07:00
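
Hedged illustration of why an int64 suffices (not ispc code): movmsk packs one bit per program instance, so 64 bits cover execution widths up to 64.

    #include <cstdint>

    // Build a 64-bit lane mask, one bit per active program instance.
    uint64_t pack_mask(const bool *lane_active, int lanes /* <= 64 */) {
        uint64_t mask = 0;
        for (int i = 0; i < lanes; ++i)
            if (lane_active[i])
                mask |= 1ull << i;
        return mask;
    }
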
Nipunn Koorapati
041ade66d5 Placated compiler by initializing variable 2012-05-06 06:59:17 -07:00
Matt Pharr
55c754750e Remove a number of redundant/unneeded optimization passes.
Performance and code quality of the performance suite are unchanged;
compilation times are improved by another 20% or so for simple
programs (e.g. rt.ispc).  One very complex program compiles
about 2.4x faster now.
2012-05-05 15:47:24 -07:00
Matt Pharr
72b6c12856 Notify LLVM pass mgr that the MakeInternalFuncsStaticPass doesn't change the CFG. 2012-05-05 15:47:24 -07:00
Matt Pharr
ee7e367981 Do global dead code elimination early in optimization.
This gives a 15-20% speedup in compilation time for simple
programs (but only ~2% for the big 21k monster program).
2012-05-05 15:47:19 -07:00
Matt Pharr
d99bd279e8 Add generic-32 target. 2012-05-03 11:11:06 -07:00
Matt Pharr
ee1fe3aa9f Update build to handle existence of LLVM 3.2 dev branch.
We now compile with LLVM 3.0, 3.1, and 3.2svn.
2012-05-03 08:25:25 -07:00