Commit Graph

123 Commits

Author SHA1 Message Date
Matt Pharr
d6c6f95373 Do all replacements of __pseudo* memory ops in a single optimization pass.
Collected the old PseudoGSToGSPass and PseudoMaskedStorePass into a single
pass, ReplacePseudoMemoryOpsPass, which handles both of their tasks.
2012-06-12 13:10:03 -07:00
Matt Pharr
19b46be20d Remove load_and_broadcast from built-ins.
Now that we never ever run with the mask all off, we no longer need
that logic in a built-in function so that we can check the mask.  In
the one place where it was used (turning gathers to the same location
into a load and broadcast), we now just emit the code for that
directly.
2012-06-12 12:30:57 -07:00
Matt Pharr
28a821df7d Improve wording of gather/scatter performance warnings. 2012-06-08 13:32:57 -07:00
Matt Pharr
89a2566e01 Add separate variants of memory built-ins for floats and doubles.
Previously, we'd bitcast e.g. a vector of floats to a vector of i32s and then
use the i32 variant of masked_load/masked_store/gather/scatter.  Now, we have
separate float/double variants of each of those.
2012-06-07 14:47:16 -07:00
Matt Pharr
1ac3e03171 Gather/scatter function improvements in builtins.
More naming consistency: _i32 rather than i32, now.

Also improved the m4 macros to generate these sequences to not require as
many parameters.
2012-06-07 14:19:23 -07:00
Matt Pharr
b86d40091a Improve naming of masked load/store instructions in builtins.
Now, use _i32 suffixes, rather than _32, etc.  Also cleaned up the m4
macro to generate these functions, using WIDTH to get the target width,
etc.
2012-06-07 13:58:31 -07:00
Matt Pharr
91d22d150f Update load_and_broadcast built-in
Change function suffix to "_i32", etc, from "_32"

Improve load_and_broadcast macro in util.m4 to grab vector width from 
WIDTH variable rather than taking it as a parameter.
2012-06-07 13:33:17 -07:00
Matt Pharr
90db01d038 Represent MOVMSK'ed masks with int64s rather than int32s.
This allows us to scale up to 64-wide execution.
2012-05-25 11:57:23 -07:00
Nipunn Koorapati
041ade66d5 Placated compiler by initializing variable 2012-05-06 06:59:17 -07:00
Matt Pharr
55c754750e Remove a number of redundant/unneeded optimization passes.
Performance and code quality of performance suite is unchanged,
compilation times are improved by another 20% or so for simple
programs (e.g. rt.ispc).  One very complex programs compiles
about 2.4x faster now.
2012-05-05 15:47:24 -07:00
Matt Pharr
72b6c12856 Notify LLVM pass mgr that the MakeInternalFuncsStaticPass doesn't change the CFG. 2012-05-05 15:47:24 -07:00
Matt Pharr
ee7e367981 Do global dead code elimination early in optimization.
This gives a 15-20% speedup in compilation time for simple
programs (but only ~2% for the big 21k monster program).
2012-05-05 15:47:19 -07:00
Matt Pharr
d99bd279e8 Add generic-32 target. 2012-05-03 11:11:06 -07:00
Matt Pharr
ee1fe3aa9f Update build to handle existence of LLVM 3.2 dev branch.
We now compile with LLVM 3.0, 3.1, and 3.2svn.
2012-05-03 08:25:25 -07:00
Matt Pharr
12706cd37f Debugging optimization pass updates
Don't run mem2reg with -O0 anymore, but do run the intrinsics opt pass, which
allows some CFG simplification due to the mask being all on, etc.
2012-04-25 08:43:11 -10:00
Matt Pharr
32815e628d Improve naming of llvm Instructions created.
We now try harder to keep the names of instructions related to the
initial names of variables they're derived from and so forth.  This
is useful for making both LLVM IR as well as generated C++ code
easier to correlate back to the original ispc source code.

Issue #244.
2012-04-19 16:36:46 -07:00
Matt Pharr
326c45fa17 Fix bugs in LLVMExtractFirstVectorElement().
When we're manually scalarizing the extraction of the first element
of a vector value, we need to be careful about handling constant values
and about where new instructions are inserted.  The old code was
sloppy about this, which in turn lead to invalid IR in some cases.
For example, the two bugs below were essentially due to generating
an extractelement inst from a zeroinitializer value and then inserting
it in the wrong bblock such that a phi node that used that value was
malformed.

Fixes issues #240 and #229.
2012-04-19 09:45:04 -07:00
Matt Pharr
a2bb899a6b Opt debug printing improvement
Now, just match the prefix of the provided function name of interest,
which allows us to not worry about managing details.
2012-04-19 09:34:54 -07:00
Matt Pharr
9fedb1674e Improve basic block dumping from optimization passes.
Now done via a macro, which is cleaner.  It's also now possible to
specify a single function to watch, which is useful for debugging.
2012-04-18 15:46:18 -07:00
Matt Pharr
7c91b01125 Handle more forms of constant vectors in lGetMask().
Various optimization passes depend on turning a compile-time constant
mask into a bit vector; it turns out that in LLVM3.1, constant vectors
of ints/floats are represented with llvM::ConstantDataVector, but
constant vectors of bools use llvm::ConstantVector (which is what LLVM
3.0 uses for all constant vectors).  Now lGetMask() always does the
llvm::ConstantVector path, to cover this case.

This improves generated C++ code by eliminating things like select
with an all on/off mask, turning movmask calls with constants into
constant values, etc.
2012-04-18 11:39:11 -07:00
Matt Pharr
c202e9e106 Add debugging printing code to optimization passes.
Now all of the passes dump out the basic block before and after
they do their thing when --debug is enabled.
2012-04-18 11:39:10 -07:00
Matt Pharr
645a8c9349 Fix serious bug in VSelMovmskOpt
When the mask was all off, we'd choose the incorrect operand!

(This bug was masked since this optimization wasn't triggering as
intended, due to other issues to be fixed in a forthcoming commit.
2012-04-18 11:39:10 -07:00
Matt Pharr
b9d6ba2aa0 Always set target info, even when compiling to generic targets.
This allows the SROA pass eliminate a lot of allocas and loads and
stores, which helps a lot for performance.
2012-04-17 15:10:30 -07:00
Matt Pharr
fefa86e0cf Remove LLVM_TYPE_CONST #define / usage.
Now with LLVM 3.0 and beyond, types aren't const.
2012-04-15 20:11:27 -07:00
Matt Pharr
098c4910de Remove support for building with LLVM 2.9.
A forthcoming change uses some features of LLVM 3.0's new type
system, and it's not worth back-porting this to also all work
with LLVM 2.9.
2012-04-15 20:08:51 -07:00
Matt Pharr
4f8cf019ca Add pass to verify module before starting optimizations. 2012-04-05 08:49:39 -07:00
Matt Pharr
4c9ac7fcf1 Fix build with LLVM 2.9. 2012-04-05 08:22:40 -07:00
Jean-Luc Duprat
e9626a1d10 Added macro PRId64 to opt.cpp for compilation on Windows 2012-03-30 16:56:30 -07:00
Matt Pharr
7e954e4248 Don't issue gather/scatter warnigns in the 'extra' bits of foreach loops.
With AOS data, we can often coalesce the accesses into gathers for the main
part of foreach loops but only fail on the last bits where the mask is not
all on (since the coalescing code doesn't handle mixed masks, yet.) Before,
we'd report success with coalescing and then also report that gathers were needed
for the same accesses that were coalesced, which was a) confusing, and b)
didn't accurately represent what was going on for the majority of the loop
iterations.
2012-03-19 15:08:35 -07:00
Matt Pharr
57af0eb64f Still do the gather/scatter -> load store pass even if leaving 'pseudo' mem opts unchanged. 2012-03-19 12:04:38 -07:00
Matt Pharr
60aae16752 Move check for linear vector to LLVMVectorIsLinear() function. 2012-03-19 11:57:04 -07:00
Matt Pharr
e264d95019 LLVMVectorValuesAllEqual() improvements.
Clean up the API, so the caller doesn't have to pass in a vector so
the function can track PHI nodes (do that internally instead.)

Handle casts in lValuesAreEqual().
2012-03-19 11:54:18 -07:00
Matt Pharr
0664f5a724 Add LLVMExtractVectorInts() function, use it in the opt code. 2012-03-19 11:48:38 -07:00
Matt Pharr
17c6a19527 Add LLVMExtractFirstVectorElement() function (and use it).
For cases where it turns out that we just need the first element of
a vector (e.g. because we've determined that all of the values are
equal), it's often more efficient to only compute that one value
with scalar operations than to compute the whole vector's worth and
then just use one value.  This function tries to rewrite a vector
computation to the scalar equivalent, if possible.

(Partial work-around to http://llvm.org/bugs/show_bug.cgi?id=11775.)

Note that sometimes this is the wrong thing to do--if we need the entire
vector value for other purposes, for example.
2012-03-19 11:48:33 -07:00
Matt Pharr
cbc8b8259b Use LLVMIntAsType() in opt code instead of locally-defined equivalent. 2012-03-19 11:36:00 -07:00
Matt Pharr
1067a2e4be Add LLVMShuffleVectors() and LLVMConcatVectors() functions.
These were local functions in opt.cpp that are now public via the
llvmutil.* files.
2012-03-19 11:34:52 -07:00
Matt Pharr
74a031a759 Small improvements to debug info printing in opt.cpp 2012-03-19 11:32:08 -07:00
Matt Pharr
9ec8e5a275 Fix compile warnings on Linux 2012-03-12 13:12:23 -07:00
Matt Pharr
8fdf84de04 Disable debugging printing code. 2012-03-05 09:58:09 -08:00
Matt Pharr
e013e0a374 Handle extract instructions in the lGetBasePtrAndOffsets() pattern matching code. 2012-03-05 09:58:09 -08:00
Matt Pharr
f7937f1e4b Fix build with LLVM2.9/3.0 2012-03-03 10:30:56 -08:00
Matt Pharr
95224f3f11 Improve detection of cases where 32-bit gather/scatter can be used.
Previously, we weren't noticing that an <n x i64> zero vector could
be represented as an <n x i32> without error.
2012-02-21 12:13:25 -08:00
Matt Pharr
a86b942730 Fix cases in coalesce opt where offsets would be truncated to 32 bits 2012-02-14 10:05:07 -08:00
Matt Pharr
cc86e4a7d2 Disable coalescing optimizations when using generic target.
The main issue is that they end up generating a number of smaller
vector ops (e.g. 4-wide and 8-wide on the 16-wide generic target,
which the examples/intrinsics implementations don't currently
support.

This fixes a number of failing tests for now; it may be worth
generalizing the stuff in examples/intrinsics at some point,
since as a general principle, e.g. if generating LLVM IR output,
the coalescing optimizations are still desirable.

Issue #175.
2012-02-13 16:52:01 -08:00
Matt Pharr
e864447e4a Fix silly bug in vector scale extraction optimization.
(Introduced in f20a2d2ee.  How did this ever pass tests?)
2012-02-13 12:06:45 -08:00
Matt Pharr
73bf552cd6 Add support for coalescing memory accesses from gathers.
There are two related optimizations that happen now.  (These
currently only apply for gathers where the mask is known to be
all on, and to gathers that are accessing 32-bit sized elements,
but both of these may be generalized in the future.)

First, for any single gather, we are now more flexible in mapping it
to individual memory operations.  Previously, we would only either map
it to a general gather (one scalar load per SIMD lane), or an 
unaligned vector load (if the program instances could be determined
to be accessing a sequential set of locations in memory.)

Now, we are able to break gathers into scalar, 2-wide (i.e. 64-bit),
4-wide, or 8-wide loads.  Further, we now generate code that shuffles
these loads around.  Doing fewer, larger loads in this manner, when
possible, can be more efficient.

Second, we can coalesce memory accesses across multiple gathers. If 
we have a series of gathers without any memory writes in the middle,
then we try to analyze their reads collectively and choose an efficient
set of loads for them.  Not only does this help if different gathers
reuse values from the same location in memory, but it's specifically
helpful when data with AOS layout is being accessed; in this case,
we're often able to generate wide vector loads and appropriate shuffles
automatically.
2012-02-10 13:10:39 -08:00
Matt Pharr
f20a2d2ee9 Generalize code to extract scales by 2/4/8 from addressing calculations.
Now, if we have a scale by 16, say, we extract out the scalar scale
of 8 and leave an explicit scale by 2.
2012-02-10 12:35:44 -08:00
Matt Pharr
0c25bc063c Add lGEPInst() utility routine to opt.cpp.
Deal with the messiness of LLVM API changes when creating
these in a single place.
2012-02-10 12:32:15 -08:00
Matt Pharr
5b4673e8eb Fix build with LLVM 2.9. 2012-02-07 08:37:13 -08:00
Matt Pharr
0432f97555 Fix build with LLVM 3.1 TOT 2012-01-31 14:10:07 -08:00