Add support for coalescing memory accesses from gathers.

This commit adds two related optimizations.  (Currently they apply
only to gathers whose mask is known to be all on and that access
32-bit elements; both restrictions may be lifted in the future.)

First, we are now more flexible in mapping any single gather to
individual memory operations.  Previously, we would map it either to a
general gather (one scalar load per SIMD lane) or to an unaligned
vector load (if the program instances could be shown to access a
sequential set of memory locations).
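The earlier two-way choice can be pictured as follows.  This is a minimal
sketch, not ispc's code.  It assumes an all-on mask, 32-bit elements, and
per-lane addresses expressed as byte offsets from a common base pointer.
`GatherLowering` and `classifyGather` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration of the pre-existing choice, not ispc's code.
// Assumes an all-on mask and 32-bit elements.
enum class GatherLowering {
    VectorLoad,  // lanes read consecutive elements: one unaligned vector load
    ScalarLoads  // otherwise: a general gather, one scalar load per lane
};

// byteOffsets[lane] is the byte offset that lane reads, relative to a
// common base pointer.
GatherLowering classifyGather(const std::vector<int64_t> &byteOffsets) {
    assert(!byteOffsets.empty());
    const int64_t elemSize = 4;  // bytes per 32-bit element
    for (std::size_t lane = 1; lane < byteOffsets.size(); ++lane) {
        // Sequential means lane i reads exactly i elements past lane 0.
        if (byteOffsets[lane] != byteOffsets[0] + int64_t(lane) * elemSize)
            return GatherLowering::ScalarLoads;
    }
    return GatherLowering::VectorLoad;
}
```

For example, offsets {16, 20, 24, 28} map to a single vector load.  A
stride-2 pattern such as {0, 8, 16, 24} falls back to one scalar load per
lane, even though wider loads could cover it.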

Now, we can break gathers into scalar, 2-wide (i.e. 64-bit), 4-wide,
or 8-wide loads, and we generate code that shuffles the loaded values
into the program instances that need them.  When possible, doing
fewer, larger loads in this manner can be more efficient.
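The splitting step can be sketched roughly as follows, under some simplifying
assumptions.  Per-lane offsets are given in units of 32-bit elements from a
common base, and loads are chosen greedily, widest first.  The names `Load`,
`LaneSource`, `selectLoads`, and `buildShuffle` are hypothetical, and the
heuristic is not necessarily the one ispc uses.  As a conservative design
choice, a wide load is issued only when every element it covers is requested.
As a result, the sketch never reads memory the original gather would not
have touched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical sketch, not ispc's implementation. Offsets are in units of
// 32-bit elements relative to a common base pointer.
struct Load {
    int64_t start;  // first element offset covered by this load
    int width;      // 1, 2, 4, or 8 consecutive 32-bit elements
};

struct LaneSource {
    int load;     // index of the Load that supplies this lane's value
    int element;  // element index within that load's result
};

// Greedily cover the requested offsets, preferring wider loads. A load of
// width w is used only if all w elements it covers are requested.
std::vector<Load> selectLoads(const std::set<int64_t> &needed) {
    std::vector<Load> loads;
    auto it = needed.begin();
    while (it != needed.end()) {
        int64_t start = *it;
        int width = 1;
        for (int w : {8, 4, 2}) {
            bool allNeeded = true;
            for (int64_t k = 1; k < w && allNeeded; ++k)
                allNeeded = needed.count(start + k) != 0;
            if (allNeeded) { width = w; break; }
        }
        loads.push_back({start, width});
        // Skip every requested offset that this load already covers.
        it = needed.lower_bound(start + width);
    }
    return loads;
}

// For each lane, record which load (and which element of it) holds the
// lane's value: this describes the shuffle that assembles the result.
std::vector<LaneSource> buildShuffle(const std::vector<int64_t> &laneOffsets,
                                     const std::vector<Load> &loads) {
    std::vector<LaneSource> sources;
    for (int64_t off : laneOffsets) {
        for (std::size_t i = 0; i < loads.size(); ++i) {
            if (off >= loads[i].start && off < loads[i].start + loads[i].width) {
                sources.push_back({int(i), int(off - loads[i].start)});
                break;
            }
        }
    }
    assert(sources.size() == laneOffsets.size());
    return sources;
}
```

Consider lane offsets {0, 1, 2, 3, 10, 11, 20, 30}.  `selectLoads` picks a
4-wide load at 0, a 2-wide load at 10, and scalar loads at 20 and 30.  That
makes four loads instead of eight.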

Second, we can coalesce memory accesses across multiple gathers.  If
we have a series of gathers with no memory writes in between, we try
to analyze their reads collectively and choose an efficient set of
loads for them.  This helps when different gathers reuse values from
the same memory locations.  It is especially helpful when accessing
data with an AoS (array of structures) layout.  In that case, we're
often able to generate wide vector loads and the appropriate shuffles
automatically.
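Here is a hypothetical AoS case in portable C++, with no SIMD intrinsics.
It assumes a 4-wide target and `{x, y, z, w}` records of 32-bit floats
stored back to back.  Four gathers read `.x`, `.y`, `.z`, and `.w` of
records 0..3.  On its own, each gather touches every fourth float and would
need four scalar loads, sixteen in total.  Together, the gathers read sixteen
consecutive floats, so four wide loads plus a 4x4 transpose suffice.
`Vec4`, `load4`, and `gatherXYZW` are illustrative names, not ispc code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration, not ispc's implementation.
// Vec4 stands in for one 4-wide SIMD register.
using Vec4 = std::array<float, 4>;

// Emulates an unaligned 4-wide vector load of consecutive floats.
Vec4 load4(const float *p) { return {p[0], p[1], p[2], p[3]}; }

// aos holds records {x, y, z, w} back to back; record i's field f is
// aos[4 * i + f]. Returns {xs, ys, zs, ws}, where out[f][lane] is field f
// of record `lane`: the combined result of the four gathers.
std::array<Vec4, 4> gatherXYZW(const float *aos) {
    assert(aos != nullptr);
    // Four wide loads: rows[i] holds all four fields of record i.
    std::array<Vec4, 4> rows;
    for (std::size_t i = 0; i < 4; ++i)
        rows[i] = load4(aos + 4 * i);
    // Shuffle step: transpose the rows so each output vector holds one field.
    std::array<Vec4, 4> out;
    for (std::size_t f = 0; f < 4; ++f)
        for (std::size_t lane = 0; lane < 4; ++lane)
            out[f][lane] = rows[lane][f];
    return out;
}
```

On real SIMD hardware, the transpose would be a short sequence of shuffle
instructions rather than scalar copies.  Either way, the memory traffic
shrinks from sixteen scalar loads to four vector loads.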
Matt Pharr
2012-02-10 12:46:59 -08:00
parent f20a2d2ee9
commit 73bf552cd6
12 changed files with 1307 additions and 5 deletions

ispc.h

@@ -339,6 +339,10 @@ struct Opt {
        than gathers/scatters. This is likely only useful for measuring
        the impact of this optimization. */
    bool disableUniformMemoryOptimizations;
    /** Disables optimizations that coalesce incoherent scalar memory
        access from gathers into wider vector operations, when possible. */
    bool disableCoalescing;
};
/** @brief This structure collects together a number of global variables.