AVX: go back to using blend (vs. masked store) when possible.

All of the masked store calls were preventing values from being kept
in registers, which in turn led to a lot of unnecessary stack traffic.
Using blends instead seems to give better code in the end.
Matt Pharr
2011-09-07 11:26:49 -07:00
parent 375f1cb8e8
commit c86128e8ee
4 changed files with 195 additions and 54 deletions

opt.cpp

@@ -1433,16 +1433,12 @@ LowerMaskedStorePass::runOnBasicBlock(llvm::BasicBlock &bb) {
         llvm::Value *rvalue = callInst->getArgOperand(1);
         llvm::Value *mask = callInst->getArgOperand(2);
-        // On SSE, we need to choose between doing the load + blend + store
-        // trick, or serializing the masked store. On targets with a
-        // native masked store instruction, the implementations of
-        // __masked_store_blend_* should be the same as __masked_store_*,
-        // so this doesn't matter. On SSE, blending is generally more
-        // efficient and is always safe to do on stack-allocated values.(?)
-        bool doBlend = (g->target.isa != Target::AVX &&
+        // We need to choose between doing the load + blend + store trick,
+        // or serializing the masked store. Even on targets with a native
+        // masked store instruction, this is preferable since it lets us
+        // keep values in registers rather than going out to the stack.
+        bool doBlend = (!g->opt.disableBlendedMaskedStores ||
                         lIsStackVariablePointer(lvalue));
-        if (g->target.isa == Target::SSE4 || g->target.isa == Target::SSE2)
-            doBlend |= !g->opt.disableBlendedMaskedStores;
         // Generate the call to the appropriate masked store function and
         // replace the __pseudo_* one with it.
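
For reference, here is a rough sketch of the two strategies the new comment contrasts, written with AVX intrinsics rather than the LLVM IR that ispc actually emits (the helper names are hypothetical). Note that the load + blend + store version reads and rewrites every lane, so it is only safe when the whole destination is known to be accessible, e.g. a stack-allocated value, the case lIsStackVariablePointer() detects:

    #include <immintrin.h>

    // Native AVX masked store: writes only the lanes whose mask MSB is
    // set; untouched lanes are never read or written.
    static inline void masked_store_native(float *ptr, __m256 value, __m256 mask) {
        _mm256_maskstore_ps(ptr, _mm256_castps_si256(mask), value);
    }

    // Load + blend + store: read all eight lanes, select per lane, and
    // write all lanes back. The value stays in registers between the
    // load and the store, which is what frees up the register allocator.
    static inline void masked_store_blend(float *ptr, __m256 value, __m256 mask) {
        __m256 old = _mm256_loadu_ps(ptr);
        __m256 blended = _mm256_blendv_ps(old, value, mask);  // 'value' where mask MSB = 1
        _mm256_storeu_ps(ptr, blended);
    }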