When replacing an 'all on' masked store with a regular store, set the
alignment to the vector element alignment, not the alignment of a whole
vector (i.e., 4- or 8-byte alignment, not 32 or 64).
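As a rough illustration (a sketch against the modern LLVM C++ API, not
ispc's actual code; the function name here is hypothetical):

    // Sketch only: lower a masked store whose mask is known all-on to a
    // plain store.  The pointer is only guaranteed to be aligned to the
    // element size, so the store gets element alignment (4 bytes for
    // float, 8 for double), not 32/64-byte whole-vector alignment.
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/Support/Alignment.h"

    static llvm::StoreInst *lowerAllOnMaskedStore(llvm::IRBuilder<> &builder,
                                                  llvm::Value *value,
                                                  llvm::Value *ptr) {
        llvm::Type *eltTy =
            llvm::cast<llvm::VectorType>(value->getType())->getElementType();
        unsigned eltBytes = eltTy->getScalarSizeInBits() / 8;
        return builder.CreateAlignedStore(value, ptr, llvm::Align(eltBytes));
    }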
Emit calls to masked_store, not masked_store_blend, when handling
masked stores emitted by the frontend.
Fix a bug in the binary8to16 macro in builtins.m4.
Fix a bug in the 16-wide version of __reduce_add_float.
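For reference, the shape of the 16-wide reduction (a scalar sketch of
what the generated code computes, not the actual builtin, which is LLVM
IR produced from builtins.m4):

    // Sketch: the 16-wide reduction adds the high 8-wide half into the
    // low half, then reduces the remaining 8 lanes; the typical bug
    // class in such code is dropping lanes (e.g., reducing only one
    // half).
    static float reduceAddFloat16(const float v[16]) {
        float half[8];
        for (int i = 0; i < 8; ++i)
            half[i] = v[i] + v[i + 8];   // fold high half into low half
        float sum = 0.f;
        for (int i = 0; i < 8; ++i)
            sum += half[i];              // reduce the remaining 8 lanes
        return sum;
    }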
Remove the blend-based implementations of masked_store_blend for AVX;
just forward them to the corresponding real masked store functions.
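The shape of the change, as a C-level sketch (the vector/mask typedefs
and function names here are illustrative; the real functions live in
generated LLVM IR, and the typedefs use GCC/Clang vector extensions):

    // Sketch: masked_store_blend no longer has its own
    // load/blend/store implementation; it simply forwards to the real
    // masked store.
    typedef float Vec8f __attribute__((vector_size(32)));
    typedef int   Mask8 __attribute__((vector_size(32)));

    void masked_store_float8(float *ptr, Vec8f value, Mask8 mask);  // assumed

    void masked_store_blend_float8(float *ptr, Vec8f value, Mask8 mask) {
        masked_store_float8(ptr, value, mask);  // just forward
    }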
Add optimization patterns to detect and simplify masked loads and stores
where the mask is all on or all off.
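Roughly (a hypothetical sketch against the LLVM C++ API; maskedStore
stands for the call instruction being examined):

    // Sketch: if the mask is a compile-time constant, the masked store
    // can be deleted (all off) or turned into a plain store (all on,
    // with element alignment as above).
    #include "llvm/IR/Constants.h"
    #include "llvm/IR/Instructions.h"

    static void simplifyMaskedStore(llvm::CallInst *maskedStore,
                                    llvm::Value *mask) {
        if (auto *c = llvm::dyn_cast<llvm::Constant>(mask)) {
            if (c->isNullValue()) {
                // All lanes off: the store writes nothing; delete it.
                maskedStore->eraseFromParent();
            } else if (c->isAllOnesValue()) {
                // All lanes on: replace with a regular element-aligned
                // store, e.g. via lowerAllOnMaskedStore() above.
            }
        }
    }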
Enable AVX for LLVM 3.0 builds (this still generally hits bugs and
unimplemented functionality on the LLVM side, but it's getting there).
Fix places that were expecting vector-width-aligned pointers when, in
point of fact, there's no guarantee that they would have been aligned in
general.
Remove the aligned memory allocation routines from some of the examples;
they're no longer needed.
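For example (illustrative only), where an example previously used an
aligned allocation, ordinary allocation now suffices:

    // Sketch: vector-width alignment is no longer required for data
    // that the generated code loads from or stores to.
    #include <cstddef>

    float *allocateBuffer(std::size_t n) {
        // Before, the examples used aligned allocation, e.g.:
        //     float *buf;
        //     posix_memalign((void **)&buf, 32, n * sizeof(float));
        // With element-aligned loads/stores, this is enough:
        return new float[n];
    }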
No performance difference on Core 2 / Core i5 CPUs; older CPUs may see
some regressions.
Still need to update the documentation for this change and finish reviewing
alignment issues in Load/Store instructions generated by .cpp files.