Commit Graph

22 Commits

Matt Pharr
2405dae8e6 Use malloc() to get space for task arguments when compiling to AVX.
This is to work around the LLVM bug/limitation discussed in LLVM bug
10841 (http://llvm.org/bugs/show_bug.cgi?id=10841).
2011-09-17 13:38:51 -07:00
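
A minimal C++ sketch of the workaround, with an entirely hypothetical
TaskArgs layout and launchTask() helper (ispc's real argument blocks are
built by the compiler and differ):

    #include <cstdlib>
    #include <cstring>

    // Illustrative layout only: the float[8] stands in for a 32-byte AVX
    // vector operand that needs 32-byte alignment.
    struct TaskArgs {
        float vec[8];
        int   count;
    };

    static void launchTask(void (*task)(void *), const TaskArgs *args) {
        // Heap storage sidesteps the dynamic stack realignment that LLVM
        // was getting wrong for over-aligned stack objects (bug 10841).
        TaskArgs *heapArgs = (TaskArgs *)std::malloc(sizeof(*heapArgs));
        std::memcpy(heapArgs, args, sizeof(*heapArgs));
        task(heapArgs);   // by assumption, the task frees the block when done
    }
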
Matt Pharr
3607f3e045 Remove support for building with LLVM 2.8. Fixes issue #66.
Both 2.9 and top-of-tree generate substantially better code than
LLVM 2.8 did, so it's not worth fixing the 2.8 build.
2011-09-17 13:18:59 -07:00
Matt Pharr
e2a88d491f Mark the internal __fast_masked_vload function as static 2011-09-13 15:43:48 -07:00
Matt Pharr
30f9dcd4f5 Unroll loops by default, add --opt=disable-loop-unroll to disable.
Issue #78.
2011-09-13 15:37:18 -07:00
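
Loop unrolling itself is generic; a small C++ illustration of the
transformation (not ispc output) is:

    // The rolled loop...
    static int sumRolled(const int *a, int n) {
        int sum = 0;
        for (int i = 0; i < n; ++i)
            sum += a[i];
        return sum;
    }

    // ...becomes something like this, trading code size for fewer branch
    // tests and more instruction-level parallelism per iteration:
    static int sumUnrolled(const int *a, int n) {
        int sum = 0;
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            sum += a[i];
            sum += a[i + 1];
            sum += a[i + 2];
            sum += a[i + 3];
        }
        for (; i < n; ++i)   // remainder iterations
            sum += a[i];
        return sum;
    }
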
Matt Pharr
dd153d3c5c Handle more instruction types when flattening offset vectors.
Generalize the lScalarizeVector() utility routine (used in determining
when we can change gathers/scatters into vector loads/stores, respectively)
to handle vector shuffles and vector loads.  This fixes issue #79, which
provided a case where a gather was being performed even though a vector
load was possible.
2011-09-13 09:43:56 -07:00
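
A conceptual C++ sketch of why this matters, with hypothetical helper
names (AVX has no gather instruction, so a gather is lane-by-lane
scalar code):

    #include <immintrin.h>

    // General case: each lane is fetched individually.
    static __m256 gather(const float *base, const int idx[8]) {
        float tmp[8];
        for (int i = 0; i < 8; ++i)
            tmp[i] = base[idx[i]];
        return _mm256_loadu_ps(tmp);
    }

    // When the optimizer can prove the offsets form the consecutive run
    // {k, k+1, ..., k+7}, the whole thing collapses into a single load:
    static __m256 gatherConsecutive(const float *base, int k) {
        return _mm256_loadu_ps(base + k);
    }
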
Matt Pharr
6375ed9224 AVX: Fix bug with misdeclaration of blend intrinsic.
This was preventing the "convert an all-on blend to one of the
operand values" optimization from kicking in on AVX.
2011-09-12 06:42:38 -07:00
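
The identity that optimization relies on, sketched with AVX intrinsics
(function name assumed):

    #include <immintrin.h>

    // With an all-on mask, blend(a, b, mask) selects every lane from b,
    // so the compiler can replace the whole call with its second operand:
    static __m256 blendAllOn(__m256 a, __m256 b) {
        const __m256 allOn = _mm256_castsi256_ps(_mm256_set1_epi32(-1));
        return _mm256_blendv_ps(a, b, allOn);   // always equal to b
    }
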
Matt Pharr
785d8a29d3 Run mem2reg pass even when doing -O0 compiles 2011-09-09 09:24:43 -07:00
Matt Pharr
c86128e8ee AVX: go back to using blend (vs. masked store) when possible.
All of the masked store calls were preventing values from being kept in
registers, which in turn led to a lot of unnecessary stack traffic.
This approach seems to give better code in the end.
2011-09-07 11:26:49 -07:00
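
A C++ sketch of the two strategies, with assumed helper names; this
illustrates the trade-off, not ispc's actual output:

    #include <immintrin.h>

    // Masked assignment "v = mask ? newVal : v" via blend: v stays in a
    // register, and one full-width store can be issued later if needed.
    static __m256 assignViaBlend(__m256 v, __m256 newVal, __m256 mask) {
        return _mm256_blendv_ps(v, newVal, mask);
    }

    // The same assignment via a masked store writes straight to memory,
    // so the variable can no longer live in a register -- the source of
    // the stack traffic described above.
    static void assignViaMaskedStore(float *v, __m256 newVal, __m256 mask) {
        _mm256_maskstore_ps(v, _mm256_castps_si256(mask), newVal);
    }
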
Matt Pharr
eb7913f1dd AVX: fix alignment when changing masked load to regular load.
Also added some debugging/tracing code (commented out).
Commented out an iffy assert that was being hit for AVX.
2011-09-01 15:45:49 -07:00
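
A sketch of the alignment subtlety with AVX intrinsics (helper name
assumed):

    #include <immintrin.h>

    // An all-on masked load can become a plain load, but only an
    // *unaligned* one: the pointer carries element alignment, not
    // 32-byte vector alignment.
    static __m256 loadAllOn(const float *ptr) {
        return _mm256_loadu_ps(ptr);   // safe at any float-aligned address
        // _mm256_load_ps(ptr) would fault unless ptr is 32-byte aligned
    }
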
Matt Pharr
f65a20c700 AVX bugfix: when replacing 'all on' masked store with a store,
the rvalue is operand 2, not operand 1 (which is the mask!)
2011-08-31 18:06:29 -07:00
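
A hypothetical C++ sketch of the fix, written against current LLVM
headers (the 2011-era includes and API differed slightly):

    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Instructions.h"

    // The masked-store call's arguments are (pointer, mask, rvalue), so
    // the stored value is argument 2; argument 1 is the mask.
    static void lReplaceAllOnMaskedStore(llvm::CallInst *call) {
        llvm::Value *ptr = call->getArgOperand(0);
        // getArgOperand(1) is the mask -- storing it was the bug
        llvm::Value *rvalue = call->getArgOperand(2);
        llvm::IRBuilder<> builder(call);
        builder.CreateStore(rvalue, ptr);
        call->eraseFromParent();
    }
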
Matt Pharr
ad9e66650d AVX bugfix with alignment for store instructions.
When replacing an 'all on' masked store with a regular store, set the
alignment to the vector element alignment, not the alignment of a whole
vector (i.e., 4- or 8-byte alignment, not 32- or 64-byte).
2011-08-29 16:58:48 -07:00
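
A sketch of that alignment fix, again against current LLVM headers
(StoreInst::setAlignment took a plain unsigned in 2011):

    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Instructions.h"
    #include "llvm/Support/Alignment.h"

    // The replacement store may only claim the element alignment that the
    // pointer is actually guaranteed to have.
    static void lEmitStoreWithElementAlignment(llvm::IRBuilder<> &builder,
                                               llvm::Value *rvalue,
                                               llvm::Value *ptr) {
        llvm::StoreInst *store = builder.CreateStore(rvalue, ptr);
        store->setAlignment(llvm::Align(4));   // float element alignment,
                                               // not the vector's 32 bytes
    }
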
Matt Pharr
4ab982bc16 Various AVX fixes (found by inspection).
Emit calls to masked_store, not masked_store_blend, when handling
  masked stores emitted by the frontend.
Fix a bug in the binary8to16 macro in builtins.m4.
Fix a bug in the 16-wide version of __reduce_add_float.
Remove the blend-based implementations of masked_store_blend for
  AVX; just forward those calls on to the corresponding real masked
  store functions.
2011-08-26 12:58:02 -07:00
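
For the __reduce_add_float fix, a sketch of what a 16-wide reduction has
to do, using AVX intrinsics (structure assumed; the real code lives in
builtins.m4):

    #include <immintrin.h>

    // Fold the two 8-wide halves together first, then reduce the
    // surviving 8 lanes horizontally.
    static float reduceAddFloat16(__m256 lo, __m256 hi) {
        __m256 sum8 = _mm256_add_ps(lo, hi);                      // 16 -> 8
        __m128 sum4 = _mm_add_ps(_mm256_castps256_ps128(sum8),
                                 _mm256_extractf128_ps(sum8, 1)); // 8 -> 4
        sum4 = _mm_hadd_ps(sum4, sum4);                           // 4 -> 2
        sum4 = _mm_hadd_ps(sum4, sum4);                           // 2 -> 1
        return _mm_cvtss_f32(sum4);
    }
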
Matt Pharr
fe54f1ad8e Fixes to build with latest LLVM ToT 2011-08-18 08:34:49 +01:00
Matt Pharr
922dbdec06 Fixes to build with LLVM top-of-tree 2011-07-26 10:57:49 +01:00
Matt Pharr
16be1d313e AVX updates / improvements.
Add optimization patterns to detect and simplify masked loads and stores
  with the mask all on / all off.
Enable AVX for LLVM 3.0 builds (still generally hits bugs / unimplemented
  stuff on the LLVM side, but it's getting there).
2011-07-25 07:41:37 +01:00
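
A runtime C++ analogue of the all-on / all-off patterns (function name
assumed; the optimizer performs the same case analysis on masks it can
prove constant at compile time):

    #include <immintrin.h>

    static void maskedStore(float *ptr, __m256 value, __m256 mask) {
        int bits = _mm256_movemask_ps(mask);
        if (bits == 0xff)
            _mm256_storeu_ps(ptr, value);    // all on: plain store
        else if (bits != 0)                  // mixed: real masked store
            _mm256_maskstore_ps(ptr, _mm256_castps_si256(mask), value);
        // all off: the store disappears entirely
    }
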
Matt Pharr
96d40327d0 Fix issue #72: 64-bit gathers/scatters led to undefined symbols 2011-07-21 14:44:55 +01:00
Matt Pharr
bba7211654 Add support for int8/int16 types. Addresses issues #9 and #42. 2011-07-21 06:57:40 +01:00
Matt Pharr
654cfb4b4b Many fixes for recent LLVM dev tree API changes 2011-07-18 15:54:39 +01:00
Matt Pharr
fe7717ab67 Added shuffle() variant to the standard library that takes two
varying values and a permutation index that spans the concatenation
of the two of them (along the lines of SHUFPS...)
2011-07-02 08:43:35 +01:00
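
A scalar C++ sketch of those semantics (helper name hypothetical): the
permutation index runs over the 16-element concatenation of the two
8-wide inputs, so index i < 8 selects from v0 and index i >= 8 from v1.

    #include <immintrin.h>

    static __m256 shuffle2(__m256 v0, __m256 v1, const int perm[8]) {
        float a[8], b[8], r[8];
        _mm256_storeu_ps(a, v0);
        _mm256_storeu_ps(b, v1);
        for (int i = 0; i < 8; ++i)
            r[i] = (perm[i] < 8) ? a[perm[i]] : b[perm[i] - 8];
        return _mm256_loadu_ps(r);
    }
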
Matt Pharr
2709c354d7 Add support for broadcast(), rotate(), and shuffle() stdlib routines 2011-06-27 17:31:44 -07:00
Matt Pharr
b84167dddd Fixed a number of issues related to memory alignment; a number of places
were expecting vector-width-aligned pointers where in point of fact,
there's no guarantee that they would have been in general.

Removed the aligned memory allocation routines from some of the examples;
they're no longer needed.

No perf. difference on Core2/Core i5 CPUs; older CPUs may see some
regressions.

Still need to update the documentation for this change and finish reviewing
alignment issues in Load/Store instructions generated by .cpp files.
2011-06-23 18:18:33 -07:00
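
A small C++ illustration of why the examples' aligned allocators became
unnecessary once generated loads and stores stopped assuming
vector-width alignment:

    #include <immintrin.h>
    #include <cstdlib>

    int main() {
        // Plain malloc gives element alignment for float, but not the
        // 32-byte alignment of a whole AVX vector.
        float *a = (float *)std::malloc(100 * sizeof(float));
        __m256 v = _mm256_loadu_ps(a + 1);   // fine at any float address
        // _mm256_load_ps(a + 1) would fault: not 32-byte aligned
        (void)v;
        std::free(a);
        return 0;
    }
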
Matt Pharr
18af5226ba Initial commit. 2011-06-21 12:48:50 -07:00