Commit Graph

28 Commits

Author SHA1 Message Date
james.brodman
3c18c7a713 Fixed compile error: == instead of = 2012-10-26 16:52:54 -04:00
Matt Pharr
406fbab40e Fix bugs in declarations of __any, __all, and __none in examples/intrinsics.
They return bool, not vector of bool.
2012-10-17 10:55:50 -07:00
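The return-type fix described in this entry can be pictured with a small sketch. `Mask16`, `any_lane`, `all_lanes`, and `no_lanes` are hypothetical stand-ins, not the declarations from examples/intrinsics; the only point taken from the message is that each reduction yields a single bool rather than a vector of bools.

```cpp
#include <cstdint>

// Hypothetical 16-wide mask: one bit per program instance (an assumption).
struct Mask16 {
    uint16_t bits;
};

// Each mask test collapses the whole mask to one scalar bool.
inline bool any_lane(Mask16 m)  { return m.bits != 0; }
inline bool all_lanes(Mask16 m) { return m.bits == 0xFFFF; }
inline bool no_lanes(Mask16 m)  { return m.bits == 0; }
```

A caller can then branch on the result directly, for example `if (no_lanes(m)) return;`.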
Jean-Luc Duprat
f0b0618484 Added the following mask tests: __any(), __all(), __none() for all supported targets.
This allows for more efficient code generation for KNC.
2012-09-14 11:06:18 -07:00
Jean-Luc Duprat
aecd6e0878 All the smear(), setzero() and undef() APIs are now templated on the return type.
Modified ISPC's internal mangling to pass these through unchanged.
Tried hard to make sure this is not going to introduce an ABI change.
2012-07-17 17:06:36 -07:00
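A minimal sketch of what "templated on the return type" could look like follows. The names `Vec4f`, `Vec8f`, `smear_float`, and `setzero_float` are illustrative assumptions, not the signatures used in the real headers.

```cpp
// Hypothetical vector types for two target widths.
struct Vec4f { float v[4]; };
struct Vec8f { float v[8]; };

// Declared only; each vector type supplies its own specialization.
template <class RetVec> RetVec smear_float(float x);

template <> inline Vec4f smear_float<Vec4f>(float x) {
    Vec4f r;
    for (int i = 0; i < 4; ++i) r.v[i] = x;
    return r;
}

template <> inline Vec8f smear_float<Vec8f>(float x) {
    Vec8f r;
    for (int i = 0; i < 8; ++i) r.v[i] = x;
    return r;
}

// In this sketch, setzero simply reuses smear; a real target might map it
// to a dedicated zeroing intrinsic instead.
template <class RetVec> RetVec setzero_float() {
    return smear_float<RetVec>(0.0f);
}
```

The caller names the return type explicitly, for example `smear_float<Vec8f>(2.5f)`, so one header can serve several widths without a dummy argument.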
Matt Pharr
216ac4b1a4 Stop factoring out constant offsets for gather/scatter if instr is available.
For KNC (gather/scatter), it's not helpful to factor base+offsets gathers
and scatters into base_ptr + {1/2/4/8} * varying_offsets + const_offsets.
Now, if a HW instruction is available for gather/scatter, we just factor
into base + {1/2/4/8} * offsets (if possible).  Not only is this simpler,
but it's also what we need in order to pass the scale value along to the
2/4/8 scaling available directly in those instructions.

Finishes issue #325.
2012-07-11 14:52:29 -07:00
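The unfactored addressing form, base + {1/2/4/8} * offsets, can be sketched in scalar C++ as below. The types and the function name `gather_base_offsets` are assumptions for illustration, and masking is omitted to keep the example short.

```cpp
#include <cstdint>
#include <cstring>

constexpr int kWidth = 4;  // illustrative width, not a real target's
struct VecF   { float   v[kWidth]; };
struct VecI32 { int32_t v[kWidth]; };

// Lane i reads a float from address base + scale * offsets[i].
// scale is expected to be 1, 2, 4, or 8, matching what a hardware
// gather instruction can encode directly.
inline VecF gather_base_offsets(const uint8_t *base, int scale, VecI32 offsets) {
    VecF r;
    for (int i = 0; i < kWidth; ++i) {
        const uint8_t *p = base + static_cast<int64_t>(scale) * offsets.v[i];
        std::memcpy(&r.v[i], p, sizeof(float));
    }
    return r;
}
```

With `scale = 4` the offsets are plain element indices into a float array.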
Matt Pharr
ec0280be11 Rename gather/scatter_base_offsets functions to *factored_base_offsets*.
No functional change; just preparation for having a path that doesn't
factor the offsets into constant and varying parts, which will be better
for AVX2 and KNC.
2012-07-11 14:16:39 -07:00
Jean-Luc Duprat
bea88ab122 Integrated changes from mmp/and-fold-opt:
Add peephole optimization to eliminate some mask AND operations.

On KNC, the various vector comparison instructions can optionally
be masked; if a mask is provided, the result is effectively that
the value returned is the AND of the mask with the result of the
comparison.

This change adds an optimization pass to the C++ backend that looks
for vector ANDs where one operand is a comparison and rewrites
them--e.g. "__and(__equal_float(a, b), c)" is changed to
"__equal_float_and_mask(a, b, c)", saving an instruction in the end.

Issue #319.

Merge commit '8ef6bc16364d4c08aa5972141748110160613087'

Conflicts:
	examples/intrinsics/knc.h
	examples/intrinsics/sse4.h
2012-07-10 10:33:24 -07:00
Matt Pharr
bc7775aef2 Fix __ordered and __unordered floating point functions for C++ target.
Fixes include adding "_float" and "_double" suffixes as appropriate as well
as providing a number of missing implementations.

This fixes a number of failures in the half* tests.
2012-07-09 14:35:51 -07:00
Jean-Luc Duprat
516ba85abd Merge pull request #322 from mmp/vector-constants
Vector constants
2012-07-09 09:28:26 -07:00
Jean-Luc Duprat
098277b4f0 Merge pull request #321 from mmp/setzero
More varied support for constant vectors from C++ backend.
2012-07-09 08:57:05 -07:00
Matt Pharr
8ef6bc1636 Add peephole optimization to eliminate some mask AND operations.
On KNC, the various vector comparison instructions can optionally
be masked; if a mask is provided, the result is effectively that
the value returned is the AND of the mask with the result of the
comparison.

This change adds an optimization pass to the C++ backend that looks
for vector ANDs where one operand is a comparison and rewrites
them--e.g. "__and(__equal_float(a, b), c)" is changed to
"__equal_float_and_mask(a, b, c)", saving an instruction in the end.

Issue #319.
2012-07-07 08:35:38 -07:00
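The equivalence this peephole relies on can be sketched with scalar stand-ins. The function names below drop the leading underscores and the types are hypothetical; the sketch only shows that the fused form returns the same mask as compare-then-AND.

```cpp
constexpr int kWidth = 4;  // illustrative width
struct VecF { float v[kWidth]; };
struct Mask { bool  v[kWidth]; };

// Unfused form, step 1: per-lane comparison.
inline Mask equal_float(VecF a, VecF b) {
    Mask r;
    for (int i = 0; i < kWidth; ++i) r.v[i] = (a.v[i] == b.v[i]);
    return r;
}

// Unfused form, step 2: per-lane AND of two masks.
inline Mask mask_and(Mask a, Mask b) {
    Mask r;
    for (int i = 0; i < kWidth; ++i) r.v[i] = a.v[i] && b.v[i];
    return r;
}

// Fused form: comparison and masking in one operation, which a target
// with masked compare instructions can implement as a single instruction.
inline Mask equal_float_and_mask(VecF a, VecF b, Mask m) {
    Mask r;
    for (int i = 0; i < kWidth; ++i) r.v[i] = (a.v[i] == b.v[i]) && m.v[i];
    return r;
}
```

Rewriting `mask_and(equal_float(a, b), m)` into `equal_float_and_mask(a, b, m)` is therefore safe for any inputs.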
Matt Pharr
974b40c8af Add type suffix to comparison ops in C++ output.
e.g. "__equal()" -> "__equal_float()", etc.

No functional change; this is necessary groundwork for a forthcoming
peephole optimization that eliminates ANDs of masks in some cases.
2012-07-07 07:50:59 -07:00
Matt Pharr
e5fe0eabdc Update __load() builtins to take const pointers. 2012-07-06 08:47:47 -07:00
Matt Pharr
0d3993fa25 More varied support for constant vectors from C++ backend.
If we have a vector of all zeros, a __setzero_* function call is emitted,
permitting calling specialized intrinsics for this.  Undefined values
are reflected with an __undef_* call, which similarly allows passing that
information along.

This change also includes a cleanup to the signature of the __smear_*
functions; since they already have different names depending on the
scalar value type, we don't need to use the trick of passing an
undefined value of the return vector type as the first parameter as
an indirect way to overload by return value.

Issue #317.
2012-07-05 20:19:11 -07:00
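As a rough sketch of the emission choice described in this entry, the helper below picks a spelling for a constant integer vector. `emit_constant` is a hypothetical function, and the emitted strings only imitate the `__setzero_*` / `__smear_*` naming from the message; the backend's real output format is not shown in this log. Undefined values are left out because they cannot be recognized from lane values alone.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decide how to print a constant int32 vector.
inline std::string emit_constant(const std::vector<int32_t> &lanes) {
    bool all_same = !lanes.empty();
    for (int32_t x : lanes)
        if (x != lanes[0]) all_same = false;

    // All zeros: emit a call that a target can map to a zeroing intrinsic.
    if (all_same && lanes[0] == 0)
        return "__setzero_i32()";
    // Same value in every lane: emit a broadcast call.
    if (all_same)
        return "__smear_i32(" + std::to_string(lanes[0]) + ")";

    // General case: spell out every lane.
    std::string out = "{ ";
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        if (i > 0) out += ", ";
        out += std::to_string(lanes[i]);
    }
    return out + " }";
}
```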
Jean-Luc Duprat
e431b07e04 Changed the C++ API to use templates to indicate memory alignment to the C++ compiler.
This should help with performance of the generated code.
Updated the relevant header files (sse4.h, generic-16.h, generic-32.h, generic-64.h)

Updated generic-32.h and generic-64.h to the new memory API
2012-06-28 09:29:15 -07:00
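One way to carry alignment in a template parameter is sketched below. `load_float`, `store_float`, and the `ALIGN` parameter are assumed names; this portable version ignores the hint, whereas a target header could specialize on `ALIGN` to pick an aligned or unaligned load intrinsic.

```cpp
#include <cstring>

constexpr int kWidth = 4;  // illustrative width
struct VecF { float v[kWidth]; };

// ALIGN is the byte alignment the caller guarantees for p.
template <int ALIGN>
inline VecF load_float(const void *p) {
    static_assert(ALIGN > 0 && (ALIGN & (ALIGN - 1)) == 0,
                  "ALIGN must be a power of two");
    VecF r;
    std::memcpy(&r, p, sizeof(VecF));  // valid for any alignment
    return r;
}

template <int ALIGN>
inline void store_float(void *p, VecF value) {
    static_assert(ALIGN > 0 && (ALIGN & (ALIGN - 1)) == 0,
                  "ALIGN must be a power of two");
    std::memcpy(p, &value, sizeof(VecF));  // valid for any alignment
}
```

Because the alignment is a compile-time constant, the choice between code paths costs nothing at run time.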
Matt Pharr
27e39954d6 Fix a number of issues in examples/intrinsics/sse4.h.
This had gotten fairly out of date, after recent changes to C++ output.
Roughly 15 tests still fail with this target.

Issue #278.
2012-06-08 12:52:36 -07:00
Matt Pharr
89a2566e01 Add separate variants of memory built-ins for floats and doubles.
Previously, we'd bitcast e.g. a vector of floats to a vector of i32s and then
use the i32 variant of masked_load/masked_store/gather/scatter.  Now, we have
separate float/double variants of each of those.
2012-06-07 14:47:16 -07:00
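A float-specific masked load might look like the sketch below, which reads lanes as floats directly instead of going through an i32 vector and a bitcast. The names are hypothetical, and zeroing the inactive lanes is an assumption made here so the result is fully defined.

```cpp
#include <cstring>

constexpr int kWidth = 4;  // illustrative width
struct VecF { float v[kWidth]; };
struct Mask { bool  v[kWidth]; };

// Only lanes whose mask entry is true touch memory.
inline VecF masked_load_float(const void *p, Mask m) {
    VecF r = {};  // inactive lanes stay 0.0f (an assumption of this sketch)
    const unsigned char *bytes = static_cast<const unsigned char *>(p);
    for (int i = 0; i < kWidth; ++i) {
        if (m.v[i])
            std::memcpy(&r.v[i], bytes + i * sizeof(float), sizeof(float));
    }
    return r;
}
```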
Matt Pharr
b86d40091a Improve naming of masked load/store instructions in builtins.
Now, use _i32 suffixes, rather than _32, etc.  Also cleaned up the m4
macro to generate these functions, using WIDTH to get the target width,
etc.
2012-06-07 13:58:31 -07:00
Matt Pharr
90db01d038 Represent MOVMSK'ed masks with int64s rather than int32s.
This allows us to scale up to 64-wide execution.
2012-05-25 11:57:23 -07:00
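The reason a 64-bit integer is needed can be seen in a scalar sketch of a MOVMSK-style reduction: with one bit per program instance, a 64-wide mask does not fit in 32 bits. `Mask64` and `movmsk` are illustrative names.

```cpp
#include <cstdint>

constexpr int kWidth = 64;  // widest execution the log mentions
struct Mask64 { bool v[kWidth]; };

// Pack one bit per lane; lane i maps to bit i.
inline uint64_t movmsk(const Mask64 &m) {
    uint64_t bits = 0;
    for (int i = 0; i < kWidth; ++i) {
        if (m.v[i]) bits |= uint64_t{1} << i;
    }
    return bits;
}
```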
Matt Pharr
c6241581a0 Add an extra parameter to __smear functions to encode return type.
Now, the __smear* functions in generated C++ code have an unused first
parameter of the desired return type; this allows us to have headers
that include variants of __smear for multiple target widths.  (This
approach is necessary since we can't overload by return type in C++.)

Issue #256.
2012-05-08 09:54:23 -07:00
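The unused-first-parameter trick can be sketched as ordinary overloading. `Vec4f`, `Vec8f`, and `smear_float` are hypothetical names; the point is only that the dummy argument's type selects the overload, since C++ cannot overload on return type alone.

```cpp
// Hypothetical vector types for two target widths.
struct Vec4f { float v[4]; };
struct Vec8f { float v[8]; };

// The first parameter is never read; its type picks the overload.
inline Vec4f smear_float(Vec4f /*unused*/, float x) {
    Vec4f r;
    for (int i = 0; i < 4; ++i) r.v[i] = x;
    return r;
}

inline Vec8f smear_float(Vec8f /*unused*/, float x) {
    Vec8f r;
    for (int i = 0; i < 8; ++i) r.v[i] = x;
    return r;
}
```

A call such as `smear_float(Vec8f(), 1.5f)` resolves to the 8-wide version.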
Matt Pharr
12c754c92b Improved handling of splatted constant vectors in C++ backend.
Now, when we're printing out a constant vector value, we check to see
if it's a splat and call out to one of the __splat_* functions in
the generated code if so.
2012-04-19 13:11:15 -07:00
Matt Pharr
bba02f87ea Improve implementations of unsigned <=, >= in sse4 intrinsics file. 2012-01-27 16:49:41 -08:00
Matt Pharr
a5b7fca7e0 Extract constant offsets from gather/scatter base+offsets offset vectors.
When we're able to turn a general gather/scatter into the "base + offsets"
form, we now try to extract out any constant components of the offsets and
then pass them as a separate parameter to the gather/scatter function
implementation.

We then in turn carefully emit code for the addressing calculation so that
these constant offsets match LLVM's patterns to detect this case, such that
we get the constant offsets directly encoded in the instruction's addressing
calculation in many cases, saving arithmetic instructions to do these
calculations.

Improves performance of stencil by ~15%.  Other workloads unchanged.
2012-01-24 14:41:15 -08:00
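The factored addressing form can be sketched as below, with the constant part of each offset passed separately from the varying part. The function name, parameter order, and types are assumptions for illustration, and masking is omitted.

```cpp
#include <cstdint>
#include <cstring>

constexpr int kWidth = 4;  // illustrative width
struct VecF   { float   v[kWidth]; };
struct VecI32 { int32_t v[kWidth]; };

// Lane i reads from base + scale * varying[i] + constant[i].
// Keeping the constant term separate gives the code generator a chance
// to fold it into the instruction's addressing mode.
inline VecF gather_factored(const uint8_t *base, VecI32 varying, int scale,
                            VecI32 constant) {
    VecF r;
    for (int i = 0; i < kWidth; ++i) {
        const uint8_t *p = base + static_cast<int64_t>(scale) * varying.v[i]
                                + constant.v[i];
        std::memcpy(&r.v[i], p, sizeof(float));
    }
    return r;
}
```

A stencil access such as `a[i + 1]` fits this shape: the varying part is `i`, the scale is `sizeof(float)`, and the constant part is 4 bytes.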
Matt Pharr
68f6ea8def For << and >> with C++, detect when all instances are shifting by the same amount.
In this case, we now emit calls to potentially-specialized functions for the
left/right shifts that take a single integer value for the shift amount.  These
in turn can be matched to the corresponding intrinsics for the SSE target.

Issue #145.
2012-01-19 10:04:32 -07:00
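The two shift forms can be compared in a short sketch. `shl` and `shl_uniform` are hypothetical names, and unsigned lanes are used so that the shifts are well defined for every count from 0 to 31.

```cpp
#include <cstdint>

constexpr int kWidth = 4;  // illustrative width
struct VecU32 { uint32_t v[kWidth]; };

// General form: every lane has its own shift count.
inline VecU32 shl(VecU32 a, VecU32 counts) {
    VecU32 r;
    for (int i = 0; i < kWidth; ++i) r.v[i] = a.v[i] << counts.v[i];
    return r;
}

// Uniform form: one scalar count for all lanes, which maps naturally
// onto shift intrinsics that take a single count.
inline VecU32 shl_uniform(VecU32 a, int count) {
    VecU32 r;
    for (int i = 0; i < kWidth; ++i) r.v[i] = a.v[i] << count;
    return r;
}
```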
Matt Pharr
d14a2de168 Fix generic code emission when building with LLVM3.0/2.9.
Specifically, don't use vector select for masked store blend there,
but emit calls to undefined __masked_store_blend_*() functions.

Added implementations of these functions to the sse4.h and generic-16.h
in examples/intrinsics.  (Calls to these will never be generated with
LLVM 3.1).
2012-01-17 23:42:22 -07:00
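A scalar sketch of a masked store blend follows. The name `masked_store_blend_float` and its signature are assumptions rather than the declarations in sse4.h or generic-16.h; it shows the blend semantics, where active lanes take the new value and inactive lanes keep what was in memory.

```cpp
constexpr int kWidth = 4;  // illustrative width
struct VecF { float v[kWidth]; };
struct Mask { bool  v[kWidth]; };

// Equivalent to storing select(mask, value, *dst) back to dst.
inline void masked_store_blend_float(VecF *dst, VecF value, Mask m) {
    for (int i = 0; i < kWidth; ++i) {
        if (m.v[i]) dst->v[i] = value.v[i];
    }
}
```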
Matt Pharr
c6d1cebad4 Update masked_load/store implementations for generic targets to take void *s
(Fixes compile errors when we try to actually use these!)
2012-01-17 23:42:22 -07:00
Matt Pharr
78c6d3c02f Add initial support for 'goto' statements.
ispc now supports goto, but only under uniform control flow--i.e.
it must be possible for the compiler to statically determine that
all program instances will follow the goto.  An error is issued at
compile time if a goto is used when this is not the case.
2012-01-05 12:22:36 -08:00
Matt Pharr
8938e14442 Add support for emitting ~generic vectorized C++ code.
The compiler now supports an --emit-c++ option, which generates generic
vector C++ code.  To actually compile this code, the user must provide
C++ code that implements a variety of types and operations (e.g. adding
two floating-point vector values together, comparing them, etc).

There are two examples of this required code in examples/intrinsics:
generic-16.h is a "generic" 16-wide implementation that does all required
operations with scalar math; it's useful for demonstrating the requirements of
the implementation.  Then, sse4.h shows a simple implementation of an SSE4
target that maps the emitted function calls to SSE intrinsics.

When using these example implementations with the ispc test suite,
all but one or two tests pass with gcc and clang on Linux and OSX.
There are currently ~10 failures with icc on Linux, and ~50 failures with
MSVC 2010.  (To be fixed in coming days.)

Performance varies: when running the examples through the sse4.h
target, some have the same performance as when compiled with --target=sse4
from ispc directly (options), while noise is 12% slower, rt is 26%
slower, and aobench is 2.2x slower.  The details of this haven't yet been
carefully investigated, but will be in coming days as well.

Issue #92.
2012-01-04 12:59:03 -08:00
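To give a flavor of the user-provided code this entry describes, here is a minimal 16-wide sketch in the scalar style attributed to generic-16.h. All type and function names are hypothetical and do not reproduce the header's actual interface.

```cpp
constexpr int kWidth = 16;
struct VecF { float v[kWidth]; };
struct Mask { bool  v[kWidth]; };

// Add two floating-point vectors lane by lane.
inline VecF add_float(VecF a, VecF b) {
    VecF r;
    for (int i = 0; i < kWidth; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Compare two vectors, producing a per-lane mask.
inline Mask less_than_float(VecF a, VecF b) {
    Mask r;
    for (int i = 0; i < kWidth; ++i) r.v[i] = a.v[i] < b.v[i];
    return r;
}

// Pick a where the mask is on, b where it is off.
inline VecF select_float(Mask m, VecF a, VecF b) {
    VecF r;
    for (int i = 0; i < kWidth; ++i) r.v[i] = m.v[i] ? a.v[i] : b.v[i];
    return r;
}
```

An SSE-style header would keep the same shape but replace each loop with the matching intrinsic.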