Commit Graph

22 Commits

Author SHA1 Message Date
Matt Pharr
89a2566e01 Add separate variants of memory built-ins for floats and doubles.
Previously, we'd bitcast e.g. a vector of floats to a vector of i32s and then
use the i32 variant of masked_load/masked_store/gather/scatter.  Now, we have
separate float/double variants of each of those.
2012-06-07 14:47:16 -07:00
Matt Pharr
1ac3e03171 Gather/scatter function improvements in builtins.
More naming consistency: _i32 rather than i32, now.

Also improved the m4 macros to generate these sequences to not require as
many parameters.
2012-06-07 14:19:23 -07:00
Matt Pharr
b86d40091a Improve naming of masked load/store instructions in builtins.
Now, use _i32 suffixes, rather than _32, etc.  Also cleaned up the m4
macro to generate these functions, using WIDTH to get the target width,
etc.
2012-06-07 13:58:31 -07:00
Matt Pharr
91d22d150f Update load_and_broadcast built-in
Change function suffix to "_i32", etc, from "_32"

Improve load_and_broadcast macro in util.m4 to grab vector width from 
WIDTH variable rather than taking it as a parameter.
2012-06-07 13:33:17 -07:00
Matt Pharr
1d29991268 Indentation fixes in builtins/ 2012-06-07 13:23:07 -07:00
Matt Pharr
90db01d038 Represent MOVMSK'ed masks with int64s rather than int32s.
This allows us to scale up to 64-wide execution.
2012-05-25 11:57:23 -07:00
Matt Pharr
ee7e367981 Do global dead code elimination early in optimization.
This gives a 15-20% speedup in compilation time for simple
programs (but only ~2% for the big 21k monster program).
2012-05-05 15:47:19 -07:00
Matt Pharr
0c1b206185 Pass log/exp/pow transcendentals through to targets that support them.
Currently, this is the generic targets.
2012-05-03 13:49:56 -07:00
Matt Pharr
fd846fbe77 Fix bug in __gather_base_offsets_32.
In short, we weren't correctly zeroing the compile-time constant portion
of the offsets for lanes that aren't executing. (!)

Fixes issue #235.
2012-04-12 10:28:15 -07:00
Matt Pharr
3b95452481 Add memcpy(), memmove() and memset() to the standard library.
Issue #183.
2012-03-05 16:09:00 -08:00
Matt Pharr
c152ae3c32 Add single-precision asin() and acos() to stdlib.
Issue #184.
2012-03-05 13:32:13 -08:00
Matt Pharr
5b4673e8eb Fix build with LLVM 2.9. 2012-02-07 08:37:13 -08:00
Matt Pharr
664dc3bdda Add support for "new" and "delete" to the language.
Issue #139.
2012-01-27 14:47:06 -08:00
Matt Pharr
24f58fa16a Update per_lane macro to not use ID for lane number in macro expansion
This was leading to unintended consequences if WIDTH was used in macro code,
which was undesirable.
2012-01-27 09:12:13 -08:00
Matt Pharr
a5b7fca7e0 Extract constant offsets from gather/scatter base+offsets offset vectors.
When we're able to turn a general gather/scatter into the "base + offsets"
form, we now try to extract out any constant components of the offsets and
then pass them as a separate parameter to the gather/scatter function
implementation.

We then in turn carefully emit code for the addressing calculation so that
these constant offsets match LLVM's patterns to detect this case, such that
we get the constant offsets directly encoded in the instruction's addressing
calculation in many cases, saving arithmetic instructions to do these
calculations.

Improves performance of stencil by ~15%.  Other workloads unchanged.
2012-01-24 14:41:15 -08:00
Matt Pharr
d805e8b183 Add clock() function to standard library.
Also corrected the declaration of num_cores() to return a
uniform value.
2012-01-22 13:05:27 -08:00
Matt Pharr
1bba9d4307 Improve atomic_swap_global() to take advantage of associativity.
We now do a single atomic hardware swap and then effectively do 
swaps between the running program instances such that the result
is the same as if they had happened to run a particular ordering
of hardware swaps themselves.

Also cleaned up __atomic_swap_uniform_* built-in implementations
to not take the mask, which they weren't using anyway.

Finishes Issue #56.
2012-01-20 10:37:33 -08:00
Matt Pharr
234e5cd3e1 Use vector select for masked store blend if building with LLVM3.1 2012-01-04 12:59:03 -08:00
Matt Pharr
f75c94a8f1 Have aos/soa and broadcast/shuffle/rotate functions provided by the target.
The SSE/AVX targets use the old versions from util.m4, but these functions are
now passed through to the generic targets.
2012-01-04 12:59:03 -08:00
Matt Pharr
848a432640 Fix various small things that were broken with single-bit-per-lane masks.
Also small cleanups to declarations, "no captures" added, etc.
2012-01-04 12:59:03 -08:00
Matt Pharr
562d61caff Added masked load optimization pass.
This pass handles the "all on" and "all off" mask cases appropriately.

Also renamed load_masked stuff in built-ins to masked_load for consistency with
masked_store.
2012-01-04 11:51:26 -08:00
Matt Pharr
1d9201fe3d Add "generic" 4, 8, and 16-wide targets.
When used, these targets end up with calls to undefined functions for all
of the various special vector stuff ispc needs to compile ispc programs
(masked store, gather, min/max, sqrt, etc.).

These targets are not yet useful for anything, but are a step toward
having an option to C++ code with calls out to intrinsics.

Reorganized the directory structure a bit and put the LLVM bitcode used
to define target-specific stuff (as well as some generic built-ins stuff)
into a builtins/ directory.

Note that for building on Windows, it's now necessary to set a LLVM_VERSION
environment variable (with values like LLVM_2_9, LLVM_3_0, LLVM_3_1svn, etc.)
2011-12-19 13:46:50 -08:00