Commit Graph

96 Commits

Dmitry Babokin
dff7735af9 Fix for Windows build and making NEON target optional 2013-08-02 19:24:34 -07:00
Ilia Filippov
a174a90f86 Supporting dumping, switching off, and debug printing of optimization phases 2013-08-01 11:37:52 +04:00
Matt Pharr
068fd8098c Explicitly set armv7-eabi target triple on ARM.
This lets the compiler generate FMA instructions, which seems
desirable.
2013-07-20 11:19:10 -07:00
Matt Pharr
d7b0c5794e Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs.  All but ~10 tests
pass, and all examples compile and run correctly.  Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.

Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined.  Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken.)
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.

- Currently only the "cortex-a9" and "cortex-a15" CPU targets are
  supported; LLVM supports many other ARM CPUs and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure.)

- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
   only when the compiler runs on an ARM host, though).

- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately).  It may just work, but will more likely
  have various small issues.

- Anything related to 64-bit ARM has seen no attention.
2013-07-19 23:07:24 -07:00
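
On the fp32 divide point above: 32-bit ARM NEON does not actually provide a packed fp32 divide (VDIV is scalar VFP only), so the practical vector alternative is a reciprocal estimate refined with Newton-Raphson steps. A minimal sketch with standard NEON intrinsics follows; this is illustrative only, not code from this commit.

    #include <arm_neon.h>

    /* Approximate a/b per lane: VRECPE gives a rough 1/b estimate, and each
       VRECPS step computes (2 - b*x) for one Newton-Raphson refinement. */
    static inline float32x4_t div_approx_f32(float32x4_t a, float32x4_t b) {
        float32x4_t x = vrecpeq_f32(b);          /* rough 1/b estimate */
        x = vmulq_f32(x, vrecpsq_f32(b, x));     /* first refinement   */
        x = vmulq_f32(x, vrecpsq_f32(b, x));     /* second refinement  */
        return vmulq_f32(a, x);                  /* a * (1/b)          */
    }
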
Dmitry Babokin
2267f278d2 Fix for #503 - avoid omitting frame pointer on Win32 2013-06-04 14:51:36 +04:00
Dmitry Babokin
1a7ac8b804 Enable memory alignment management via compiler options 2013-05-24 10:29:01 +04:00
Dmitry Babokin
b6b9daa3c5 Enabling llvm 3.4 2013-05-13 19:25:31 +04:00
Dmitry Babokin
95950885cf Use posix_memalign to allocate 16-byte aligned memory on Linux/MacOS. 2013-04-26 20:33:24 +04:00
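
For reference, a minimal sketch of the posix_memalign pattern this commit refers to (not the actual ispc allocation code):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        void *p = NULL;
        /* Request 1024 bytes aligned to a 16-byte boundary; the alignment must
           be a power of two and a multiple of sizeof(void *). */
        if (posix_memalign(&p, 16, 1024) != 0) {
            fprintf(stderr, "posix_memalign failed\n");
            return 1;
        }
        printf("allocated at %p\n", p);
        free(p);
        return 0;
    }
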
Dmitry Babokin
a0462fe1ee #469: Fix for multi-target compilation 2013-04-12 14:06:12 +04:00
Dmitry Babokin
0af2a13349 DataLayout is changed to be managed from a single place. v4-128-128 is added to the generic DataLayout 2013-03-23 14:38:51 +04:00
Dmitry Babokin
7f0c92eb4d Fix for #431: memory leak due to multiple TargetMachine creation 2013-03-23 14:33:45 +04:00
Dmitry Babokin
0f86255279 Target class redesign: data moved to private. Also, an empty target-feature attribute is no longer added (generic targets). 2013-03-23 14:28:05 +04:00
Dmitry Babokin
3f8a678c5a Editorial change: fixing trailing white spaces and tabs 2013-03-18 16:17:55 +04:00
Dmitry Babokin
524939dc5b Fix for issue #430 2013-02-27 18:03:07 +04:00
Matt Pharr
0bf1320a32 Remove support for building with LLVM 3.0 2013-01-06 12:27:53 -08:00
Matt Pharr
63dd7d9859 Fix build to work with LLVM top-of-tree again 2013-01-06 12:02:08 -08:00
Matt Pharr
8cbfde6092 Small fixes to build with LLVM top-of-tree (now numbered as version 3.3) 2012-12-02 14:29:24 -08:00
Matt Pharr
172a189c6f Fix build with LLVM top-of-tree 2012-10-17 11:11:50 -07:00
Matt Pharr
be2108260e Add --opt=force-aligned-memory option.
This forces all vector loads/stores to be done assuming that the given
pointer is aligned to the vector size, thus allowing the use of sometimes
more-efficient instructions.  (If it isn't the case that the memory is
aligned, the program will fail!).
2012-09-14 13:49:45 -07:00
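
The trade-off is the same one visible at the intrinsics level: an aligned vector load can be faster but faults if the pointer is not actually vector-aligned. A hedged illustration (not ispc output):

    #include <immintrin.h>

    /* With --opt=force-aligned-memory the compiler may use the aligned form
       everywhere; if p is not 32-byte aligned, the aligned load faults. */
    __m256 load_aligned(const float *p)   { return _mm256_load_ps(p);  }  /* requires 32-byte alignment */
    __m256 load_unaligned(const float *p) { return _mm256_loadu_ps(p); }  /* works for any alignment    */
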
Matt Pharr
19d8f2e258 Generate FMA instructions with AVX2 (when possible).
Issue #320.
2012-08-03 10:43:41 -07:00
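
Haswell-class hardware pairs AVX2 with FMA3, which fuses a multiply and an add into one operation with a single rounding. A minimal intrinsic-level sketch of the operation being generated (not the compiler's actual output):

    #include <immintrin.h>

    /* Computes a*b + c per lane with a single rounding (compile with FMA
       enabled, e.g. -mfma on GCC/Clang). */
    __m256 fmadd8(__m256 a, __m256 b, __m256 c) {
        return _mm256_fmadd_ps(a, b, c);
    }
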
Matt Pharr
0bb4d282e2 Add sys/types.h include for linux/osx. 2012-07-23 08:32:41 -07:00
Matt Pharr
e9fe9f5043 Add cpu strings for Ivy Bridge and HSW.
Default to avx2 ISA for HSW CPUs.
2012-07-23 08:24:18 -07:00
Matt Pharr
51210a869b Support core-avx-i and core-avx2 CPU types.
(And map them to avx1.1 and avx2 targets, respectively.)
2012-07-19 10:15:59 -07:00
Matt Pharr
6a410fc30e Emit gather instructions for the AVX2 targets.
Issue #308.
2012-07-13 12:29:05 -07:00
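
AVX2's gather instructions take a common base pointer, a vector of indices, and a compile-time scale of 1, 2, 4, or 8. A minimal sketch using the corresponding intrinsic (illustrative; not the code ispc emits):

    #include <immintrin.h>

    /* Loads base[idx[0..7]] into one __m256; the last argument is the byte
       scale applied to each index and must be a constant 1, 2, 4, or 8. */
    __m256 gather8(const float *base, __m256i idx) {
        return _mm256_i32gather_ps(base, idx, 4);
    }
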
Matt Pharr
98b2e0e426 Fixes for intrinsics unsupported in earlier LLVM versions.
Specifically, don't use the half/float conversion routines with
LLVM 3.0, and don't try to use RDRAND with anything before LLVM 3.2.
2012-07-13 12:14:10 -07:00
Matt Pharr
371d4be8ef Fix bugs in detection of Ivy Bridge systems.
We were incorrectly characterizing them as basic AVX1 without further
extensions, due to a bug in the logic to check CPU features.
2012-07-12 14:11:15 -07:00
Matt Pharr
216ac4b1a4 Stop factoring out constant offsets for gather/scatter if instr is available.
For KNC (gather/scatter), it's not helpful to factor base+offsets gathers
and scatters into base_ptr + {1/2/4/8} * varying_offsets + const_offsets.
Now, if a HW instruction is available for gather/scatter, we just factor
into base + {1/2/4/8} * offsets (if possible).  Not only is this simpler,
but it's also what we need in order to pass a value along to the
scale-by-2/4/8 that is available directly in those instructions.

Finishes issue #325.
2012-07-11 14:52:29 -07:00
Matt Pharr
10b79fb41b Add support for non-factored variants of gather/scatter functions.
We now have two ways of approaching gather/scatters with a common base
pointer and with offset vectors.  For targets with native gather/scatter,
we just turn those into base + {1/2/4/8}*offsets.  For targets without,
we turn those into base + {1/2/4/8}*varying_offsets + const_offsets,
where const_offsets is a compile-time constant.

Infrastructure for issue #325.
2012-07-11 14:29:42 -07:00
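
In scalar terms, the two factorings described in these commits decompose each lane's address differently. A hedged sketch of the arithmetic (names here are illustrative, not ispc internals):

    #include <stdint.h>

    /* Target with native gather/scatter: address = base + scale * offset,
       where scale is one of 1/2/4/8 and can be folded into the instruction. */
    uintptr_t addr_native(uintptr_t base, intptr_t offset, int scale) {
        return base + (uintptr_t)(scale * offset);
    }

    /* Target without native gather/scatter: the offset is further split into
       a varying part and a compile-time-constant part, which makes it easier
       to recognize consecutive or strided accesses later. */
    uintptr_t addr_factored(uintptr_t base, intptr_t varying_offset,
                            intptr_t const_offset, int scale) {
        return base + (uintptr_t)(scale * varying_offset)
                    + (uintptr_t)const_offset;
    }
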
Matt Pharr
aabbdba068 Switch a few remaining fprintf() calls to use Warning()/Error(). 2012-07-06 12:56:45 -07:00
Matt Pharr
4186ef204d Fix build with LLVM top of tree. 2012-07-05 13:35:01 -07:00
Nicolas Trangez
3a007f939a Build: Include unistd.h where required
Some modules require an include of unistd.h (e.g. for getcwd and isatty
definitions).

These changes were required to build successfully on a Fedora 17 system,
using GCC 4.7.0 & glibc-headers 2.15.
2012-07-04 14:49:00 +02:00
Matt Pharr
f38770bf2a Fix build with LLVM ToT 2012-06-28 07:36:10 -07:00
Matt Pharr
40a295e951 Fix bug where "avx-x2" target would cause AVX1.1 to be used. 2012-06-12 13:37:38 -07:00
Matt Pharr
6c7df4cb6b Add initial support for "avx1.1" targets for Ivy Bridge.
So far, only the use of the float/half conversion instructions distinguishes
this from the "avx1" target.

Partial work on issue #263.
2012-06-08 15:55:00 -07:00
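
The float/half conversion instructions referred to here are the F16C pair. A minimal intrinsic-level sketch (not the builtins ispc uses):

    #include <immintrin.h>

    /* Convert four half-precision values (packed in the low 64 bits of h) to
       float, and back, rounding to nearest; compile with -mf16c. */
    __m128  half4_to_float4(__m128i h) { return _mm_cvtph_ps(h); }
    __m128i float4_to_half4(__m128  f) { return _mm_cvtps_ph(f, _MM_FROUND_TO_NEAREST_INT); }
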
Matt Pharr
1397dbdabc Don't generate colorized output escapes when stderr isn't a TTY.
When piping to a file, more/less, etc., this is generally undesirable.

This behavior can be overridden with the --colorized-output command-line
flag.
2012-06-04 09:20:57 -07:00
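
The TTY check itself is a one-liner; a hedged sketch of the usual pattern (not the exact ispc code):

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* Only emit ANSI color escapes when stderr is an interactive terminal. */
        int use_color = isatty(fileno(stderr));
        fprintf(stderr, use_color ? "\033[31merror\033[0m\n" : "error\n");
        return 0;
    }
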
Matt Pharr
449d956966 Add support for generic-64 target. 2012-05-25 11:57:28 -07:00
Matt Pharr
72c41f104e Fix various malformed program crashes. 2012-05-18 10:44:45 -07:00
Matt Pharr
299ae186f1 Expect support for half and transcendentals from all generic targets 2012-05-18 06:13:45 -07:00
Matt Pharr
fbed0ac56b Remove allOffMaskIsSafe from Target
The intent of this was to indicate whether it was safe to run code
with an 'all of' mask on the given target (and then sometimes be
more flexible about e.g. running both true and false blocks of if
statements, etc.)

The problem is that even if the architecture has full native mask support,
it's still not safe to run 'uniform' memory operations with the mask all
off.  Even more tricky, we sometimes transform masked varying memory operations
to uniform ones during optimization (e.g. gather->load and broadcast).

This fixes a number of the tests/switch-* tests that were failing on the
generic targets due to this issue.
2012-05-09 14:18:47 -07:00
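
A hedged scalar-level illustration of the hazard described above (plain C, not ispc IR): rewriting a masked gather as an unconditional load changes whether memory is touched at all when the mask is all off.

    /* With the mask respected, no memory is read when every lane is off, so a
       bogus pointer is harmless.  After a gather -> load-and-broadcast rewrite
       the load executes unconditionally and can fault in exactly that case. */
    float lane_value_masked(const float *base, int offset, int lane_on) {
        return lane_on ? base[offset] : 0.0f;
    }

    float lane_value_rewritten(const float *base, int offset) {
        return base[offset];   /* executes even when the whole mask is off */
    }
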
Matt Pharr
0c1b206185 Pass log/exp/pow transcendentals through to targets that support them.
Currently, this is the generic targets.
2012-05-03 13:49:56 -07:00
Matt Pharr
d99bd279e8 Add generic-32 target. 2012-05-03 11:11:06 -07:00
Matt Pharr
ee1fe3aa9f Update build to handle existence of LLVM 3.2 dev branch.
We now compile with LLVM 3.0, 3.1, and 3.2svn.
2012-05-03 08:25:25 -07:00
Matt Pharr
d5cc2ad643 Call Verify() methods of various debugging llvm::DI* types after creation. 2012-04-25 08:43:11 -10:00
Matt Pharr
fefa86e0cf Remove LLVM_TYPE_CONST #define / usage.
Now with LLVM 3.0 and beyond, types aren't const.
2012-04-15 20:11:27 -07:00
Matt Pharr
972043c146 Fix serious bug in handling constant-valued initializers.
In InitSymbol(), we try to be smart and emit a memcpy when there
are a number of values to store (e.g. for arrays, structs, etc.)

Unfortunately, this wasn't working as desired for bools (i.e. i1 types),
since the SizeOf() call that tried to figure out how many bytes to
copy would return 0 bytes, due to dividing the number of bits to copy
by 8.

Fixes issue #234.
2012-04-09 14:23:08 -07:00
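
The arithmetic behind the bug is simply truncating integer division; a short sketch of the failure mode and the usual round-up form (illustrative, not the actual patch):

    /* An i1 (bool) is 1 bit wide: 1 / 8 == 0, so a memcpy sized this way
       copies nothing.  Rounding up to whole bytes gives the intended size. */
    unsigned bytes_truncated(unsigned bits)  { return bits / 8; }        /* 1 -> 0 */
    unsigned bytes_rounded_up(unsigned bits) { return (bits + 7) / 8; }  /* 1 -> 1 */
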
Matt Pharr
b813452d33 Don't issue a slew of warnings if a bogus cpu type is specified.
Issue #221.
2012-04-03 06:13:28 -07:00
Matt Pharr
560bf5ca09 Updated logic for selecting target ISA when not specified.
Now, if the user specified a CPU then we base the ISA choice on that--only
if no CPU and no target is specified do we use the CPUID-based check to
pick a vector ISA.

Improvement to fix to #205.
2012-03-30 16:36:12 -07:00
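
The described precedence reduces to a small decision chain; a hedged sketch (the helper names are hypothetical, not ispc functions):

    const char *isa_for_cpu(const char *cpu);   /* hypothetical helper */
    const char *isa_from_cpuid(void);           /* hypothetical helper */

    const char *select_isa(const char *target_flag, const char *cpu_flag) {
        if (target_flag) return target_flag;            /* explicit --target wins        */
        if (cpu_flag)    return isa_for_cpu(cpu_flag);  /* otherwise derive from --cpu   */
        return isa_from_cpuid();                        /* else probe the host via CPUID */
    }
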
Matt Pharr
b3c5043dcc Don't enable llvm's UnsafeFPMath option when --opt=fast-math is supplied.
This was causing functions like round() to fail on SSE2, since it has code
that does:

    x += 0x1.0p23f;
    x -= 0x1.0p23f;

which was in turn being undesirably optimized away.

Fixes issue #211.
2012-03-28 10:26:39 -07:00
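
The quoted two lines are the classic add-then-subtract-2^23 rounding idiom. A hedged standalone version showing what it computes, and why value-unsafe FP optimization (which may fold (x + C) - C back to x) silently breaks it:

    #include <stdio.h>

    /* For small non-negative x, adding 2^23 pushes the value to where float
       has no fraction bits, so the add rounds to the nearest integer;
       subtracting 2^23 recovers that integer.  Folding the pair away, as
       unsafe FP math may do, turns round() into the identity. */
    float round_via_magic(float x) {
        float t = x + 0x1.0p23f;
        return t - 0x1.0p23f;
    }

    int main(void) {
        printf("%f\n", round_via_magic(2.6f));   /* prints 3.000000 */
        return 0;
    }
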
Matt Pharr
3270e2bf5a Call CPUID to more reliably detect level of SSE/AVX that the host supports.
Fixes, I hope, issue #205.
2012-03-28 09:20:06 -07:00
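
A minimal sketch of CPUID-based feature probing with GCC/Clang's <cpuid.h> (illustrative only; real AVX detection must also check OS state via XGETBV, and this is not the ispc code):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void) {
        unsigned eax, ebx, ecx, edx;
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
            return 1;
        printf("SSE2:   %u\n", (edx >> 26) & 1);
        printf("SSE4.2: %u\n", (ecx >> 20) & 1);
        printf("AVX:    %u\n", (ecx >> 28) & 1);   /* CPU support only */
        return 0;
    }
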
Matt Pharr
73bf552cd6 Add support for coalescing memory accesses from gathers.
There are two related optimizations that happen now.  (These
currently only apply for gathers where the mask is known to be
all on, and to gathers that are accessing 32-bit sized elements,
but both of these may be generalized in the future.)

First, for any single gather, we are now more flexible in mapping it
to individual memory operations.  Previously, we would only either map
it to a general gather (one scalar load per SIMD lane), or an 
unaligned vector load (if the program instances could be determined
to be accessing a sequential set of locations in memory.)

Now, we are able to break gathers into scalar, 2-wide (i.e. 64-bit),
4-wide, or 8-wide loads.  Further, we now generate code that shuffles
these loads around.  Doing fewer, larger loads in this manner, when
possible, can be more efficient.

Second, we can coalesce memory accesses across multiple gathers. If 
we have a series of gathers without any memory writes in the middle,
then we try to analyze their reads collectively and choose an efficient
set of loads for them.  Not only does this help if different gathers
reuse values from the same location in memory, but it's specifically
helpful when data with AOS layout is being accessed; in this case,
we're often able to generate wide vector loads and appropriate shuffles
automatically.
2012-02-10 13:10:39 -08:00
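
A hedged illustration of the AOS case described above: rather than twelve scalar loads to gather x/y/z from four 16-byte records, four wide loads plus an in-register transpose produce the same lanes (SSE intrinsics; not the code ispc generates):

    #include <xmmintrin.h>

    struct Point4 { float x, y, z, w; };   /* one 16-byte AOS record */

    /* Four vector loads + a 4x4 register transpose replace per-lane gathers. */
    void gather_xyz4(const struct Point4 *p, __m128 *x, __m128 *y, __m128 *z) {
        __m128 r0 = _mm_loadu_ps(&p[0].x);
        __m128 r1 = _mm_loadu_ps(&p[1].x);
        __m128 r2 = _mm_loadu_ps(&p[2].x);
        __m128 r3 = _mm_loadu_ps(&p[3].x);
        _MM_TRANSPOSE4_PS(r0, r1, r2, r3);  /* rows become x, y, z, w vectors */
        *x = r0; *y = r1; *z = r2;
    }
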