57 Commits

Author SHA1 Message Date
Dmitry Babokin
dff7735af9 Fix for Windows build and making NEON target optional 2013-08-02 19:24:34 -07:00
Matt Pharr
d7b0c5794e Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs.  All but ~10 tests
pass, and all examples compile and run correctly.  Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.

Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined.  Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken.)
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.

- Currently only the "cortex-a9" and "cortex-a15" CPU targets are
  supported; LLVM supports many other ARM CPUs and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure.)

- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
   only when the compiler runs on an ARM host, though).

- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately).  It may just work, but will more likely
  have various small issues.

- Anything related to 64-bit ARM has seen no attention.
2013-07-19 23:07:24 -07:00
Dmitry Babokin
95fcdc36ee Tracking ToT changes, which now require linking the option library. This is Unix only; Windows will be fixed separately. 2013-06-18 22:12:33 +04:00
Dmitry Babokin
4b388edca9 Splitting .ll files to be compiled in two versions - 32 and 64 bit. Unix only 2013-05-24 10:29:00 +04:00
Dmitry Babokin
e084f1c311 Adding missing copyright info in Makefile 2013-04-26 19:11:20 +02:00
Dmitry Babokin
95950885cf Use posix_memalign to allocate 16-byte aligned memory on Linux/MacOS. 2013-04-26 20:33:24 +04:00
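For illustration, a minimal sketch of the kind of allocation this commit describes. The helper name `alloc_aligned_floats` is hypothetical and not taken from the ispc sources; only `posix_memalign` itself is a real POSIX call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign (POSIX), free

// Hypothetical helper (not from the ispc tree): returns `count` floats
// whose address is a multiple of 16 bytes, or nullptr on failure.
// The result must be released with free().
inline float *alloc_aligned_floats(std::size_t count) {
    void *ptr = nullptr;
    // The alignment must be a power of two and a multiple of sizeof(void *).
    if (posix_memalign(&ptr, 16, count * sizeof(float)) != 0)
        return nullptr;
    assert(reinterpret_cast<std::uintptr_t>(ptr) % 16 == 0);
    return static_cast<float *>(ptr);
}
```

Unlike `malloc`, `posix_memalign` reports failure through its return value rather than `errno`, which is why the sketch checks the returned status code.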
Dmitry Babokin
0f631ad49b Add info about compiler used for ispc build to Makefile output 2013-03-18 12:30:06 +04:00
Dmitry Babokin
bee3029764 Adding debug and clang targets, changing asan target 2013-02-21 17:26:21 +04:00
Dmitry Babokin
150d6d1f56 Adding Address Sanitizer build 2013-02-15 06:50:26 -08:00
Dmitry Babokin
8d8d9c63fe Fix for #349: build issue when no git found 2013-02-11 11:01:46 -08:00
Dmitry Babokin
52147ce631 Fixing issue #428: need to specify LLVM libs explicitly 2013-02-11 04:15:50 -08:00
james.brodman
3aaf2ef2d4 ToT Fixes / M4 macro fix 2013-01-14 14:55:10 -05:00
Peng Tu
16b0806d40 Fix LLVM TOT build issue. 2012-11-21 19:09:10 -08:00
Ingo Wald
d492af7bc0 64-bit gather/scatter, aligned load/store, i8 support 2012-09-17 03:39:02 +02:00
Matt Pharr
1a4434d314 Fix build with LLVM top-of-tree 2012-08-11 09:28:48 -07:00
Matt Pharr
38bcecd2f3 Print a useful error if llvm-config isn't found when building.
Previously, there was a ton of unintelligible error spew.

Issue #273.
2012-07-06 13:18:11 -07:00
Matt Pharr
6c7df4cb6b Add initial support for "avx1.1" targets for Ivy Bridge.
So far, only the use of the float/half conversion instructions distinguishes
this from the "avx1" target.

Partial work on issue #263.
2012-06-08 15:55:00 -07:00
Matt Pharr
449d956966 Add support for generic-64 target. 2012-05-25 11:57:28 -07:00
Matt Pharr
4f053e5b83 Pass OPT flags when linking 2012-05-08 13:25:09 -07:00
Matt Pharr
c756c855ea Compile with -O2 by default on Linux/OSX. 2012-05-04 13:55:37 -07:00
Matt Pharr
d99bd279e8 Add generic-32 target. 2012-05-03 11:11:06 -07:00
Matt Pharr
ee1fe3aa9f Update build to handle existence of LLVM 3.2 dev branch.
We now compile with LLVM 3.0, 3.1, and 3.2svn.
2012-05-03 08:25:25 -07:00
Nipunn Koorapati
d0c7b5d35c Merge remote-tracking branch 'upstream/master' 2012-04-06 17:58:21 -04:00
Nipunn Koorapati
802add1f97 Added to the Makefile the ability to point to a
custom installation of llvm and clang.
2012-04-06 17:54:55 -04:00
Matt Pharr
1dac05960a Fix build with LLVM 3.1 ToT 2012-04-05 08:17:56 -07:00
Matt Pharr
a69b7a5a01 Fix build with LLVM 3.1 TOT 2012-03-10 13:06:53 -08:00
Gabe Weisz
c67a286aa6 Add support for 1-wide scalar target.
Issue #40.
2012-01-29 06:36:07 -08:00
Matt Pharr
58a0b4a20d Add separate set of builtins for AVX2.
(i.e., stop just reusing the ones for AVX1).

For now the only difference is that the int/uint min/max
functions call the new intrinsic for that.  Once gather is
available from LLVM, that will go here as well.
2012-01-13 14:40:01 -08:00
Matt Pharr
b60f8b4f70 Fix merge conflicts 2012-01-11 17:13:51 -08:00
Jean-Luc Duprat
0519eea951 Makefile does not hardcode link paths on Linux
Link statically for both x86 and x86-64
2012-01-10 10:34:57 -08:00
Matt Pharr
2be1251c70 Fix Makefile on OSX (uname -o not supported) 2012-01-09 07:40:47 -08:00
Pierre-Antoine Lacaze
b683aa11b1 Fix linking under mingw, libdl is Linux only. 2012-01-09 10:52:46 +01:00
Pierre-Antoine Lacaze
2654bb0112 Handle python installations in non-standard locations. 2012-01-09 10:29:54 +01:00
Matt Pharr
8938e14442 Add support for emitting ~generic vectorized C++ code.
The compiler now supports an --emit-c++ option, which generates generic
vector C++ code.  To actually compile this code, the user must provide
C++ code that implements a variety of types and operations (e.g. adding
two floating-point vector values together, comparing them, etc).

There are two examples of this required code in examples/intrinsics:
generic-16.h is a "generic" 16-wide implementation that does all required
operations with scalar math; it's useful for demonstrating the requirements of the
implementation.  Then, sse4.h shows a simple implementation of a SSE4
target that maps the emitted function calls to SSE intrinsics.

When using these example implementations with the ispc test suite,
all but one or two tests pass with gcc and clang on Linux and OSX.
There are currently ~10 failures with icc on Linux, and ~50 failures with
MSVC 2010.  (To be fixed in coming days.)

Performance varies: when running the examples through the sse4.h
target, some have the same performance as when compiled with --target=sse4
from ispc directly (options), while noise is 12% slower, rt is 26%
slower, and aobench is 2.2x slower.  The details of this haven't yet been
carefully investigated, but will be in coming days as well.

Issue #92.
2012-01-04 12:59:03 -08:00
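To make the idea of a scalar-backed "generic" implementation concrete, here is a rough sketch of what user-supplied types and operations of this kind could look like. All names (`vec16_f`, `vec16_mask`, `splat`, `add`, `less_than`, `lane`) are hypothetical and are not the identifiers that generic-16.h or the --emit-c++ output actually use.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 16-wide float vector backed by a plain array
// (one element per program instance).
struct vec16_f {
    float v[16];
};

// Hypothetical 16-wide mask: one flag per program instance.
struct vec16_mask {
    bool v[16];
};

// Broadcast a scalar to all 16 lanes.
inline vec16_f splat(float x) {
    vec16_f r;
    for (std::size_t i = 0; i < 16; ++i)
        r.v[i] = x;
    return r;
}

// Lane-wise addition, performed one scalar at a time.
inline vec16_f add(const vec16_f &a, const vec16_f &b) {
    vec16_f r;
    for (std::size_t i = 0; i < 16; ++i)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Lane-wise comparison that produces a mask.
inline vec16_mask less_than(const vec16_f &a, const vec16_f &b) {
    vec16_mask m;
    for (std::size_t i = 0; i < 16; ++i)
        m.v[i] = a.v[i] < b.v[i];
    return m;
}

// Bounds-checked read of a single lane.
inline float lane(const vec16_f &a, std::size_t i) {
    assert(i < 16);
    return a.v[i];
}
```

An SSE4-style implementation would presumably keep the same function signatures but replace the loops with intrinsics over several native registers.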
Matt Pharr
1d9201fe3d Add "generic" 4, 8, and 16-wide targets.
When used, these targets end up with calls to undefined functions for all
of the various special vector stuff ispc needs to compile ispc programs
(masked store, gather, min/max, sqrt, etc.).

These targets are not yet useful for anything, but are a step toward
having an option to emit C++ code with calls out to intrinsics.

Reorganized the directory structure a bit and put the LLVM bitcode used
to define target-specific stuff (as well as some generic built-ins stuff)
into a builtins/ directory.

Note that for building on Windows, it's now necessary to set a LLVM_VERSION
environment variable (with values like LLVM_2_9, LLVM_3_0, LLVM_3_1svn, etc.)
2011-12-19 13:46:50 -08:00
Matt Pharr
04df63d955 Update run_tests.py to work on Windows. Removed JIT-based testing path entirely. 2011-12-06 13:46:20 -08:00
Matt Pharr
e2b6ed3db8 Fix build for LLVM 2.9 and 3.1svn 2011-12-06 08:08:41 -08:00
Matt Pharr
455d963962 Don't ignore return value from getcwd() 2011-12-05 09:26:33 -08:00
Matt Pharr
286c23426e Add "double-wide" sse2-x2 target.
i.e. run 8 program instances together, along the lines of the double-pumped
sse4-x2 target.
2011-10-11 15:17:31 -07:00
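As a rough illustration of the "double-wide" idea, the sketch below represents 8 program instances as two 4-wide halves and applies each operation to both halves. This is an assumption about the general technique, not a description of how ispc implements the target; the types `vec4_f` and `vec8_f` and the functions `mul4`, `mul8`, and `lane8` are hypothetical, and a plain array stands in for a native 4-wide register.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for one native 4-wide register (hypothetical type).
struct vec4_f {
    float v[4];
};

// "Double-wide" value: 8 program instances stored as two 4-wide halves.
struct vec8_f {
    vec4_f lo, hi;
};

// 4-wide multiply; a real target would map this to a single instruction.
inline vec4_f mul4(const vec4_f &a, const vec4_f &b) {
    vec4_f r;
    for (std::size_t i = 0; i < 4; ++i)
        r.v[i] = a.v[i] * b.v[i];
    return r;
}

// Each 8-wide operation issues the 4-wide operation twice.
inline vec8_f mul8(const vec8_f &a, const vec8_f &b) {
    return vec8_f{mul4(a.lo, b.lo), mul4(a.hi, b.hi)};
}

// Bounds-checked read of program instance i (0..7).
inline float lane8(const vec8_f &a, std::size_t i) {
    assert(i < 8);
    return i < 4 ? a.lo.v[i] : a.hi.v[i - 4];
}
```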
Matt Pharr
f9c67ff806 Explicit representation of ASTs for all the functions in a compile unit.
Added AST and Function classes.
Now, we parse the whole file and build up the AST for all of the
  functions in the Module before we emit IR for the functions (vs. before,
  when we generated IR along the way as we parsed the source file.)
2011-10-06 15:35:27 -07:00
Matt Pharr
06975bc7ab Add support for compiling to multiple targets.
If a flag along the lines of "--target=sse4,avx-x2" is provided on the command-line,
then the program will be compiled for each of the given targets, with a separate
output file generated for each one.  Further, an output file with dispatch functions
that check the current system's CPU and then choose the best available variant
is also created.

Issue #11.
2011-10-04 16:01:55 -07:00
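The dispatch idea described in this commit can be sketched roughly as follows for "--target=sse4,avx-x2". Everything here is hypothetical: the `Isa` enum, the variant functions, and `select_kernel` are illustrative names, and real CPU detection (for example via cpuid) is left out and replaced by a parameter. The code that ispc actually generates may look quite different.

```cpp
#include <cassert>

// Hypothetical ordering of instruction-set levels, lowest to highest.
enum class Isa { SSE2 = 0, SSE4 = 1, AVX = 2 };

// Stand-ins for the per-target variants of one exported function.
// Each returns the name of the target it was compiled for.
inline const char *kernel_sse4() { return "sse4"; }
inline const char *kernel_avx_x2() { return "avx-x2"; }

using kernel_fn = const char *(*)();

// Pick the best compiled variant that the host supports.
// `host` is passed in here; a real dispatcher would query the CPU.
inline kernel_fn select_kernel(Isa host) {
    if (host >= Isa::AVX)
        return kernel_avx_x2;
    if (host >= Isa::SSE4)
        return kernel_sse4;
    return nullptr;  // no compiled variant can run on this CPU
}

// Dispatch function: the single entry point the application calls.
inline const char *kernel(Isa host) {
    kernel_fn fn = select_kernel(host);
    assert(fn != nullptr && "no compiled variant supports this CPU");
    return fn();
}
```

Keeping the selection in one small function means the application links against a single symbol, while the per-target object files stay independent of each other.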
Matt Pharr
85063f493c Revert attempt to be clever about which LLVM libraries to link in--just
link all of them.  (This was causing build problems for some folks.)
2011-09-01 05:02:44 -07:00
Matt Pharr
b67498766e Big rewrite / improvement of target handling.
If no CPU is specified, use the host CPU type, not just a default of "nehalem".
Provide better features strings to the LLVM target machinery.
 -> Thus ensuring that LLVM doesn't generate SSE>2 instructions for the SSE2
    target (Fixes issue #82).
 -> Slight code improvements from using cmovs in generated code now
Use the LLVM popcnt intrinsic for the SSE2 target (it generates code
  that doesn't use the popcnt instruction, now that we properly tell LLVM
  which instructions are and aren't available for SSE2.)
2011-08-26 09:54:45 -07:00
Matt Pharr
7756265503 Add double-pumped AVX target (i.e., run 16-wide). Not yet tested. 2011-08-20 11:28:22 +01:00
Matt Pharr
04c93043d6 Target handling fixes.
Set the Module's target appropriately when it's first created.
Compile separate 32 and 64 bit versions of the builtins-c bitcode
  and load the appropriate one based on the target we're compiling
  for.
2011-08-15 16:03:50 +01:00
Matt Pharr
0ac4f7b620 Add various prefetch functions to the standard library. 2011-08-03 13:31:45 -07:00
Matt Pharr
a552927a6a Cleanup implementation of target builtins code.
- Renamed stdlib-sse.ll to builtins-sse.ll (etc.) in an attempt to better indicate
the fact that the stuff in those files has a role beyond implementing stuff for
the standard library.
- Moved declarations of the various __pseudo_* functions from being done with LLVM
API calls in builtins.cpp to just straight up declarations in LLVM assembly
language in builtins.m4.  (Much less code to do it this way, and more clear what's
going on.)
2011-08-01 05:58:43 +01:00
Matt Pharr
bba7211654 Add support for int8/int16 types. Addresses issues #9 and #42. 2011-07-21 06:57:40 +01:00
Matt Pharr
5bcc611409 Implement global atomics and a memory barrier in the standard library.
This checkin provides the standard set of atomic operations and a memory barrier in the ispc standard library.  Both signed and unsigned 32- and 64-bit integer types are supported.
2011-07-04 17:20:42 +01:00
Matt Pharr
c14c3ceba6 Provide both signed and unsigned int variants of bitcode-based builtins.
When creating function Symbols for functions that were defined in LLVM bitcode for the standard library, if any of the function parameters are integer types, create two ispc-side Symbols: one where the integer types are all signed and the other where they are all unsigned.  This allows us to provide, for example, both store_to_int16(reference int a[], uniform int offset, int val) as well as store_to_int16(reference unsigned int a[], uniform int offset, unsigned int val).

Added some additional tests to exercise the new variants of these.

Also fixed some cases where the __{load,store}_int{8,16} builtins would read from/write to memory even if the mask was all off (which could cause crashes in some cases.)
2011-07-04 12:10:26 +01:00
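The mask fix mentioned in this commit can be illustrated with a small scalar sketch: a masked store should touch memory only for lanes whose mask bit is on, so an all-off mask performs no memory access at all. The function `masked_store_int16`, the 4-wide width, and the `offset + lane` addressing are assumptions made for illustration; they are not the actual `__store_int16` builtin.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kWidth = 4;  // hypothetical number of program instances

// Store the low 16 bits of each active lane to dst[offset + lane].
// Inactive lanes are skipped entirely, so with an all-off mask the
// function never dereferences dst (which may then even be null).
inline void masked_store_int16(std::int16_t *dst, int offset,
                               const std::int32_t (&val)[kWidth],
                               const bool (&mask)[kWidth]) {
    for (int lane = 0; lane < kWidth; ++lane) {
        if (!mask[lane])
            continue;  // lane is off: no read, no write
        assert(dst != nullptr);
        dst[offset + lane] = static_cast<std::int16_t>(val[lane]);
    }
}
```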