Commit Graph

79 Commits

Author SHA1 Message Date
Evghenii
84134678dc ISPC can emit LLVM PTX now 2014-01-10 07:53:09 +01:00
evghenii
bb46b561fd Merged with upstream/master 2013-11-22 08:13:16 +01:00
Ilia Filippov
3fd9d5a025 support of LLVM 3.5 2013-11-21 19:09:43 +04:00
egaburov
5d56d29240 merged with master 2013-10-08 19:13:30 +02:00
james.brodman
8db378b265 Revert "Remove support for using SVML for math lib routines."
This reverts commit d9c38b5c1f.
2013-09-04 16:01:58 -04:00
Ilia Filippov
f620cdbaa1 Changes in perf.py functionality, unification of examples, correction build warnings 2013-08-26 14:04:59 +04:00
Dmitry Babokin
3f2217646e Merge pull request #562 from mmp/arm
New target naming scheme, new targets (SSE4-i8x16 and SSE4-i16x8), plus some cleanup and improvements.
2013-08-22 08:33:25 -07:00
james.brodman
6be3c24ee5 Separate -O and -g 2013-08-15 15:24:46 -04:00
Matt Pharr
0c5742b6f8 Implement new naming scheme for --target.
Now targets are named like "<isa>-i<mask size>x<gang size>", e.g.
"sse4-i8x16", or "avx2-i32x16".

The old target names are still supported.
2013-08-08 19:23:44 -07:00
Matt Pharr
cd9afe946c Merge branch 'master' into arm
Conflicts:
	Makefile
	builtins.cpp
	ispc.cpp
	ispc.h
	ispc.vcxproj
	opt.cpp
2013-08-06 17:39:21 -07:00
Matt Pharr
1276ea9844 Revert "Remove support for building with LLVM 3.1"
This reverts commit d3c567503b.

Conflicts:
	opt.cpp
2013-08-06 17:00:35 -07:00
Dmitry Babokin
dff7735af9 Fix for Windows build and making NEON target optional 2013-08-02 19:24:34 -07:00
Ilia Filippov
a174a90f86 Supporting dumping, switching off and debug printing of optimization phases 2013-08-01 11:37:52 +04:00
Matt Pharr
d9c38b5c1f Remove support for using SVML for math lib routines.
This path was poorly maintained and wasn't actually available on most
targets.
2013-07-31 06:56:48 -07:00
Matt Pharr
d3c567503b Remove support for building with LLVM 3.1 2013-07-31 06:46:45 -07:00
egaburov
67b549a937 Added nvptx64 target. Things to do:
1. builtins/target-nvptx64.ll to write, now it is just a copy of target-generic-1.ll
2. add __global__ & __device__ scope
2. make code work for a single cuda thread
3. use tasks to work as a block grid and programIndex as laneIdx, programCount as warpSize
4. ... and more...
2013-07-28 14:31:43 +02:00
Matt Pharr
d7b0c5794e Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs.  All but ~10 tests
pass, and all examples compile and run correctly.  Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.

Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined.  Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken.)
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.

- Currently only the "cortex-a9" and "cortex-15" CPU targets are
  supported; LLVM supports many other ARM CPUs and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure.)

- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
   only when the compiler runs on an ARM host, though).

- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately).  It may just work, but will more likely
  have various small issues.)

- Anything related to 64-bit ARM has seen no attention.
2013-07-19 23:07:24 -07:00
Dmitry Babokin
1a7ac8b804 Enable memory alignment management via compiler options 2013-05-24 10:29:01 +04:00
Dmitry Babokin
b6b9daa3c5 Enabling llvm 3.4 2013-05-13 19:25:31 +04:00
Dmitry Babokin
7497e86902 Adding Windows support for aligned memory allocation on Windows 2013-04-26 22:07:30 +02:00
Dmitry Babokin
95950885cf Use posix_memalign to allocate 16 byte alligned memeory on Linux/MacOS. 2013-04-26 20:33:24 +04:00
Dmitry Babokin
11528b0def Fix for #474: colon separated path in -I 2013-04-17 18:38:57 +04:00
Dmitry Babokin
3f8a678c5a Editorial change: fixing trailing white spaces and tabs 2013-03-18 16:17:55 +04:00
Matt Pharr
0bf1320a32 Remove support for building with LLVM 3.0 2013-01-06 12:27:53 -08:00
Peng Tu
16b0806d40 Fix LLVM TOT build issue. 2012-11-21 19:09:10 -08:00
Matt Pharr
be2108260e Add --opt=force-aligned-memory option.
This forces all vector loads/stores to be done assuming that the given
pointer is aligned to the vector size, thus allowing the use of sometimes
more-efficient instructions.  (If it isn't the case that the memory is
aligned, the program will fail!).
2012-09-14 13:49:45 -07:00
Matt Pharr
19d8f2e258 Generate FMA instructions with AVX2 (when possible).
Issue #320.
2012-08-03 10:43:41 -07:00
Nicolas Trangez
3a007f939a Build: Include unistd.h where required
Some modules require an include of unistd.h (e.g. for getcwd and isatty
definitions).

These changes were required to build successfully on a Fedora 17 system,
using GCC 4.7.0 & glibc-headers 2.15.
2012-07-04 14:49:00 +02:00
Ingo Wald
789e04ce90 Add support for host/device stub functions for offload. 2012-06-12 10:23:49 -07:00
Matt Pharr
1397dbdabc Don't generate colorized output escapes when stderr isn't a TTY.
When piping to a pile, more/less, etc, this is generally undesirable.

This behavior can be overridden with the --colorized-output command-line
flag.
2012-06-04 09:20:57 -07:00
Matt Pharr
ee1fe3aa9f Update build to handle existence of LLVM 3.2 dev branch.
We now compile with LLVM 3.0, 3.1, and 3.2svn.
2012-05-03 08:25:25 -07:00
Matt Pharr
098c4910de Remove support for building with LLVM 2.9.
A forthcoming change uses some features of LLVM 3.0's new type
system, and it's not worth back-porting this to also all work
with LLVM 2.9.
2012-04-15 20:08:51 -07:00
Matt Pharr
581472564d Print "friendly" ispc message when abort/seg fault signal is thrown.
Make crashes that happen in LLVM less inscrutable.

Issue #222.
2012-04-05 15:51:44 -07:00
Matt Pharr
b813452d33 Don't issue a slew of warnings if a bogus cpu type is specified.
Issue #221.
2012-04-03 06:13:28 -07:00
Lu Guanqun
da9dba80a0 fix --outfile option eror 2012-03-20 09:44:49 +08:00
Matt Pharr
777343331e Print numeric version number with --verison. 2012-03-19 14:41:25 -07:00
Matt Pharr
3082ea4765 Require Type::Equal() for all type equality comparisons.
Previously, we uniqued AtomicTypes, so that they could be compared
by pointer equality, but with forthcoming SOA variability changes,
this would become too unwieldy (lacking a more general / ubiquitous
type uniquing implementation.)
2012-03-05 09:58:09 -08:00
Matt Pharr
73bf552cd6 Add support for coalescing memory accesses from gathers.
There are two related optimizations that happen now.  (These
currently only apply for gathers where the mask is known to be
all on, and to gathers that are accessing 32-bit sized elements,
but both of these may be generalized in the future.)

First, for any single gather, we are now more flexible in mapping it
to individual memory operations.  Previously, we would only either map
it to a general gather (one scalar load per SIMD lane), or an 
unaligned vector load (if the program instances could be determined
to be accessing a sequential set of locations in memory.)

Now, we are able to break gathers into scalar, 2-wide (i.e. 64-bit),
4-wide, or 8-wide loads.  Further, we now generate code that shuffles
these loads around.  Doing fewer, larger loads in this manner, when
possible, can be more efficient.

Second, we can coalesce memory accesses across multiple gathers. If 
we have a series of gathers without any memory writes in the middle,
then we try to analyze their reads collectively and choose an efficient
set of loads for them.  Not only does this help if different gathers
reuse values from the same location in memory, but it's specifically
helpful when data with AOS layout is being accessed; in this case,
we're often able to generate wide vector loads and appropriate shuffles
automatically.
2012-02-10 13:10:39 -08:00
Matt Pharr
bb8e13e3c9 Add support for -I command-line argument to specify #include search directories. 2012-02-07 08:39:01 -08:00
Matt Pharr
6b3e14b0a4 Add command-line option to enable debugging output from parser. 2012-02-06 15:35:43 -08:00
Matt Pharr
a2b5ce0172 Add --help-dev option, only print developer options when it is used. 2012-02-06 15:35:43 -08:00
Matt Pharr
3efbc71a01 Add fuzz testing of input programs.
When the --fuzz-test command-line option is given, the input program
will be randomly perturbed by the lexer in an effort to trigger
assertions or crashes in the compiler (neither of which should ever
happen, even for malformed programs.)
2012-02-06 15:34:47 -08:00
Matt Pharr
724a843bbd Add --quiet option to supress all diagnostic output 2012-02-06 12:39:09 -08:00
Matt Pharr
642150095d Include LLVM version used to build in version info printed out. 2012-01-17 23:42:22 -07:00
Pierre-Antoine Lacaze
d8728104b4 Handle the case whereby BUILD_DATE is already defined. 2012-01-09 10:29:16 +01:00
Matt Pharr
15cc812e37 Add notion of "unbound" variability to the type system.
Now, when a type is declared without an explicit "uniform" or "varying"
qualifier, its variability is unbound; depending on the context of the
declaration, the variability is later finalized.

Currently, in almost all cases, types with unbound variability are
resolved to varying types; the one exception is typecasts like:
"(int)1"; in this case, the fact that (int) has unbound variability
carries through to the TypeCastExpr, which in turn notices that the
expression being type cast has uniform type and in turn will resolve
(int) to (uniform int).

Fixes issue #127.
2012-01-06 11:52:58 -08:00
Matt Pharr
8938e14442 Add support for emitting ~generic vectorized C++ code.
The compiler now supports an --emit-c++ option, which generates generic
vector C++ code.  To actually compile this code, the user must provide
C++ code that implements a variety of types and operations (e.g. adding
two floating-point vector values together, comparing them, etc).

There are two examples of this required code in examples/intrinsics:
generic-16.h is a "generic" 16-wide implementation that does all required
with scalar math; it's useful for demonstrating the requirements of the
implementation.  Then, sse4.h shows a simple implementation of a SSE4
target that maps the emitted function calls to SSE intrinsics.

When using these example implementations with the ispc test suite,
all but one or two tests pass with gcc and clang on Linux and OSX.
There are currently ~10 failures with icc on Linux, and ~50 failures with
MSVC 2010.  (To be fixed in coming days.)

Performance varies: when running the examples through the sse4.h
target, some have the same performance as when compiled with --target=sse4
from ispc directly (options), while noise is 12% slower, rt is 26%
slower, and aobench is 2.2x slower.  The details of this haven't yet been
carefully investigated, but will be in coming days as well.

Issue #92.
2012-01-04 12:59:03 -08:00
Matt Pharr
e82a720223 Fix various warnings / build issues on Windows 2011-12-15 12:06:38 -08:00
Matt Pharr
8d1b77b235 Have assertion macro and FATAL() text ask user to file a bug, provide URL to do so.
Switch to Assert() from assert() to make it clear it's not the C stdlib one we're
using any more.
2011-12-15 11:11:16 -08:00
Matt Pharr
46bfef3fce Add option to turn off codegen improvements when mask 'all on' is statically known. 2011-12-11 16:16:36 -08:00