Commit Graph

90 Commits

Author SHA1 Message Date
jbrodman
3d35d4485d Change opt help message to be clearer. 2015-03-16 12:11:46 -07:00
jbrodman
a54c0db457 Added missing newline. 2015-03-06 10:36:07 -08:00
jbrodman
9baade2cb5 Change dll export feature to a switch. 2015-02-23 11:43:06 -08:00
Andrey Guskov
2f2af816e7 3.7-related copyright update 2015-01-20 14:56:58 +03:00
Andrey Guskov
ae8b724d92 Added LLVM 3.7 support 2015-01-19 17:30:59 +03:00
Dmitry Babokin
701bd9b029 Removing -debug-ir functionality for 3.6, as it was removed from LLVM. 2014-12-03 18:28:32 +03:00
evghenii
8745888ce9 merged with master 2014-08-11 10:04:54 +02:00
Anton Mitrokhin
60fa76ccc1 reversed macros LLVM_3_6 to LLVM_3_5+ in .cpp and .h files 2014-08-01 15:40:48 +04:00
Anton Mitrokhin
d0c9b7c9b5 wiped out all LLVM 3.1 support 2014-08-01 14:54:08 +04:00
Anton Mitrokhin
725be222ac added LLVM_3_6 var 2014-07-30 11:50:15 +04:00
evghenii
b3c5a9c4d6 added #ifdef ISPC_NVPTX_ENABLED ... #endif guards 2014-07-09 12:32:18 +02:00
Evghenii
84134678dc ISPC can emit LLVM PTX now 2014-01-10 07:53:09 +01:00
evghenii
bb46b561fd Merged with upstream/master 2013-11-22 08:13:16 +01:00
Ilia Filippov
3fd9d5a025 support of LLVM 3.5 2013-11-21 19:09:43 +04:00
egaburov
5d56d29240 merged with master 2013-10-08 19:13:30 +02:00
james.brodman
8db378b265 Revert "Remove support for using SVML for math lib routines."
This reverts commit d9c38b5c1f.
2013-09-04 16:01:58 -04:00
Ilia Filippov
f620cdbaa1 Changes in perf.py functionality, unification of examples, correction of build warnings 2013-08-26 14:04:59 +04:00
Dmitry Babokin
3f2217646e Merge pull request #562 from mmp/arm
New target naming scheme, new targets (SSE4-i8x16 and SSE4-i16x8), plus some cleanup and improvements.
2013-08-22 08:33:25 -07:00
james.brodman
6be3c24ee5 Separate -O and -g 2013-08-15 15:24:46 -04:00
Matt Pharr
0c5742b6f8 Implement new naming scheme for --target.
Now targets are named like "<isa>-i<mask size>x<gang size>", e.g.
"sse4-i8x16", or "avx2-i32x16".

The old target names are still supported.
2013-08-08 19:23:44 -07:00
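For illustration only: the pieces of the new scheme can be read straight out of the target string. The parseTarget() helper below is hypothetical, not part of ispc; it is a minimal sketch assuming targets always follow the "<isa>-i<mask size>x<gang size>" pattern.

    #include <cstdio>
    #include <string>

    // Hypothetical helper: split "sse4-i8x16" into its three components.
    static bool parseTarget(const std::string &t, std::string *isa,
                            int *maskBits, int *gangSize) {
        size_t dash = t.find("-i");
        size_t x = t.find('x', dash);
        if (dash == std::string::npos || x == std::string::npos)
            return false;
        *isa = t.substr(0, dash);                              // "sse4"
        *maskBits = std::stoi(t.substr(dash + 2, x - (dash + 2))); // 8
        *gangSize = std::stoi(t.substr(x + 1));                // 16
        return true;
    }

    int main() {
        std::string isa; int mask, gang;
        if (parseTarget("sse4-i8x16", &isa, &mask, &gang))
            printf("isa=%s mask=%d bits gang=%d\n", isa.c_str(), mask, gang);
        return 0;
    }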
Matt Pharr
cd9afe946c Merge branch 'master' into arm
Conflicts:
	Makefile
	builtins.cpp
	ispc.cpp
	ispc.h
	ispc.vcxproj
	opt.cpp
2013-08-06 17:39:21 -07:00
Matt Pharr
1276ea9844 Revert "Remove support for building with LLVM 3.1"
This reverts commit d3c567503b.

Conflicts:
	opt.cpp
2013-08-06 17:00:35 -07:00
Dmitry Babokin
dff7735af9 Fix for Windows build and making NEON target optional 2013-08-02 19:24:34 -07:00
Ilia Filippov
a174a90f86 Support for dumping, switching off, and debug printing of optimization phases 2013-08-01 11:37:52 +04:00
Matt Pharr
d9c38b5c1f Remove support for using SVML for math lib routines.
This path was poorly maintained and wasn't actually available on most
targets.
2013-07-31 06:56:48 -07:00
Matt Pharr
d3c567503b Remove support for building with LLVM 3.1 2013-07-31 06:46:45 -07:00
egaburov
67b549a937 Added nvptx64 target. Things to do:
1. write builtins/target-nvptx64.ll; for now it is just a copy of target-generic-1.ll
2. add __global__ & __device__ scope
3. make code work for a single CUDA thread
4. use tasks to work as a block grid, with programIndex as laneIdx and programCount as warpSize
5. ... and more...
2013-07-28 14:31:43 +02:00
Matt Pharr
d7b0c5794e Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs.  All but ~10 tests
pass, and all examples compile and run correctly.  Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.

Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined.  Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken.)
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.

- Currently only the "cortex-a9" and "cortex-a15" CPU targets are
  supported; LLVM supports many other ARM CPUs and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure.)

- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
   only when the compiler runs on an ARM host, though).

- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately).  It may just work, but will more likely
  have various small issues.

- Anything related to 64-bit ARM has seen no attention.
2013-07-19 23:07:24 -07:00
Dmitry Babokin
1a7ac8b804 Enable memory alignment management via compiler options 2013-05-24 10:29:01 +04:00
Dmitry Babokin
b6b9daa3c5 Enabling llvm 3.4 2013-05-13 19:25:31 +04:00
Dmitry Babokin
7497e86902 Adding support for aligned memory allocation on Windows 2013-04-26 22:07:30 +02:00
Dmitry Babokin
95950885cf Use posix_memalign to allocate 16-byte aligned memory on Linux/MacOS. 2013-04-26 20:33:24 +04:00
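A minimal sketch of the aligned-allocation approach described in this commit and the Windows one above; alignedAlloc()/alignedFree() are illustrative names, not ispc's actual functions.

    #include <cstddef>
    #ifdef _WIN32
    #include <malloc.h>    // _aligned_malloc / _aligned_free
    #else
    #include <stdlib.h>    // posix_memalign / free
    #endif

    // Allocate 'size' bytes aligned to 'alignment' (a power of two,
    // and a multiple of sizeof(void*) for posix_memalign).
    void *alignedAlloc(size_t size, size_t alignment) {
    #ifdef _WIN32
        return _aligned_malloc(size, alignment);
    #else
        void *ptr = nullptr;
        if (posix_memalign(&ptr, alignment, size) != 0)
            return nullptr;
        return ptr;
    #endif
    }

    void alignedFree(void *ptr) {
    #ifdef _WIN32
        _aligned_free(ptr);   // _aligned_malloc memory needs _aligned_free
    #else
        free(ptr);            // posix_memalign memory is freed with free()
    #endif
    }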
Dmitry Babokin
11528b0def Fix for #474: colon-separated paths in -I 2013-04-17 18:38:57 +04:00
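A minimal sketch of the colon-separated -I handling this fix describes; splitSearchPath() is a hypothetical helper, not ispc's code.

    #include <string>
    #include <vector>

    // Split a colon-separated search-path string such as
    // "/usr/include:/opt/ispc/include" into individual directories.
    std::vector<std::string> splitSearchPath(const std::string &path) {
        std::vector<std::string> dirs;
        size_t start = 0;
        while (start <= path.size()) {
            size_t colon = path.find(':', start);
            if (colon == std::string::npos)
                colon = path.size();
            if (colon > start)  // skip empty components like "a::b"
                dirs.push_back(path.substr(start, colon - start));
            start = colon + 1;
        }
        return dirs;
    }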
Dmitry Babokin
3f8a678c5a Editorial change: fixing trailing white spaces and tabs 2013-03-18 16:17:55 +04:00
Matt Pharr
0bf1320a32 Remove support for building with LLVM 3.0 2013-01-06 12:27:53 -08:00
Peng Tu
16b0806d40 Fix LLVM TOT build issue. 2012-11-21 19:09:10 -08:00
Matt Pharr
be2108260e Add --opt=force-aligned-memory option.
This forces all vector loads/stores to be done assuming that the given
pointer is aligned to the vector size, thus allowing the use of sometimes
more-efficient instructions.  (If the memory is not actually aligned,
the program will fail!).
2012-09-14 13:49:45 -07:00
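To illustrate the trade-off the option makes, here is the aligned-vs-unaligned distinction expressed with SSE intrinsics; a conceptual sketch, not ispc's code generation.

    #include <immintrin.h>

    // Unaligned load: safe for any pointer, but may be slower on some CPUs.
    __m128 loadAny(const float *p) { return _mm_loadu_ps(p); }

    // Aligned load: what --opt=force-aligned-memory lets the compiler assume.
    // If p is not 16-byte aligned, this faults at runtime.
    __m128 loadAligned(const float *p) { return _mm_load_ps(p); }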
Matt Pharr
19d8f2e258 Generate FMA instructions with AVX2 (when possible).
Issue #320.
2012-08-03 10:43:41 -07:00
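For illustration, the fused operation in question, written with the FMA intrinsic; a sketch, not ispc's actual code-generation path.

    #include <immintrin.h>

    // a*b + c in one fused instruction, as emitted for AVX2-class targets;
    // compile with FMA support enabled (e.g. -mfma with GCC/Clang).
    __m256 mulAdd(__m256 a, __m256 b, __m256 c) {
        return _mm256_fmadd_ps(a, b, c);
    }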
Nicolas Trangez
3a007f939a Build: Include unistd.h where required
Some modules require an include of unistd.h (e.g. for getcwd and isatty
definitions).

These changes were required to build successfully on a Fedora 17 system,
using GCC 4.7.0 & glibc-headers 2.15.
2012-07-04 14:49:00 +02:00
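A minimal example of the declarations the include provides (a standalone sketch, not one of the affected modules):

    #include <unistd.h>   // declares getcwd() and isatty() on POSIX systems
    #include <cstdio>

    int main() {
        char buf[1024];
        if (getcwd(buf, sizeof(buf)) != NULL)
            printf("cwd: %s\n", buf);
        printf("stdout is a TTY: %d\n", isatty(fileno(stdout)));
        return 0;
    }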
Ingo Wald
789e04ce90 Add support for host/device stub functions for offload. 2012-06-12 10:23:49 -07:00
Matt Pharr
1397dbdabc Don't generate colorized output escapes when stderr isn't a TTY.
When piping to a file, more/less, etc., this is generally undesirable.

This behavior can be overridden with the --colorized-output command-line
flag.
2012-06-04 09:20:57 -07:00
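A sketch of the TTY check this behavior implies; printError() is a hypothetical helper, and only the --colorized-output flag name comes from the commit.

    #include <cstdio>
    #include <unistd.h>

    // Emit ANSI color escapes only when stderr is an interactive terminal
    // (or when the user forces it, as --colorized-output does in ispc).
    void printError(const char *msg, bool forceColor) {
        bool useColor = forceColor || isatty(fileno(stderr));
        if (useColor)
            fprintf(stderr, "\033[31mError\033[0m: %s\n", msg);
        else
            fprintf(stderr, "Error: %s\n", msg);
    }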
Matt Pharr
ee1fe3aa9f Update build to handle existence of LLVM 3.2 dev branch.
We now compile with LLVM 3.0, 3.1, and 3.2svn.
2012-05-03 08:25:25 -07:00
Matt Pharr
098c4910de Remove support for building with LLVM 2.9.
A forthcoming change uses some features of LLVM 3.0's new type
system, and it's not worth back-porting this to also work
with LLVM 2.9.
2012-04-15 20:08:51 -07:00
Matt Pharr
581472564d Print "friendly" ispc message when abort/seg fault signal is thrown.
Make crashes that happen in LLVM less inscrutable.

Issue #222.
2012-04-05 15:51:44 -07:00
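A minimal sketch of such a handler; the message text and function names are illustrative, not ispc's actual ones. Only async-signal-safe calls (write, _exit) are used inside the handler.

    #include <csignal>
    #include <unistd.h>

    static void crashHandler(int) {
        // Only async-signal-safe calls are legal here, so use write().
        const char msg[] =
            "ispc: internal compiler error (please file a bug report)\n";
        (void)write(STDERR_FILENO, msg, sizeof(msg) - 1);
        _exit(1);
    }

    void installCrashHandlers() {
        signal(SIGSEGV, crashHandler);
        signal(SIGABRT, crashHandler);
    }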
Matt Pharr
b813452d33 Don't issue a slew of warnings if a bogus cpu type is specified.
Issue #221.
2012-04-03 06:13:28 -07:00
Lu Guanqun
da9dba80a0 fix --outfile option error 2012-03-20 09:44:49 +08:00
Matt Pharr
777343331e Print numeric version number with --version. 2012-03-19 14:41:25 -07:00
Matt Pharr
3082ea4765 Require Type::Equal() for all type equality comparisons.
Previously, we uniqued AtomicTypes, so that they could be compared
by pointer equality, but with forthcoming SOA variability changes,
this would become too unwieldy (lacking a more general / ubiquitous
type uniquing implementation.)
2012-03-05 09:58:09 -08:00
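A heavily simplified sketch of the pointer-equality vs. structural-equality distinction; ispc's real Type hierarchy is far richer than this.

    // Sketch only: ispc's actual AtomicType carries much more state.
    struct AtomicType {
        enum BasicType { TYPE_BOOL, TYPE_INT32, TYPE_FLOAT } basicType;
        bool isConst;

        // Structural comparison; safe even when types are not uniqued.
        static bool Equal(const AtomicType *a, const AtomicType *b) {
            return a->basicType == b->basicType && a->isConst == b->isConst;
        }
    };

    bool sameType(const AtomicType *a, const AtomicType *b) {
        // Pointer equality (a == b) only works if every type is uniqued;
        // with SOA variability that uniquing became impractical, so all
        // comparisons go through Equal() instead.
        return AtomicType::Equal(a, b);
    }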
Matt Pharr
73bf552cd6 Add support for coalescing memory accesses from gathers.
There are two related optimizations that happen now.  (These
currently only apply for gathers where the mask is known to be
all on, and to gathers that are accessing 32-bit sized elements,
but both of these may be generalized in the future.)

First, for any single gather, we are now more flexible in mapping it
to individual memory operations.  Previously, we would only either map
it to a general gather (one scalar load per SIMD lane), or an 
unaligned vector load (if the program instances could be determined
to be accessing a sequential set of locations in memory.)

Now, we are able to break gathers into scalar, 2-wide (i.e. 64-bit),
4-wide, or 8-wide loads.  Further, we now generate code that shuffles
these loads around.  Doing fewer, larger loads in this manner, when
possible, can be more efficient.

Second, we can coalesce memory accesses across multiple gathers. If 
we have a series of gathers without any memory writes in the middle,
then we try to analyze their reads collectively and choose an efficient
set of loads for them.  Not only does this help if different gathers
reuse values from the same location in memory, but it's specifically
helpful when data with AOS layout is being accessed; in this case,
we're often able to generate wide vector loads and appropriate shuffles
automatically.
2012-02-10 13:10:39 -08:00
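To illustrate the simplest coalescing win with SSE intrinsics (a conceptual sketch, not ispc's generated code): a gather whose lanes happen to read four consecutive floats collapses from four scalar loads into one vector load.

    #include <immintrin.h>

    // General gather: one scalar load per SIMD lane.
    __m128 gatherScalar(const float *base, const int idx[4]) {
        return _mm_set_ps(base[idx[3]], base[idx[2]],
                          base[idx[1]], base[idx[0]]);
    }

    // Coalesced case: when the indices are known to be i, i+1, i+2, i+3,
    // the four scalar loads collapse into a single unaligned vector load.
    __m128 gatherSequential(const float *base, int i) {
        return _mm_loadu_ps(base + i);
    }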
Matt Pharr
bb8e13e3c9 Add support for -I command-line argument to specify #include search directories. 2012-02-07 08:39:01 -08:00