Commit Graph

88 Commits

Author SHA1 Message Date
Evghenii
546f9cb409 MAJOR CHANGE--- STOP WITH THIS BRANCH-- 2014-01-06 13:51:02 +01:00
Evghenii
15299571f9 use trunk and compile nvvm,llvm,cu files 2014-01-06 08:35:07 +01:00
Evghenii
b2368e243c +1 2014-01-06 08:15:50 +01:00
Evghenii
0d944ac87e use trunk llvm, use openmp in tasksys 2014-01-05 19:04:40 +01:00
evghenii
aa90a48872 +1 2014-01-01 13:41:52 +01:00
evghenii
1672ec80d0 +! 2013-12-25 20:12:36 +01:00
evghenii
bb46b561fd Merged with upstream/master 2013-11-22 08:13:16 +01:00
Ilia Filippov
3fd9d5a025 support of LLVM 3.5 2013-11-21 19:09:43 +04:00
Evghenii
6f200d310f fixed to work with LLVM 3.2 2013-11-21 11:03:03 +01:00
Evghenii
1445202e0e identified bug due to llvm-3.4 2013-11-14 21:18:25 +01:00
Evghenii
e9bc2b7b54 added uniform_new/uniform_delete in util_ptx.m4 and __shfl intrinsics 2013-11-11 09:18:15 +01:00
Evghenii
b50d3944ea allow easy switch between llvm 2013-10-29 10:22:07 +01:00
Evghenii
ac095dbf3e working on nvptx 2013-10-26 16:12:33 +02:00
egaburov
7e9b4c0924 added avx2-i64x4 and avx1.1-i64x4 targets 2013-10-15 10:02:10 +02:00
egaburov
8808a8cc9c Merge remote-tracking branch 'upstream/master' into nvptx 2013-10-13 13:03:00 +02:00
Dmitry Babokin
17b54cb0c8 Fix problem with building ISPC by clang 3.4 2013-10-11 16:29:17 +04:00
Dmitry Babokin
8297edd251 Switching default compiler on Unix from g++ to clang++ 2013-10-11 16:29:16 +04:00
egaburov
5d56d29240 merged with master 2013-10-08 19:13:30 +02:00
Evghenii
9861375f0c renamed avx-i64x4 -> avx1-i64x4 2013-09-13 15:07:14 +02:00
egaburov
7364e06387 added mask64 2013-09-12 12:02:42 +02:00
egaburov
320c41ffcf added svml support. experimental. for some reason all sybmols are visible.. 2013-09-11 15:16:50 +02:00
egaburov
9c79d4d182 addded avxh with vectorWidth=4 support, use --target=avxh to enable it 2013-09-11 12:58:02 +02:00
Ilia Filippov
320b1700ff correction of adding -Werror option 2013-08-30 16:01:01 +04:00
Ilia Filippov
f620cdbaa1 Changes in perf.py functionality, unification of examples, correction build warnings 2013-08-26 14:04:59 +04:00
Matt Pharr
ea8591a85a Fix build with LLVM top-of-tree (link libcurses) 2013-08-10 11:22:43 -07:00
Matt Pharr
cd9afe946c Merge branch 'master' into arm
Conflicts:
	Makefile
	builtins.cpp
	ispc.cpp
	ispc.h
	ispc.vcxproj
	opt.cpp
2013-08-06 17:39:21 -07:00
Dmitry Babokin
dff7735af9 Fix for Windows build and making NEON target optional 2013-08-02 19:24:34 -07:00
Matt Pharr
ab3b633733 Add 8-bit and 16-bit specialized NEON targets.
Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask
elements, respectively, and thus should generate the best code when used
for computation with datatypes of those sizes.
2013-07-30 08:44:16 -07:00
egaburov
67b549a937 Added nvptx64 target. Things to do:
1. builtins/target-nvptx64.ll to write, now it is just a copy of target-generic-1.ll
2. add __global__ & __device__ scope
2. make code work for a single cuda thread
3. use tasks to work as a block grid and programIndex as laneIdx, programCount as warpSize
4. ... and more...
2013-07-28 14:31:43 +02:00
Matt Pharr
780b0dfe47 Add SSE4-16 target.
Along the lines of sse4-8, this is an 8-wide target for SSE4, using
16-bit elements for the mask.  It's thus (in principle) the best
target for SIMD computation with 16-bit datatypes.
2013-07-25 09:46:01 -07:00
Matt Pharr
53414f12e6 Add SSE4 target optimized for computation with 8-bit datatypes.
This change adds a new 'sse4-8' target, where programCount is 16 and
the mask element size is 8-bits.  (i.e. the most appropriate sizing of
the mask for SIMD computation with 8-bit datatypes.)
2013-07-23 17:30:32 -07:00
Matt Pharr
e7abf3f2ea Add support for mask vectors of 8 and 16-bit element types.
There were a number of places throughout the system that assumed that the
execution mask would only have either 32-bit or 1-bit elements.  This
commit makes it possible to have a target with an 8- or 16-bit mask.
2013-07-23 16:50:11 -07:00
Matt Pharr
d7b0c5794e Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs.  All but ~10 tests
pass, and all examples compile and run correctly.  Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.

Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined.  Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken.)
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.

- Currently only the "cortex-a9" and "cortex-15" CPU targets are
  supported; LLVM supports many other ARM CPUs and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure.)

- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
   only when the compiler runs on an ARM host, though).

- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately).  It may just work, but will more likely
  have various small issues.)

- Anything related to 64-bit ARM has seen no attention.
2013-07-19 23:07:24 -07:00
Dmitry Babokin
95fcdc36ee Tracking ToT changes, which now require to link option library. This is Unix only. Windows will be fixed separately 2013-06-18 22:12:33 +04:00
Dmitry Babokin
4b388edca9 Splitting .ll files to be compiled in two versions - 32 and 64 bit. Unix only 2013-05-24 10:29:00 +04:00
Dmitry Babokin
e084f1c311 Adding missing copyright info in Makefile 2013-04-26 19:11:20 +02:00
Dmitry Babokin
95950885cf Use posix_memalign to allocate 16 byte alligned memeory on Linux/MacOS. 2013-04-26 20:33:24 +04:00
Dmitry Babokin
0f631ad49b Add info about compiler used for ispc build to Makefle output 2013-03-18 12:30:06 +04:00
Dmitry Babokin
bee3029764 Adding debug and clang targets, changing asan target 2013-02-21 17:26:21 +04:00
Dmitry Babokin
150d6d1f56 Adding Address Sanitizer build 2013-02-15 06:50:26 -08:00
Dmitry Babokin
8d8d9c63fe Fix for #349: build issue when no git found 2013-02-11 11:01:46 -08:00
Dmitry Babokin
52147ce631 Fixing issue #428: need to specify LLVM libs explicitly 2013-02-11 04:15:50 -08:00
james.brodman
3aaf2ef2d4 ToT Fixes / M4 macro fix 2013-01-14 14:55:10 -05:00
Peng Tu
16b0806d40 Fix LLVM TOT build issue. 2012-11-21 19:09:10 -08:00
Ingo Wald
d492af7bc0 64-bit gather/scatter, aligned load/store, i8 support 2012-09-17 03:39:02 +02:00
Matt Pharr
1a4434d314 Fix build with LLVM top-of-tree 2012-08-11 09:28:48 -07:00
Matt Pharr
38bcecd2f3 Print a useful error if llvm-config isn't found when building.
Previously, there was a ton of unintelligible error spew.

Issue #273.
2012-07-06 13:18:11 -07:00
Matt Pharr
6c7df4cb6b Add initial support for "avx1.1" targets for Ivy Bridge.
So far, only the use of the float/half conversion instructions distinguishes
this from the "avx1" target.

Partial work on issue #263.
2012-06-08 15:55:00 -07:00
Matt Pharr
449d956966 Add support for generic-64 target. 2012-05-25 11:57:28 -07:00
Matt Pharr
4f053e5b83 Pass OPT flags when linking 2012-05-08 13:25:09 -07:00