Commit Graph

97 Commits

Author SHA1 Message Date
egaburov
1710b9171f removed LLVM_3_0 legacy part and changed copyright to 2013 2013-10-18 08:53:01 +02:00
egaburov
7e9b4c0924 added avx2-i64x4 and avx1.1-i64x4 targets 2013-10-15 10:02:10 +02:00
Ilia Filippov
92773ada6d fix for ISPC compfails at sse4-i8 and sse4-i16 2013-10-11 15:23:40 +04:00
Dmitry Babokin
43245bbc11 Adding check for OS AVX support to auto-dispatch code 2013-09-19 15:39:56 +04:00
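The check this commit refers to is the standard two-step test: CPUID says whether the CPU has AVX, but the OS must also have enabled YMM state saving, which XGETBV reveals through XCR0. A minimal C sketch of that check (hypothetical helper, not ispc's actual dispatch code):

```c
#include <cpuid.h>
#include <stdint.h>

/* Hypothetical helper (not ispc's actual dispatch code): returns 1 only if
 * both the CPU and the OS support AVX. */
static int os_supports_avx(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    int has_avx     = (ecx >> 28) & 1;  /* CPUID.1:ECX.AVX */
    int has_osxsave = (ecx >> 27) & 1;  /* OS uses XSAVE/XRSTOR */
    if (!has_avx || !has_osxsave)
        return 0;
    uint32_t lo, hi;
    /* XGETBV with ECX = 0 reads XCR0; bits 1 and 2 = XMM and YMM enabled. */
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (lo & 0x6) == 0x6;
}
```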
Evghenii
9861375f0c renamed avx-i64x4 -> avx1-i64x4 2013-09-13 15:07:14 +02:00
Evghenii
059d80cc11 included suggested changes; ./tests/launch-*.ispc still fails. Something mask64-related, not sure what. Help... 2013-09-12 17:18:12 +02:00
egaburov
7364e06387 added mask64 2013-09-12 12:02:42 +02:00
egaburov
efc20c2110 added svml support to all sse/avx modes 2013-09-11 17:07:54 +02:00
egaburov
19379db3b6 svml cleanup 2013-09-11 16:48:56 +02:00
egaburov
7a32699573 added svml.m4 2013-09-11 15:18:03 +02:00
egaburov
320c41ffcf added svml support (experimental). For some reason all symbols are visible... 2013-09-11 15:16:50 +02:00
egaburov
9c79d4d182 added avxh with vectorWidth=4 support; use --target=avxh to enable it 2013-09-11 12:58:02 +02:00
james.brodman
8db378b265 Revert "Remove support for using SVML for math lib routines."
This reverts commit d9c38b5c1f.
2013-09-04 16:01:58 -04:00
Matt Pharr
1276ea9844 Revert "Remove support for building with LLVM 3.1"
This reverts commit d3c567503b.

Conflicts:
	opt.cpp
2013-08-06 17:00:35 -07:00
Matt Pharr
5b20b06bd9 Add avg_{up,down}_int{8,16} routines to stdlib
These compute the average of two given values, rounding up and down,
respectively, if the result isn't exact.  When possible, these are
mapped to target-specific intrinsics (PAVG[BW] on IA and VH[R]ADD[US]
on NEON.)

A subsequent commit will add pattern-matching to generate calls to
these intrinsics when the corresponding patterns are detected in the
IR.
2013-08-06 08:41:12 -07:00
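The rounding behavior described above can be computed without widening the intermediate sum, and the unsigned round-up case is exactly what PAVGB computes. A minimal C sketch of the semantics (illustrative helpers, not the stdlib source):

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_avg_epu8 / _mm_avg_epu16 */

/* Scalar reference for the semantics above (a sketch, not the stdlib
 * source).  The bit tricks avoid overflowing the intermediate sum. */
static inline uint8_t avg_up_u8(uint8_t a, uint8_t b) {
    return (uint8_t)((a | b) - ((a ^ b) >> 1));  /* == (a + b + 1) / 2 */
}
static inline uint8_t avg_down_u8(uint8_t a, uint8_t b) {
    return (uint8_t)((a & b) + ((a ^ b) >> 1));  /* == (a + b) / 2 */
}

/* The unsigned round-up case is exactly PAVGB, 16 lanes at a time: */
static inline __m128i avg_up_u8x16(__m128i a, __m128i b) {
    return _mm_avg_epu8(a, b);
}
```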
Matt Pharr
d9c38b5c1f Remove support for using SVML for math lib routines.
This path was poorly maintained and wasn't actually available on most
targets.
2013-07-31 06:56:48 -07:00
Matt Pharr
d3c567503b Remove support for building with LLVM 3.1 2013-07-31 06:46:45 -07:00
Matt Pharr
48ff03112f Remove __pause from stdlib_core() in utils.m4.
It wasn't ever being used, and was breaking compilation on ARM.
2013-07-30 08:44:22 -07:00
Matt Pharr
ab3b633733 Add 8-bit and 16-bit specialized NEON targets.
Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask
elements, respectively, and thus should generate the best code when used
for computation with datatypes of those sizes.
2013-07-30 08:44:16 -07:00
Matt Pharr
b6df447b55 Add reduce_add() for int8 and int16 types.
This maps to specialized instructions (e.g. PSADBW) when available.
2013-07-25 09:46:01 -07:00
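For the unsigned 8-bit case, the PSADBW trick is to take the sum of absolute differences against zero, which reduces each 8-byte half of the vector into a 64-bit lane. A hedged C sketch (hypothetical helper, not ispc's generated code):

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */

/* Sketch of the PSADBW trick for the unsigned 8-bit case: SAD against
 * zero sums each 8-byte half of v into the low bits of a 64-bit lane. */
static inline uint32_t reduce_add_u8x16(__m128i v) {
    __m128i sad = _mm_sad_epu8(v, _mm_setzero_si128());
    return (uint32_t)_mm_cvtsi128_si32(sad) +
           (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(sad, 8));
}
```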
Matt Pharr
2d063925a1 Explicitly call the PBLENDVB intrinsic for i8 blending with sse4-8.
This is slightly cleaner than trunc-ing the i8 mask to i1 and using
a vector select.  (And it is probably safer in terms of generating good
code.)
2013-07-25 09:46:01 -07:00
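A C-intrinsics sketch of the instruction in question (illustrative, not the compiler's emitted IR): PBLENDVB selects per byte on the top bit of each mask byte, so an all-ones/all-zeros i8 mask drives the select directly, with no trunc to i1 needed.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_blendv_epi8 */

/* Per-byte select keyed off the top bit of each mask byte. */
static inline __m128i blend_i8(__m128i a, __m128i b, __m128i mask) {
    return _mm_blendv_epi8(a, b, mask);  /* mask byte 0xff -> take b */
}
```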
Matt Pharr
780b0dfe47 Add SSE4-16 target.
Along the lines of sse4-8, this is an 8-wide target for SSE4, using
16-bit elements for the mask.  It's thus (in principle) the best
target for SIMD computation with 16-bit datatypes.
2013-07-25 09:46:01 -07:00
Matt Pharr
53414f12e6 Add SSE4 target optimized for computation with 8-bit datatypes.
This change adds a new 'sse4-8' target, where programCount is 16 and
the mask element size is 8-bits.  (i.e. the most appropriate sizing of
the mask for SIMD computation with 8-bit datatypes.)
2013-07-23 17:30:32 -07:00
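As a scalar model of the execution model described here (names are illustrative, not ispc internals): with programCount == 16 and one 8-bit mask element per program instance, the mask and the int8 data occupy the same register width.

```c
#include <stdint.h>

enum { PROGRAM_COUNT = 16 };  /* illustrative constant */

/* Scalar model: one 8-bit mask element per program instance. */
static void masked_add_i8(int8_t dst[], const int8_t a[], const int8_t b[],
                          const uint8_t mask[PROGRAM_COUNT]) {
    for (int i = 0; i < PROGRAM_COUNT; ++i)
        if (mask[i])  /* per-lane mask: 0xff (on) or 0x00 (off) */
            dst[i] = (int8_t)(a[i] + b[i]);
}
```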
Matt Pharr
15a3ef370a Use @llvm.readcyclecounter to implement stdlib clock() function.
Also added a test for the clock builtin.
2013-07-23 17:24:57 -07:00
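On x86, @llvm.readcyclecounter lowers to RDTSC, so a rough C equivalent of what clock() now returns is a raw cycle-counter read (useful for relative timing only, not wall-clock time):

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc on GCC/Clang */

/* Rough C equivalent of the new clock() lowering on x86. */
static inline uint64_t cycles(void) { return __rdtsc(); }

/* Usage sketch:
 *   uint64_t t0 = cycles();
 *   work();
 *   uint64_t elapsed = cycles() - t0;
 */
```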
Matt Pharr
e7abf3f2ea Add support for mask vectors of 8 and 16-bit element types.
There were a number of places throughout the system that assumed that the
execution mask would only have either 32-bit or 1-bit elements.  This
commit makes it possible to have a target with an 8- or 16-bit mask.
2013-07-23 16:50:11 -07:00
Matt Pharr
d7b0c5794e Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs.  All but ~10 tests
pass, and all examples compile and run correctly.  Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.

Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined.  Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken.)
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.

- Currently only the "cortex-a9" and "cortex-a15" CPU targets are
  supported; LLVM supports many other ARM CPUs and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure.)

- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately,
   only when the compiler runs on an ARM host).

- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately).  It may just work, but will more likely
  have various small issues.

- Anything related to 64-bit ARM has seen no attention.
2013-07-19 23:07:24 -07:00
Dmitry Babokin
7bedb4a081 Add memory alignment dependent on the platform (16/32/64/etc) 2013-05-24 10:29:01 +04:00
Dmitry Babokin
630215f56f Defining memory routines completely separately for Windows/Unix 32/64 bit. 2013-05-24 10:29:01 +04:00
Dmitry Babokin
5362dade37 Fixing util.m4 to declare nothing unless some macro is instantiated 2013-05-24 10:29:00 +04:00
Dmitry Babokin
f22e237381 Minor fix for generic DataLayout 2013-05-13 20:24:51 +04:00
Dmitry Babokin
a47460b4c3 Efficient library implementation of broadcast 2013-05-02 00:12:16 +02:00
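For reference, broadcast()'s semantics are that every program instance receives the value held by one chosen instance; the commit's point is that the library now lowers this efficiently. A scalar C model (illustrative only):

```c
#include <stdint.h>

/* Scalar model of broadcast(): all lanes take lane `index`'s value. */
static void broadcast_i32(int32_t out[], const int32_t in[], int n, int index) {
    for (int i = 0; i < n; ++i)
        out[i] = in[index];
}
```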
jbrodman
018e9a12a3 Merge pull request #484 from dbabokin/malloc
Fix for aligned move of unaligned data on 32-bit platforms.
2013-04-30 12:02:04 -07:00
Dmitry Babokin
26bec62daf Removing duplicate free definition on Linux 2013-04-27 00:29:51 +04:00
Dmitry Babokin
7497e86902 Adding support for aligned memory allocation on Windows 2013-04-26 22:07:30 +02:00
Dmitry Babokin
95950885cf Use posix_memalign to allocate 16-byte aligned memory on Linux/MacOS. 2013-04-26 20:33:24 +04:00
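Taken together, these allocation commits land on the usual portable pattern: posix_memalign on Linux/MacOS and _aligned_malloc on Windows. The matching free routines differ, which is exactly why the two paths are defined completely separately. A C sketch (hypothetical wrapper name):

```c
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>
#endif

/* Hypothetical wrapper illustrating the platform split. */
static void *aligned_alloc_portable(size_t alignment, size_t size) {
#ifdef _WIN32
    return _aligned_malloc(size, alignment);       /* pair with _aligned_free */
#else
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)  /* pair with free() */
        return NULL;
    return p;
#endif
}
```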
Dmitry Babokin
d36ab4cc3c Adding noalias attribute to malloc return 2013-04-25 20:39:01 +04:00
Dmitry Babokin
e756daa261 Remove sprintf warnings on Windows and fix sprintf-related fails on Mac 2013-04-24 22:36:48 +02:00
Dmitry Babokin
3f8a678c5a Editorial change: fixing trailing whitespace and tabs 2013-03-18 16:17:55 +04:00
james.brodman
3aaf2ef2d4 ToT Fixes / M4 macro fix 2013-01-14 14:55:10 -05:00
Matt Pharr
6412876f64 Remove unused __reduce_add_uint{32,64} target functions.
The stdlib code just calls the signed int{32,64} functions,
which gives the right result for the unsigned case anyway.
The various targets didn't consistently define the unsigned
variants in any case.
2012-09-28 05:55:41 -07:00
Jean-Luc Duprat
f0b0618484 Added the following mask tests: __any(), __all(), __none() for all supported targets.
This allows for more efficient code generation for KNC.
2012-09-14 11:06:18 -07:00
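The three tests reduce the execution mask to a single bool. A scalar C model of their semantics (illustrative; target-specific versions typically reduce the mask with a movemask-style instruction instead of a loop):

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar models of the three mask tests. */
static bool mask_any(const uint32_t mask[], int n) {
    for (int i = 0; i < n; ++i)
        if (mask[i]) return true;
    return false;
}
static bool mask_all(const uint32_t mask[], int n) {
    for (int i = 0; i < n; ++i)
        if (!mask[i]) return false;
    return true;
}
static bool mask_none(const uint32_t mask[], int n) {
    return !mask_any(mask, n);
}
```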
Matt Pharr
49dde7c6f2 Fix bug in declaration of double-precision sqrt intrinsic for AVX targets.
This was preventing sqrts of uniform double values from being compiled
properly.

Issue #344.
2012-08-03 11:43:31 -07:00
Matt Pharr
765a0d8896 Use puts() rather than printf() for printing assertion failure strings.
This way, we don't lose '%'s in the assertion strings.

Issue #342.
2012-08-03 11:31:38 -07:00
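The hazard being fixed: passing a message as printf's format string turns any '%' in it into a conversion specifier. A minimal C illustration:

```c
#include <stdio.h>

void report_failure(const char *msg) {
    /* printf(msg);   BAD: "assertion: 50% full" would be misparsed */
    puts(msg);      /* GOOD: emits msg verbatim, plus a newline */
}
```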
Matt Pharr
6a410fc30e Emit gather instructions for the AVX2 targets.
Issue #308.
2012-07-13 12:29:05 -07:00
Matt Pharr
984a68c3a9 Rename gen_gather() macro to gen_gather_factored() 2012-07-13 12:24:12 -07:00
Matt Pharr
98b2e0e426 Fixes for intrinsics unsupported in earlier LLVM versions.
Specifically, don't use the half/float conversion routines with
LLVM 3.0, and don't try to use RDRAND with anything before LLVM 3.2.
2012-07-13 12:14:10 -07:00
Matt Pharr
371d4be8ef Fix bugs in detection of Ivy Bridge systems.
We were incorrectly characterizing them as basic AVX1 without further
extensions, due to a bug in the logic to check CPU features.
2012-07-12 14:11:15 -07:00
Matt Pharr
2c640f7e52 Add support for RDRAND on Ivy Bridge.
The standard library now provides a variety of rdrand() functions
that call out to RDRAND, when available.

Issue #263.
2012-07-12 06:07:07 -07:00
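RDRAND can transiently fail, so the underlying intrinsic returns a success flag and callers normally retry a few times. A hedged C sketch using the compiler intrinsic (not ispc's stdlib signatures):

```c
#include <immintrin.h>  /* _rdrand32_step; compile with RDRAND enabled */

/* Hypothetical helper showing the retry convention. */
static int random_u32(unsigned int *out) {
    for (int tries = 0; tries < 10; ++tries)
        if (_rdrand32_step(out))  /* needs Ivy Bridge or newer */
            return 1;
    return 0;  /* hardware entropy temporarily unavailable */
}
```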
Matt Pharr
216ac4b1a4 Stop factoring out constant offsets for gather/scatter if instr is available.
For KNC (gather/scatter), it's not helpful to factor base+offsets gathers
and scatters into base_ptr + {1/2/4/8} * varying_offsets + const_offsets.
Now, if a HW instruction is available for gather/scatter, we just factor
into base + {1/2/4/8} * offsets (if possible).  Not only is this simpler,
but it is also what we need in order to pass along the scale value
(2/4/8) that those instructions accept directly.

Finishes issue #325.
2012-07-11 14:52:29 -07:00
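In address-arithmetic terms, the change prefers the factoring that matches the hardware's base + scale * index addressing. A C sketch of the two forms (illustrative only, not ispc's optimizer code):

```c
#include <stdint.h>

/* old: base + scale * varying_offsets[i] + const_offsets[i]   (3 terms)
 * new: base + scale * offsets[i]                               (2 terms,
 *      constants pre-folded into offsets[], scale fed to the insn) */
static void lane_addrs(uintptr_t base, const int32_t offsets[],
                       int scale /* 1, 2, 4 or 8 */,
                       uintptr_t addrs[], int n) {
    for (int i = 0; i < n; ++i)
        addrs[i] = base + (uintptr_t)((intptr_t)scale * offsets[i]);
}
```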
Matt Pharr
c09c87873e Whitespace / indentation fixes. 2012-07-11 14:29:46 -07:00