Commit Graph

128 Commits

Author SHA1 Message Date
Dmitry Babokin
31b95b665b Copyright update 2014-03-12 20:19:16 +04:00
Ilia Filippov
47f7900cd3 support LLVM trunk 2014-03-07 16:28:56 +04:00
Evghenii
668645fcda first commit 2014-02-07 11:05:36 +01:00
Evghenii
d3a6693eef adding __have_native_{rsqrtd,rcpd} to select between native support for double precision reciprocals and using slower but safe version in stdlib 2014-02-04 16:29:23 +01:00
Ilia Filippov
b19937c4dc deleting isPrimitiveType() 2013-12-12 19:25:02 +04:00
Dmitry Babokin
2d2d14744b Fixing --opt=force-aligned-memory for LLVM 3.3+ 2013-12-04 19:00:02 +04:00
Ilia Filippov
3fd9d5a025 support of LLVM 3.5 2013-11-21 19:09:43 +04:00
Dmitry Babokin
42e181112a Add avx1-i32x4 to the list of supported targets 2013-11-14 16:21:30 +04:00
Dmitry Babokin
e100040f28 Fix bug with fail when --target=avx1.1-i32x8,avx2-i32x8 - avx11 is not a valid target anymore, need more complete string 2013-11-14 15:37:11 +04:00
Dmitry Babokin
ffc9a33933 avx1-i32x4 implementation as sse4-i32x4 with avx target-feature flag 2013-11-14 15:34:30 +04:00
Dmitry Babokin
362ee06b9f Typo fix 2013-10-29 01:35:26 +04:00
Dmitry Babokin
a166eb7ea1 Check AVX OS support in host cpu check code 2013-10-28 22:41:23 +04:00
egaburov
7e9b4c0924 added avx2-i64x4 and avx1.1-i64x4 targets 2013-10-15 10:02:10 +02:00
Matt Pharr
e751977b72 Fix small typo for NEON targets in Target::SupportedTargets 2013-10-12 06:15:57 -07:00
Dmitry Babokin
758efebb3c Add missing testing support for avx1-i64x4 target 2013-09-30 17:54:59 +04:00
Preston Gurd
4b26b8b430 Remove redundant "slm". 2013-09-20 16:44:01 -04:00
Preston Gurd
9e0e9dbecc - Add Silvermont (--cpu=slm) option for llvm 3.4+.
- Change default Sandybridge isa name to avx1-i32x8 from avx-i32x8,
  to conform with replacement of avx-i32x8 by avx1-i32x8 everywhere else.
- Add "target-cpu" attribute, when using AttrBuilder, to correct a problem
  whereby llvm would switch from the command line cpu setting
  to the native (auto-detected) cpu setting on second and subsequent
  functions. e.g. if I wanted to build for Silvermont on a Sandy Bridge
  machine, ispc/llvm would correctly use Silvermont and turn on the
  Silvermont scheduler. For the second and subsequent functions,
  it would auto-detect Sandy Bridge, but still run the Silvermont
  scheduler.
2013-09-20 14:42:46 -04:00
Dmitry Babokin
ce99b17616 Fix for Windows buils to include new target: avx-i64x4 2013-09-14 02:00:23 +04:00
Evghenii
9861375f0c renamed avx-i64x4 -> avx1-i64x4 2013-09-13 15:07:14 +02:00
Evghenii
40af8d6ed5 fixed segfault in tests/launch-*.ispc. nativeVectoWidth in avx-i64x4 was set to 4. Fixed 2013-09-12 20:25:44 +02:00
egaburov
7364e06387 added mask64 2013-09-12 12:02:42 +02:00
egaburov
9c79d4d182 addded avxh with vectorWidth=4 support, use --target=avxh to enable it 2013-09-11 12:58:02 +02:00
james.brodman
28080b0c22 Fix build against 3.4 2013-08-27 16:56:00 -04:00
james.brodman
be3a40e70b Fix for 3.4 2013-08-27 15:15:16 -04:00
Matt Pharr
0c5742b6f8 Implement new naming scheme for --target.
Now targets are named like "<isa>-i<mask size>x<gang size>", e.g.
"sse4-i8x16", or "avx2-i32x16".

The old target names are still supported.
2013-08-08 19:23:44 -07:00
Matt Pharr
cd9afe946c Merge branch 'master' into arm
Conflicts:
	Makefile
	builtins.cpp
	ispc.cpp
	ispc.h
	ispc.vcxproj
	opt.cpp
2013-08-06 17:39:21 -07:00
Matt Pharr
1276ea9844 Revert "Remove support for building with LLVM 3.1"
This reverts commit d3c567503b.

Conflicts:
	opt.cpp
2013-08-06 17:00:35 -07:00
Dmitry Babokin
dff7735af9 Fix for Windows build and making NEON target optional 2013-08-02 19:24:34 -07:00
Ilia Filippov
a174a90f86 Supporting dumping, switching off and debug printing of optimization phases 2013-08-01 11:37:52 +04:00
Matt Pharr
4f48d3258a Documentation updates for NEON 2013-07-31 20:06:04 -07:00
Matt Pharr
d3c567503b Remove support for building with LLVM 3.1 2013-07-31 06:46:45 -07:00
Matt Pharr
ab3b633733 Add 8-bit and 16-bit specialized NEON targets.
Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask
elements, respectively, and thus should generate the best code when used
for computation with datatypes of those sizes.
2013-07-30 08:44:16 -07:00
Matt Pharr
780b0dfe47 Add SSE4-16 target.
Along the lines of sse4-8, this is an 8-wide target for SSE4, using
16-bit elements for the mask.  It's thus (in principle) the best
target for SIMD computation with 16-bit datatypes.
2013-07-25 09:46:01 -07:00
Matt Pharr
53414f12e6 Add SSE4 target optimized for computation with 8-bit datatypes.
This change adds a new 'sse4-8' target, where programCount is 16 and
the mask element size is 8-bits.  (i.e. the most appropriate sizing of
the mask for SIMD computation with 8-bit datatypes.)
2013-07-23 17:30:32 -07:00
Matt Pharr
068fd8098c Explicitly set armv7-eabi target triple on ARM.
This lets the compiler generate FMA instructions, which seems
desirable.
2013-07-20 11:19:10 -07:00
Matt Pharr
d7b0c5794e Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs.  All but ~10 tests
pass, and all examples compile and run correctly.  Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.

Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined.  Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken.)
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.

- Currently only the "cortex-a9" and "cortex-15" CPU targets are
  supported; LLVM supports many other ARM CPUs and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure.)

- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
   only when the compiler runs on an ARM host, though).

- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately).  It may just work, but will more likely
  have various small issues.)

- Anything related to 64-bit ARM has seen no attention.
2013-07-19 23:07:24 -07:00
Dmitry Babokin
2267f278d2 Fix for #503 - avoid omitting frame pointer on Win32 2013-06-04 14:51:36 +04:00
Dmitry Babokin
1a7ac8b804 Enable memory alignment management via compiler options 2013-05-24 10:29:01 +04:00
Dmitry Babokin
b6b9daa3c5 Enabling llvm 3.4 2013-05-13 19:25:31 +04:00
Dmitry Babokin
95950885cf Use posix_memalign to allocate 16 byte alligned memeory on Linux/MacOS. 2013-04-26 20:33:24 +04:00
Dmitry Babokin
a0462fe1ee #469: Fix for multi-target compilation 2013-04-12 14:06:12 +04:00
Dmitry Babokin
0af2a13349 DataLayout is changed to be managed from single place. v4-128-128 is added to generic DataLayout 2013-03-23 14:38:51 +04:00
Dmitry Babokin
7f0c92eb4d Fix for #431: memory leak due to multiple TargetMachine creation 2013-03-23 14:33:45 +04:00
Dmitry Babokin
0f86255279 Target class redesign: data moved to private. Also empty target-feature attribute is not added anymore (generic targets). 2013-03-23 14:28:05 +04:00
Dmitry Babokin
3f8a678c5a Editorial change: fixing trailing white spaces and tabs 2013-03-18 16:17:55 +04:00
Dmitry Babokin
524939dc5b Fix for issue #430 2013-02-27 18:03:07 +04:00
Matt Pharr
0bf1320a32 Remove support for building with LLVM 3.0 2013-01-06 12:27:53 -08:00
Matt Pharr
63dd7d9859 Fix build to work with LLVM top-of-tree again 2013-01-06 12:02:08 -08:00
Matt Pharr
8cbfde6092 Small fixes to build with LLVM top-of-tree (now numbered as version 3.3) 2012-12-02 14:29:24 -08:00
Matt Pharr
172a189c6f Fix build with LLVM top-of-tree 2012-10-17 11:11:50 -07:00