Commit Graph

18 Commits

Author SHA1 Message Date
Evghenii
785b2f5d24 added examples 2014-01-06 14:00:36 +01:00
Evghenii
2d8da306a1 merged with master 2013-12-25 21:32:34 +01:00
Ilia Filippov
7bf64bc490 changes in examples (windows) 2013-12-19 21:13:09 +04:00
Evghenii
ddfe782151 merged 2013-12-13 11:56:43 +01:00
Ilia Filippov
f3ff1fcbeb supporting targets in perf windows 2013-11-26 19:12:02 +04:00
Ilia Filippov
935800d7f6 making common.props 2013-11-26 18:58:49 +04:00
evghenii
bb46b561fd Merged with upstream/master 2013-11-22 08:13:16 +01:00
Dmitry Babokin
017e7890f7 Examples makefiles to support setting single target via ISPC_IA_TARGETS 2013-11-14 15:34:30 +04:00
Evghenii
ce5f8cd46f replaced with fresh examples 2013-11-08 14:17:26 +01:00
Ilia Filippov
a910bfb539 Windows support 2013-11-05 16:31:01 +04:00
Ilia Filippov
87cecddabb adding sort to performance checking 2013-09-20 18:57:20 +04:00
Dmitry Babokin
b258027061 Merge pull request #582 from tkoziara/master
Uniform memory allocation in sort example is fixed.
2013-09-16 03:29:43 -07:00
Tomasz Koziara
97068765e8 Copyright reversed. 2013-09-14 18:09:04 +01:00
Tomasz Koziara
ed825b3773 Uniform memory allocation fixed. 2013-09-13 13:14:31 +01:00
Matt Pharr
d7b0c5794e Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs.  All but ~10 tests
pass, and all examples compile and run correctly.  Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.

Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined.  Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken.)
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.

- Currently only the "cortex-a9" and "cortex-a15" CPU targets are
  supported; LLVM supports many other ARM CPUs and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure).

- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
   only when the compiler runs on an ARM host, though).

- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately).  It may just work, but will more likely
  have various small issues.

- Anything related to 64-bit ARM has seen no attention.
2013-07-19 23:07:24 -07:00
Tomasz Koziara
a23d69ebe8 Copyright changed to simplify legal matters. 2013-06-25 17:28:27 +01:00
Tomasz Koziara
86ee8db778 Parallel prefix sum added + minor amendments. 2013-06-25 12:45:51 +01:00
Tomasz Koziara
f2452f040d First commit of the radix sort example. 2013-06-24 18:37:44 +01:00