jbrodman
0755e4f8ff
Merge pull request #561 from dbabokin/neon_condition
Fix for Windows build and making NEON target optional
2013-08-06 13:45:30 -07:00
Dmitry Babokin
dff7735af9
Fix for Windows build and making NEON target optional
2013-08-02 19:24:34 -07:00
Dmitry Babokin
fb34fc5a85
Merge pull request #559 from ifilippov/debug_phases
Supporting dumping, switching off and debug printing of optimization phases.
2013-08-01 14:55:07 -07:00
Dmitry Babokin
43423c276f
Merge pull request #560 from ifilippov/perf
Supporting perf.py on Mac OS
2013-08-01 13:20:01 -07:00
jbrodman
5ffc3a8f4c
Merge pull request #558 from dbabokin/win_examples
Fix for examples to make them work on Windows properly
2013-08-01 08:02:42 -07:00
Ilia Filippov
3c06924a02
Supporting perf.py on Mac OS
2013-08-01 12:47:37 +04:00
Ilia Filippov
a174a90f86
Supporting dumping, switching off and debug printing of optimization phases
2013-08-01 11:37:52 +04:00
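The commit only names the feature. As a rough illustration of the idea, here is a minimal C++ sketch, assuming each optimization phase carries a number that can be used to switch it off or to print the intermediate state after it runs. `Phase`, `PhaseOptions` and `runPhases` are hypothetical names, and a vector of ints stands in for the real LLVM IR; this is not ispc's implementation or its command-line interface.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical phase descriptor: a number, a name, and the transformation.
// A vector of ints is a stand-in for the real IR.
struct Phase {
    int number;
    std::string name;
    std::function<void(std::vector<int>&)> run;
};

// Hypothetical user options: which phases to skip, which to trace.
struct PhaseOptions {
    std::set<int> disabled;  // phases switched off
    std::set<int> debug;     // phases after which the "IR" is printed
};

// Runs phases in order and returns the numbers of the phases actually run.
std::vector<int> runPhases(const std::vector<Phase>& phases,
                           const PhaseOptions& opts,
                           std::vector<int>& ir) {
    std::vector<int> executed;
    for (const Phase& p : phases) {
        if (opts.disabled.count(p.number)) {
            continue;  // phase switched off by the user
        }
        p.run(ir);
        executed.push_back(p.number);
        if (opts.debug.count(p.number)) {
            // Debug printing: dump the state right after this phase.
            std::printf("after phase %d (%s):", p.number, p.name.c_str());
            for (int v : ir) std::printf(" %d", v);
            std::printf("\n");
        }
    }
    return executed;
}
```

Keying everything on a phase number keeps the switch-off and debug-print options independent of how the phases are named internally.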
Dmitry Babokin
220f0b0b40
Renaming mandelbrot_tasks files to be different from mandelbrot
2013-07-30 19:53:12 -07:00
Dmitry Babokin
fa93cb7d0b
InterlockedAdd -> InterlockedExchangeAdd for better portability (InterlockedAdd is not always supported)
2013-07-29 22:46:36 -07:00
Dmitry Babokin
663ebf7857
Merge pull request #551 from mmp/constfold
Improvements to constant folding.
2013-07-24 10:27:04 -07:00
Dmitry Babokin
10c0b42d0d
Merge pull request #549 from mmp/fix-tot
Fix build with LLVM top-of-tree.
2013-07-23 09:14:08 -07:00
Matt Pharr
564e61c828
Improvements to constant folding.
We can now do constant folding with all basic datatypes (the previous
implementation handled int32 well, but had limited, if any, coverage
for other datatypes).
Reduced a bit of repeated code in the constant folding implementation
through template helper functions.
2013-07-22 16:12:02 -07:00
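A minimal sketch of the "template helper" approach to constant folding, assuming a single function template instantiated for each basic type replaces per-type copies of the same logic. `BinOp` and `foldBinary` are hypothetical names, not ispc's code; the sketch refuses to fold division by zero and, for brevity, does not model signed integer overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical operator set for the sketch.
enum class BinOp { Add, Sub, Mul, Div };

// Folds `a op b` for any basic arithmetic type T with one template instead
// of one hand-written copy per type. Returns nullopt when folding is unsafe.
// Note: signed overflow (and INT_MIN / -1) is not checked here; a real
// compiler must match the target's semantics before folding such cases.
template <typename T>
std::optional<T> foldBinary(BinOp op, T a, T b) {
    switch (op) {
    case BinOp::Add: return a + b;
    case BinOp::Sub: return a - b;
    case BinOp::Mul: return a * b;
    case BinOp::Div:
        if (b == T(0)) return std::nullopt;  // leave x / 0 for run time
        return a / b;
    }
    return std::nullopt;
}
```

The same template then covers int32, int64, unsigned, float and double constants, e.g. `foldBinary(BinOp::Mul, 1.5f, 2.0f)`.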
Matt Pharr
946c39a5df
Fix build with LLVM top-of-tree.
The DIBuilder::getCU() method has been removed; we now just store the
compilation unit returned when we call DIBuilder::createCompileUnit.
2013-07-22 15:42:52 -07:00
Jean-Luc Duprat
2948e84846
Merge pull request #547 from mmp/arm-merge
Add ARM NEON support
2013-07-22 09:24:16 -07:00
Matt Pharr
068fd8098c
Explicitly set armv7-eabi target triple on ARM.
This lets the compiler generate FMA instructions, which seems
desirable.
2013-07-20 11:19:10 -07:00
Matt Pharr
d7b0c5794e
Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests
pass, and all examples compile and run correctly. Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.
Current open issues/TODOs:
- Code quality looks decent, but hasn't been carefully examined. Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken).
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.
- Currently only the "cortex-a9" and "cortex-a15" CPU targets are
  supported; LLVM supports many other ARM CPUs and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure).
- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
  only when the compiler runs on an ARM host, though).
- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately). It may just work, but will more likely
  have various small issues.
- Anything related to 64-bit ARM has seen no attention.
2013-07-19 23:07:24 -07:00
Matt Pharr
b007bba59f
Replace inline assembly in task system with equivalent gcc intrinsics.
gcc/icc build only: the Windows build still uses the Win32 calls for
these.
2013-07-19 23:07:24 -07:00
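A minimal sketch of the split this commit describes: gcc/icc builds use the `__sync` builtins, while Windows builds keep the Win32 interlocked calls. `atomicAdd` and `atomicCAS` are hypothetical helper names, and the exact set of operations the ispc task system needs is an assumption. Both branches return the value held before the update, which is also why the later InterlockedAdd -> InterlockedExchangeAdd change fits naturally: `InterlockedExchangeAdd` returns the old value like `__sync_fetch_and_add`, whereas `InterlockedAdd` returns the new one.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: atomically adds `delta` to *ptr and returns the
// value *before* the add on both toolchains.
static inline int32_t atomicAdd(volatile int32_t* ptr, int32_t delta) {
#ifdef _WIN32
    return InterlockedExchangeAdd(reinterpret_cast<volatile LONG*>(ptr), delta);
#else
    return __sync_fetch_and_add(ptr, delta);  // gcc/icc builtin, no inline asm
#endif
}

// Hypothetical helper: if *ptr == expected, store desired and return true.
static inline bool atomicCAS(volatile int32_t* ptr, int32_t expected,
                             int32_t desired) {
#ifdef _WIN32
    // InterlockedCompareExchange(dest, exchange, comparand) returns the old value.
    return InterlockedCompareExchange(reinterpret_cast<volatile LONG*>(ptr),
                                      desired, expected) == expected;
#else
    return __sync_bool_compare_and_swap(ptr, expected, desired);
#endif
}
```

Using compiler builtins instead of hand-written assembly lets the compiler pick the right instruction sequence for each target, which matters once non-x86 hosts such as ARM are in play.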
Dmitry Babokin
abf43ad01d
Merge pull request #546 from dbabokin/release
Release 1.4.4
2013-07-19 18:49:07 -07:00
Dmitry Babokin
922895de69
Changing ISPC version to 1.4.5dev
2013-07-19 18:47:43 -07:00
Dmitry Babokin
28f0bce9f2
Release 1.4.4
v1.4.4
2013-07-19 16:22:10 -07:00
Dmitry Babokin
0f82f216a2
Merge pull request #544 from mmp/master
Handle SHL with a constant vector in LLVMVectorIsLinear().
2013-07-18 11:46:11 -07:00
Matt Pharr
7454b1399c
Handle SHL with a constant vector in LLVMVectorIsLinear().
LLVM 3.4 seems to be turning multiplies by a constant power of 2 into
the equivalent SHL, which was in turn thwarting the pattern matching
for turning gathers/scatters into vector loads/stores.
2013-07-17 14:12:43 -07:00
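Why an SHL by a constant vector can be treated like the multiply it replaced: shifting left by `c` multiplies each lane by 2^c, so a linear vector (lane i = base + i*stride) shifted by the same constant `c` in every lane is still linear, with stride `stride * 2^c`. The toy C++ model below shows that rule, assuming plain `int64_t` lanes instead of LLVM values; `isLinear` and `shlStaysLinear` are hypothetical names, not the actual `LLVMVectorIsLinear()` code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (hypothetical): a vector is linear with stride s if lane i
// holds base + i * s, i.e. neighbouring lanes differ by exactly s.
bool isLinear(const std::vector<int64_t>& v, int64_t stride) {
    for (size_t i = 1; i < v.size(); ++i)
        if (v[i] - v[i - 1] != stride) return false;
    return true;
}

// Given x known to be linear with `stride`, decide whether x << shifts
// (lane-wise, constant shift amounts) is still linear and compute its
// stride. Since x << c == x * 2^c, a uniform shift c scales the stride by
// 2^c; differing shift amounts are rejected conservatively.
bool shlStaysLinear(const std::vector<int64_t>& x, int64_t stride,
                    const std::vector<int64_t>& shifts, int64_t& newStride) {
    if (shifts.empty() || shifts.size() != x.size() || !isLinear(x, stride))
        return false;
    int64_t c = shifts[0];
    if (c < 0 || c > 62) return false;  // keep 2^c representable
    for (int64_t s : shifts)
        if (s != c) return false;       // not a splat: give up
    newStride = stride * (int64_t(1) << c);
    return true;
}
```

Before such a rule, a multiply that the optimizer had rewritten into a shift would no longer match the "multiply by constant" case, so the access was left as a gather/scatter.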
jbrodman
4ebf46bd63
Merge pull request #543 from mmp/master
Fix build with LLVM top-of-tree
2013-07-17 10:38:06 -07:00
Matt Pharr
f1cce0ef5f
Fix build with LLVM top-of-tree
2013-07-17 09:25:00 -07:00
Dmitry Babokin
8c9e873c10
Merge pull request #540 from dbabokin/embree_bug
Fix for the bug introduced by the --instrument option fix
2013-07-04 10:45:06 -07:00
Dmitry Babokin
c85439e7bb
Fix for the bug introduced by the --instrument option fix
2013-07-04 21:41:57 +04:00
Ilia Filippov
fd7f87b55e
Supporting perf.py on Windows and some small corrections in it
2013-07-02 19:23:18 +04:00
Dmitry Babokin
8be4128c5a
Merge pull request #534 from ifilippov/perf
add script for measuring performance
2013-07-01 05:09:03 -07:00
Ilia Filippov
806e37338c
add script for measuring performance
2013-07-01 13:30:49 +04:00
Dmitry Babokin
ec1095624a
Merge pull request #527 from tkoziara/master
examples/sort added
2013-06-25 10:11:39 -07:00
Tomasz Koziara
a23d69ebe8
Copyright changed to simplify legal matters.
2013-06-25 17:28:27 +01:00
Dmitry Babokin
0aff61ffc6
Merge pull request #533 from dbabokin/patch
Quick fix for LLVM 3.3 patch
2013-06-25 08:50:32 -07:00
Dmitry Babokin
05aa540984
Quick fix for LLVM 3.3 patch
2013-06-25 19:49:41 +04:00
Dmitry Babokin
033e83e490
Merge pull request #532 from dbabokin/release_1_4_3
Release 1.4.3
2013-06-25 07:42:08 -07:00
Dmitry Babokin
594485c38c
Release 1.4.3
v1.4.3
2013-06-25 18:38:21 +04:00
Dmitry Babokin
d52e2d5a8d
License update (just dates)
2013-06-25 17:02:42 +04:00
Dmitry Babokin
1e5d852e2f
Merge pull request #531 from ifilippov/qsize_fail
Replace qsize because it fails on Mac OS
2013-06-25 05:36:45 -07:00
Ilia Filippov
cc32d913a0
Replace qsize because it fails on Mac OS
2013-06-25 16:27:25 +04:00
Dmitry Babokin
fc66066d4d
Merge pull request #530 from dbabokin/llvm_fix
Adding LLVM patch to fix #519 with LLVM 3.3
2013-06-25 05:22:09 -07:00
Dmitry Babokin
6169338815
Adding LLVM patch to fix #519 with LLVM 3.3
2013-06-25 16:21:14 +04:00
Tomasz Koziara
86ee8db778
Parallel prefix sum added + minor amendments.
2013-06-25 12:45:51 +01:00
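A common way to parallelize a prefix sum is a two-pass blocked scan: scan each block independently, scan the per-block totals to get each block's offset, then add the offsets back. The sketch below assumes an exclusive scan over `int`, uses the hypothetical name `blockedExclusiveScan`, and runs the blocks sequentially; in a task-based version each iteration of the two block loops would be its own task. It illustrates the technique and is not the code in examples/sort.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two-pass blocked exclusive prefix sum (sequential stand-in for a
// task-parallel version). out[i] = in[0] + ... + in[i-1].
std::vector<int> blockedExclusiveScan(const std::vector<int>& in, size_t blockSize) {
    if (blockSize == 0) blockSize = 1;
    const size_t n = in.size();
    const size_t numBlocks = (n + blockSize - 1) / blockSize;
    std::vector<int> out(n);
    std::vector<int> blockSums(numBlocks, 0);

    // Pass 1 (independent per block): local exclusive scan + block total.
    for (size_t b = 0; b < numBlocks; ++b) {
        int running = 0;
        for (size_t i = b * blockSize; i < n && i < (b + 1) * blockSize; ++i) {
            out[i] = running;
            running += in[i];
        }
        blockSums[b] = running;
    }

    // Serial step: exclusive scan of the block totals gives block offsets.
    int offset = 0;
    for (size_t b = 0; b < numBlocks; ++b) {
        int total = blockSums[b];
        blockSums[b] = offset;
        offset += total;
    }

    // Pass 2 (independent per block): shift each block by its offset.
    for (size_t b = 0; b < numBlocks; ++b)
        for (size_t i = b * blockSize; i < n && i < (b + 1) * blockSize; ++i)
            out[i] += blockSums[b];
    return out;
}
```

Only the short scan over block totals is serial, so with many more elements than blocks most of the work runs in parallel.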
Dmitry Babokin
6bc8cb1ff1
Merge pull request #529 from ifilippov/instrument_fix
correction of --instrument option support
2013-06-25 03:08:02 -07:00
Dmitry Babokin
0fc49b1c37
Merge pull request #528 from ifilippov/test3
Reapplying lost commits
2013-06-25 02:14:24 -07:00
Ilia Filippov
9fb981e9a0
correction of --instrument option support
2013-06-25 12:33:23 +04:00
Ilia Filippov
cba1b3cedd
additional libraries for LLVM_3_4 build
2013-06-25 12:22:53 +04:00
Ilia Filippov
12c4512932
adding two additional libraries for LLVM_3_4 build
2013-06-25 12:22:53 +04:00
Tomasz Koziara
f2452f040d
First commit of the radix sort example.
2013-06-24 18:37:44 +01:00
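Radix sort is a natural consumer of a prefix sum: a histogram of digit counts, exclusive-scanned, gives each bucket's starting offset in the output. Here is a minimal sequential C++ sketch, assuming 32-bit unsigned keys and 8-bit digits processed least-significant first; `radixSort` is a hypothetical name, and the digit width is an arbitrary choice, not necessarily what the ispc example uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort on 32-bit unsigned keys, 8 bits per pass (4 passes).
// Each pass: histogram the digit, exclusive prefix sum -> bucket offsets,
// then a stable scatter into a temporary buffer.
void radixSort(std::vector<uint32_t>& keys) {
    std::vector<uint32_t> tmp(keys.size());
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[256] = {0};
        for (uint32_t k : keys) ++count[(k >> shift) & 0xFF];

        // Exclusive prefix sum over the 256 buckets.
        size_t offset = 0;
        for (size_t d = 0; d < 256; ++d) {
            size_t c = count[d];
            count[d] = offset;
            offset += c;
        }

        // Stable scatter: equal digits keep their relative order.
        for (uint32_t k : keys) tmp[count[(k >> shift) & 0xFF]++] = k;
        keys.swap(tmp);  // this pass's output becomes the next pass's input
    }
}
```

Stability of each scatter is what makes the digit-by-digit passes compose into a full sort; in a parallel version the histogram and the prefix sum are the steps that get split across tasks.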
Dmitry Babokin
0dd1dbb568
Merge pull request #526 from dbabokin/master
Tracking LLVM trunk: removing llvm::createSimplifyLibCallsPass() call
2013-06-23 23:10:19 -07:00
Dmitry Babokin
fdcec5a219
Tracking LLVM trunk: removing llvm::createSimplifyLibCallsPass() call
2013-06-24 10:08:06 +04:00
Dmitry Babokin
bebab7ab0d
Merge pull request #525 from dbabokin/debug
--debug output: stdout instead of stderr
2013-06-21 03:56:17 -07:00