james.brodman
be3a40e70b
Fix for 3.4
2013-08-27 15:15:16 -04:00
Dmitry Babokin
f6ce969d9f
Merge pull request #567 from ifilippov/master
Changes in perf.py functionality, unification of examples, fixes for build warnings
2013-08-26 03:26:28 -07:00
Ilia Filippov
f620cdbaa1
Changes in perf.py functionality, unification of examples, fixes for build warnings
2013-08-26 14:04:59 +04:00
Dmitry Babokin
3f2217646e
Merge pull request #562 from mmp/arm
New target naming scheme, new targets (SSE4-i8x16 and SSE4-i16x8), plus some cleanup and improvements.
2013-08-22 08:33:25 -07:00
Matt Pharr
611477e214
Revert change to lEmitVaryingSelect().
Using vector select versus a store and masked load for varying vector
selects seems to give worse code. This may be related to
http://llvm.org/bugs/show_bug.cgi?id=16941 .
2013-08-22 07:50:25 -07:00
Dmitry Babokin
9bb5c314cd
Merge pull request #565 from dbabokin/run_tests
run_tests.py fix and new switch.
2013-08-22 01:48:22 -07:00
Dmitry Babokin
f31a31478b
Moving time calculation earlier
2013-08-22 12:41:57 +04:00
Dmitry Babokin
5fb30939be
Fix for #564: run_tests.py was using the wrong ispc
2013-08-21 19:46:18 +04:00
Dmitry Babokin
60b413a9cb
Adding --non-interactive switch to run_tests.py
2013-08-21 19:25:30 +04:00
Matt Pharr
502f8fd76b
Reduce debug spew on failing idiv.ispc tests
2013-08-20 09:22:09 -07:00
Matt Pharr
2b2905b567
Fix (preexisting) bugs in generic-32/64.h with type of "__any", etc.
This should be a bool, not a one-wide vector of bools. The equivalent
fix was previously made in generic-16.h, but not made here. (Note that
many tests are still failing with these targets, but at least they
compile properly now.)
2013-08-20 09:05:50 -07:00
Matt Pharr
e7f067d70c
Fix handling of __clock() builtin for "generic" targets.
2013-08-20 09:04:52 -07:00
Matt Pharr
d976da7559
Speed up idiv test (don't test int32 as thoroughly)
2013-08-20 08:49:51 -07:00
Dmitry Babokin
84dbd66d10
Merge pull request #563 from jbrodman/debugopt
Separate -O and -g
2013-08-15 13:10:13 -07:00
james.brodman
6be3c24ee5
Separate -O and -g
2013-08-15 15:24:46 -04:00
Matt Pharr
42f31aed69
Another attempt at fixing the Windows build (added sse4-8/sse4-16 targets).
2013-08-14 11:02:45 -07:00
Matt Pharr
ed017c42f1
Fix ispc.vcxproj for Windows builds
2013-08-11 07:47:20 -07:00
Matt Pharr
4766467271
Revert ispc.vcxproj to version from top-of-tree.
2013-08-10 11:23:39 -07:00
Matt Pharr
ea8591a85a
Fix build with LLVM top-of-tree (link libcurses)
2013-08-10 11:22:43 -07:00
Matt Pharr
7ab4c5391c
Fix build with LLVM 3.2 and generic-4 / examples/sse4.h target.
2013-08-09 19:56:43 -07:00
Matt Pharr
0c5742b6f8
Implement new naming scheme for --target.
Now targets are named like "<isa>-i<mask size>x<gang size>", e.g.
"sse4-i8x16", or "avx2-i32x16".
The old target names are still supported.
2013-08-08 19:23:44 -07:00
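A minimal Python sketch of how a name in the "<isa>-i<mask size>x<gang size>" scheme can be split into its parts. The regular expression, the function name `parse_target`, and the `LEGACY_NAMES` table are hypothetical illustrations, not ispc's own option parsing; the legacy table only covers sse4-8 and sse4-16, whose mask and gang sizes are stated elsewhere in this log.

```python
import re

# Hypothetical pattern for "<isa>-i<mask size>x<gang size>"; the ISA part is
# matched loosely here, while the real list of ISAs is defined by ispc.
TARGET_RE = re.compile(r"^(?P<isa>[a-z][a-z0-9]*)-i(?P<mask>\d+)x(?P<gang>\d+)$")

# Hypothetical, partial table of old-style names (only the two sse4
# variants whose sizes are given in this log).
LEGACY_NAMES = {
    "sse4-8": "sse4-i8x16",
    "sse4-16": "sse4-i16x8",
}

def parse_target(name):
    """Split a target name into (isa, mask_bits, gang_size)."""
    name = LEGACY_NAMES.get(name, name)
    m = TARGET_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized target name: {name!r}")
    return m.group("isa"), int(m.group("mask")), int(m.group("gang"))

print(parse_target("avx2-i32x16"))  # ('avx2', 32, 16)
print(parse_target("sse4-8"))       # ('sse4', 8, 16)
```

Mapping old names onto new ones before parsing keeps a single code path, which mirrors the note that the old target names are still accepted.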
Matt Pharr
1d76f74b16
Fix compiler warnings
2013-08-07 12:53:39 -07:00
Matt Pharr
5e5d42b918
Fix build with LLVM 3.1
2013-08-06 17:55:37 -07:00
Matt Pharr
cd9afe946c
Merge branch 'master' into arm
Conflicts:
Makefile
builtins.cpp
ispc.cpp
ispc.h
ispc.vcxproj
opt.cpp
2013-08-06 17:39:21 -07:00
Matt Pharr
1276ea9844
Revert "Remove support for building with LLVM 3.1"
This reverts commit d3c567503b.
Conflicts:
opt.cpp
2013-08-06 17:00:35 -07:00
jbrodman
0755e4f8ff
Merge pull request #561 from dbabokin/neon_condition
Fix for Windows build and making NEON target optional
2013-08-06 13:45:30 -07:00
Matt Pharr
ccdbddd388
Add peephole optimization to match int8/int16 averages.
Match the following patterns in IR, turning them into target-specific
intrinsics (e.g. PAVGB on x86) when possible.
(unsigned int8)(((unsigned int16)a + (unsigned int16)b + 1)/2)
(unsigned int8)(((unsigned int16)a + (unsigned int16)b)/2)
(unsigned int16)(((unsigned int32)a + (unsigned int32)b + 1)/2)
(unsigned int16)(((unsigned int32)a + (unsigned int32)b)/2)
(int8)(((int16)a + (int16)b + 1)/2)
(int8)(((int16)a + (int16)b)/2)
(int16)(((int32)a + (int32)b + 1)/2)
(int16)(((int32)a + (int32)b)/2)
2013-08-06 08:59:46 -07:00
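The unsigned int8 patterns above widen to 16 bits so that the sum cannot overflow. The Python sketch below (an illustration, not ispc's pattern matcher; all function names are hypothetical) checks exhaustively that the same averages can be computed entirely within 8 bits using the identities a + b = 2(a & b) + (a ^ b) = 2(a | b) - (a ^ b). This is why a target can evaluate them with a narrow instruction such as PAVGB, which corresponds to the rounding-up (+ 1) form.

```python
def avg_down_u8_widened(a, b):
    # (unsigned int8)(((unsigned int16)a + (unsigned int16)b)/2)
    return ((a + b) // 2) & 0xFF

def avg_up_u8_widened(a, b):
    # (unsigned int8)(((unsigned int16)a + (unsigned int16)b + 1)/2)
    return ((a + b + 1) // 2) & 0xFF

def avg_down_u8_narrow(a, b):
    # a + b == 2*(a & b) + (a ^ b), so halving never needs a ninth bit.
    return (a & b) + ((a ^ b) >> 1)

def avg_up_u8_narrow(a, b):
    # a + b == 2*(a | b) - (a ^ b); rounding up uses the OR form.
    return (a | b) - ((a ^ b) >> 1)

# Compare both forms over every pair of 8-bit inputs.
for a in range(256):
    for b in range(256):
        assert avg_down_u8_narrow(a, b) == avg_down_u8_widened(a, b)
        assert avg_up_u8_narrow(a, b) == avg_up_u8_widened(a, b)
print("narrow and widened forms agree")
```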
Matt Pharr
5b20b06bd9
Add avg_{up,down}_int{8,16} routines to stdlib
These compute the average of two given values, rounding up and down,
respectively, if the result isn't exact. When possible, these are
mapped to target-specific intrinsics (PAVG[BW] on IA and V[R]HADD
on NEON).
A subsequent commit will add pattern-matching to generate calls to
these intrinsics when the corresponding patterns are detected in the
IR.
2013-08-06 08:41:12 -07:00
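A reference model of the averaging routines can make the rounding direction concrete. This Python sketch assumes "down" means toward negative infinity and "up" toward positive infinity when the exact average is fractional; the `bits`/`signed` parameters and the range check are hypothetical conveniences, not the stdlib's signatures.

```python
# Assumed semantics: round toward -inf ("down") or +inf ("up") when the
# exact average has a fractional part. bits/signed are hypothetical.

def _check_range(x, bits, signed):
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    if not lo <= x <= hi:
        kind = "int" if signed else "uint"
        raise ValueError(f"{x} does not fit in {kind}{bits}")

def avg_down(a, b, bits=8, signed=True):
    """Average of a and b, rounded down if the result isn't exact."""
    _check_range(a, bits, signed)
    _check_range(b, bits, signed)
    return (a + b) // 2  # Python's // floors, including for negative sums

def avg_up(a, b, bits=8, signed=True):
    """Average of a and b, rounded up if the result isn't exact."""
    _check_range(a, bits, signed)
    _check_range(b, bits, signed)
    return (a + b + 1) // 2  # ceil(s / 2) == floor((s + 1) / 2) for integer s

print(avg_down(-1, -2), avg_up(-1, -2))        # -2 -1
print(avg_up(255, 254, bits=8, signed=False))  # 255
```

The result of either routine always fits back into the input type, which is what lets a target keep the computation at 8 or 16 bits.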
Dmitry Babokin
dff7735af9
Fix for Windows build and making NEON target optional
2013-08-02 19:24:34 -07:00
Dmitry Babokin
fb34fc5a85
Merge pull request #559 from ifilippov/debug_phases
Supporting dumping, switching off and debug printing of optimization phases.
2013-08-01 14:55:07 -07:00
Dmitry Babokin
43423c276f
Merge pull request #560 from ifilippov/perf
Supporting perf.py on Mac OS
2013-08-01 13:20:01 -07:00
jbrodman
5ffc3a8f4c
Merge pull request #558 from dbabokin/win_examples
Fix for examples to make them work on Windows properly
2013-08-01 08:02:42 -07:00
Ilia Filippov
3c06924a02
Supporting perf.py on Mac OS
2013-08-01 12:47:37 +04:00
Ilia Filippov
a174a90f86
Supporting dumping, switching off and debug printing of optimization phases
2013-08-01 11:37:52 +04:00
Matt Pharr
4f48d3258a
Documentation updates for NEON
2013-07-31 20:06:04 -07:00
Matt Pharr
d9c38b5c1f
Remove support for using SVML for math lib routines.
This path was poorly maintained and wasn't actually available on most
targets.
2013-07-31 06:56:48 -07:00
Matt Pharr
d3c567503b
Remove support for building with LLVM 3.1
2013-07-31 06:46:45 -07:00
Matt Pharr
d7562d3836
Merge branch 'master' into arm
2013-07-31 06:38:17 -07:00
Dmitry Babokin
220f0b0b40
Renaming mandelbrot_tasks files to be different from mandelbrot
2013-07-30 19:53:12 -07:00
Matt Pharr
48ff03112f
Remove __pause from stdlib_core() in utils.m4.
It was never used, and it was breaking compilation on ARM.
2013-07-30 08:44:22 -07:00
Matt Pharr
ab3b633733
Add 8-bit and 16-bit specialized NEON targets.
Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask
elements, respectively, and thus should generate the best code when used
for computation with datatypes of those sizes.
2013-07-30 08:44:16 -07:00
Dmitry Babokin
fa93cb7d0b
InterlockedAdd -> InterlockedExchangeAdd for better portability (InterlockedAdd is not always supported)
2013-07-29 22:46:36 -07:00
Matt Pharr
b6df447b55
Add reduce_add() for int8 and int16 types.
This maps to specialized instructions (e.g. PSADBW) when available.
2013-07-25 09:46:01 -07:00
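PSADBW computes sums of absolute differences of unsigned bytes, producing one 16-bit sum per 8-byte half of an SSE register. This Python model, assuming a 16-lane uint8 vector held in one register, shows why it works as a byte reduction: against an all-zero operand each absolute difference is just the byte itself. The function names are hypothetical, and the signed path (biasing by 0x80) is one assumed way to handle int8, not necessarily what ispc emits.

```python
def psadbw(a, b):
    """Model PSADBW on one 128-bit register: for each 8-byte half, sum the
    absolute differences of the unsigned bytes into one 16-bit result."""
    if len(a) != 16 or len(b) != 16:
        raise ValueError("expected two 16-byte vectors")
    return [sum(abs(x - y) for x, y in zip(a[i:i + 8], b[i:i + 8]))
            for i in (0, 8)]

def reduce_add_uint8(values):
    # Against an all-zero operand |x - 0| == x, so each 16-bit result is the
    # sum of one half's bytes; adding the two halves gives the full sum.
    return sum(psadbw(values, [0] * 16))

def reduce_add_int8(values):
    # Assumed handling for signed bytes: x ^ 0x80 maps int8 x to uint8 x + 128,
    # so sum the biased values and subtract 128 per element.
    biased = [(x & 0xFF) ^ 0x80 for x in values]
    return reduce_add_uint8(biased) - 128 * len(values)

print(reduce_add_uint8(list(range(16))))  # 120
```

Each half sums at most 8 * 255 = 2040, so the 16-bit partial results cannot overflow.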
Matt Pharr
2d063925a1
Explicitly call the PBLENDVB intrinsic for i8 blending with sse4-8.
This is slightly cleaner than truncating the i8 mask to i1 and using
a vector select (and is probably a safer bet for getting good code).
2013-07-25 09:46:01 -07:00
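A small Python model of why the i8 mask can feed PBLENDVB directly. PBLENDVB picks each byte from its second source when the high bit of the corresponding mask byte is set, while truncating i8 to i1 keeps only the low bit. Assuming the mask lanes are all-ones (0xFF) or all-zeros (0x00), as compare results are, both bits agree and the two forms select the same lanes. Function names are hypothetical.

```python
def pblendvb(a, b, mask):
    """Model PBLENDVB: per byte lane, take b[i] when the high bit of
    mask[i] is set, otherwise keep a[i]."""
    return [y if m & 0x80 else x for x, y, m in zip(a, b, mask)]

def select_after_trunc(mask, t, f):
    """Model `trunc <N x i8> to <N x i1>` followed by a vector select:
    trunc keeps only the low bit of each mask byte."""
    return [x if m & 1 else y for m, x, y in zip(mask, t, f)]

# Assuming all-ones/all-zeros mask lanes, the high bit and the low bit
# always agree, so both forms pick the same lanes.
f = list(range(16))
t = [100 + i for i in range(16)]
mask = [0xFF if i % 3 == 0 else 0x00 for i in range(16)]
print(pblendvb(f, t, mask) == select_after_trunc(mask, t, f))  # True
```

The two models diverge only for masks whose high and low bits differ, which is one reason the explicit blend is the more predictable choice.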
Matt Pharr
bba84f247c
Improved optimization of vector select instructions.
Various LLVM optimization passes are turning code like:
    %cmp = icmp slt <8 x i32> %foo, %bar
    %cmp32 = sext <8 x i1> %cmp to <8 x i32>
    . . .
    %cmp1 = trunc <8 x i32> %cmp32 to <8 x i1>
    %result = select <8 x i1> %cmp1, . . .
Into:
    %cmp = icmp slt <8 x i32> %foo, %bar
    %cmp32 = zext <8 x i1> %cmp to <8 x i32>   ; note: zext
    . . .
    %cmp1 = icmp ne <8 x i32> %cmp32, zeroinitializer
    %result = select <8 x i1> %cmp1, . . .
This isn't matched well by the LLVM code generators, which in turn leads
to fairly inefficient code (i.e., they don't just emit a vector compare
and a blend instruction).
Also, renamed VSelMovmskOptPass to InstructionSimplifyPass to better
describe its functionality.
2013-07-25 09:46:01 -07:00
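A toy Python sketch of the kind of rewrite described above: `icmp ne (zext %cmp), zeroinitializer` has the same lane values as `%cmp` itself, so the select condition can be folded back to the original compare. The node classes and `simplify` are hypothetical stand-ins for LLVM IR, not ispc's InstructionSimplifyPass.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Cmp:      # lane-wise signed less-than compare producing <N x i1>
    lhs: Tuple[int, ...]
    rhs: Tuple[int, ...]

@dataclass(frozen=True)
class ZExt:     # zext <N x i1> to <N x i32>: 0 -> 0, 1 -> 1
    src: "Expr"

@dataclass(frozen=True)
class NeZero:   # icmp ne <N x i32> %x, zeroinitializer
    src: "Expr"

Expr = Union[Cmp, ZExt, NeZero]

def evaluate(e):
    """Compute the lane values (0 or 1 for i1, integers otherwise)."""
    if isinstance(e, Cmp):
        return tuple(int(x < y) for x, y in zip(e.lhs, e.rhs))
    if isinstance(e, ZExt):
        return evaluate(e.src)
    if isinstance(e, NeZero):
        return tuple(int(v != 0) for v in evaluate(e.src))
    raise TypeError(f"unknown node: {e!r}")

def simplify(e):
    """Fold `icmp ne (zext X), zeroinitializer` back to X."""
    if isinstance(e, NeZero) and isinstance(e.src, ZExt):
        return simplify(e.src.src)
    return e

cmp = Cmp((1, 5, 3, 7), (2, 4, 3, 8))
cond = NeZero(ZExt(cmp))
print(evaluate(cond))         # (1, 0, 0, 1)
print(simplify(cond) == cmp)  # True
```

Folding back to the bare compare gives the code generator the compare-then-select shape it matches well.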
Matt Pharr
780b0dfe47
Add SSE4-16 target.
Along the lines of sse4-8, this is an 8-wide target for SSE4, using
16-bit elements for the mask. It's thus (in principle) the best
target for SIMD computation with 16-bit datatypes.
2013-07-25 09:46:01 -07:00
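The gang sizes of the two specialized SSE4 targets follow from simple arithmetic, assuming one mask vector fills exactly one 128-bit SSE register. A short Python check (the helper name is hypothetical):

```python
SSE_REGISTER_BITS = 128  # width of one SSE (XMM) register

def lanes_per_register(mask_bits, register_bits=SSE_REGISTER_BITS):
    # How many mask elements of the given width fit in one register.
    return register_bits // mask_bits

print(lanes_per_register(16))  # 8: consistent with sse4-16 being 8-wide
print(lanes_per_register(8))   # 16: consistent with sse4-8's programCount of 16
```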
Matt Pharr
04d61afa23
Fix bug in lEmitVaryingSelect() for targets with i1 mask types.
Commit 53414f12e6 introduced a bug where lEmitVaryingSelect() would
try to truncate a vector of i1s to a vector of i1s, which in turn
made LLVM's IR analyzer unhappy.
2013-07-25 09:45:20 -07:00
Dmitry Babokin
663ebf7857
Merge pull request #551 from mmp/constfold
Improvements to constant folding.
2013-07-24 10:27:04 -07:00
Matt Pharr
53414f12e6
Add SSE4 target optimized for computation with 8-bit datatypes.
This change adds a new 'sse4-8' target, where programCount is 16 and
the mask element size is 8 bits (i.e., the most appropriate sizing of
the mask for SIMD computation with 8-bit datatypes).
2013-07-23 17:30:32 -07:00
Matt Pharr
15a3ef370a
Use @llvm.readcyclecounter to implement stdlib clock() function.
Also added a test for the clock builtin.
2013-07-23 17:24:57 -07:00