1580 Commits

Author SHA1 Message Date
Matt Pharr
502f8fd76b Reduce debug spew on failing idiv.ispc tests 2013-08-20 09:22:09 -07:00
Matt Pharr
2b2905b567 Fix (preexisting) bugs in generic-32/64.h with type of "__any", etc.
This should be a bool, not a one-wide vector of bools.  The equivalent
fix was previously made in generic-16.h, but not made here.  (Note that
many tests are still failing with these targets, but at least they
compile properly now.)
2013-08-20 09:05:50 -07:00
Matt Pharr
e7f067d70c Fix handling of __clock() builtin for "generic" targets. 2013-08-20 09:04:52 -07:00
Matt Pharr
d976da7559 Speed up idiv test (don't test int32 as thoroughly) 2013-08-20 08:49:51 -07:00
Dmitry Babokin
84dbd66d10 Merge pull request #563 from jbrodman/debugopt
Separate -O and -g
2013-08-15 13:10:13 -07:00
james.brodman
6be3c24ee5 Separate -O and -g 2013-08-15 15:24:46 -04:00
Matt Pharr
42f31aed69 Another attempt at fixing the Windows build (added sse4-8/sse4-16 targets). 2013-08-14 11:02:45 -07:00
Matt Pharr
ed017c42f1 Fix ispc.vcxproj for Windows builds 2013-08-11 07:47:20 -07:00
Matt Pharr
4766467271 Revert ispc.vcxproj to version from top-of-tree. 2013-08-10 11:23:39 -07:00
Matt Pharr
ea8591a85a Fix build with LLVM top-of-tree (link libcurses) 2013-08-10 11:22:43 -07:00
Matt Pharr
7ab4c5391c Fix build with LLVM 3.2 and generic-4 / examples/sse4.h target. 2013-08-09 19:56:43 -07:00
Matt Pharr
0c5742b6f8 Implement new naming scheme for --target.
Now targets are named like "<isa>-i<mask size>x<gang size>", e.g.
"sse4-i8x16", or "avx2-i32x16".

The old target names are still supported.
2013-08-08 19:23:44 -07:00
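The naming scheme above can be illustrated with a small parser. This is a minimal sketch only: `TargetName` and `parseTargetName` are hypothetical names invented for illustration and are not taken from the ispc sources, and the sketch deliberately rejects the old-style names rather than mapping them.

```cpp
#include <cstdio>
#include <string>

// Hypothetical holder for the three fields of a new-style target name.
struct TargetName {
    std::string isa;
    int maskBits;
    int gangSize;
};

// Splits "<isa>-i<mask size>x<gang size>" into its parts.
// Returns false for anything that does not match that shape,
// including old-style names such as "sse4-8".
bool parseTargetName(const std::string &name, TargetName &out) {
    std::size_t dash = name.rfind("-i");
    if (dash == std::string::npos || dash == 0)
        return false;
    const char *rest = name.c_str() + dash + 2;
    int maskBits = 0, gangSize = 0, consumed = 0;
    // %n records how many characters were consumed so trailing junk is rejected.
    if (std::sscanf(rest, "%dx%d%n", &maskBits, &gangSize, &consumed) != 2)
        return false;
    if (rest[consumed] != '\0' || maskBits <= 0 || gangSize <= 0)
        return false;
    out.isa = name.substr(0, dash);
    out.maskBits = maskBits;
    out.gangSize = gangSize;
    return true;
}
```

With this sketch, "sse4-i8x16" splits into ISA "sse4", an 8-bit mask and a gang size of 16.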
Matt Pharr
1d76f74b16 Fix compiler warnings 2013-08-07 12:53:39 -07:00
Matt Pharr
5e5d42b918 Fix build with LLVM 3.1 2013-08-06 17:55:37 -07:00
Matt Pharr
cd9afe946c Merge branch 'master' into arm
Conflicts:
	Makefile
	builtins.cpp
	ispc.cpp
	ispc.h
	ispc.vcxproj
	opt.cpp
2013-08-06 17:39:21 -07:00
Matt Pharr
1276ea9844 Revert "Remove support for building with LLVM 3.1"
This reverts commit d3c567503b.

Conflicts:
	opt.cpp
2013-08-06 17:00:35 -07:00
jbrodman
0755e4f8ff Merge pull request #561 from dbabokin/neon_condition
Fix for Windows build and making NEON target optional
2013-08-06 13:45:30 -07:00
Matt Pharr
ccdbddd388 Add peephole optimization to match int8/int16 averages.
Match the following patterns in IR, turning them into target-specific
intrinsics (e.g. PAVGB on x86) when possible.

(unsigned int8)(((unsigned int16)a + (unsigned int16)b + 1)/2)
(unsigned int8)(((unsigned int16)a + (unsigned int16)b)/2)
(unsigned int16)(((unsigned int32)a + (unsigned int32)b + 1)/2)
(unsigned int16)(((unsigned int32)a + (unsigned int32)b)/2)
(int8)(((int16)a + (int16)b + 1)/2)
(int8)(((int16)a + (int16)b)/2)
(int16)(((int32)a + (int32)b + 1)/2)
(int16)(((int32)a + (int32)b)/2)
2013-08-06 08:59:46 -07:00
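For reference, the unsigned patterns listed above can be written as plain scalar C++. This is a sketch of the arithmetic only; the function names are hypothetical, and nothing here implies that a particular compiler will turn these functions into PAVGB or any other intrinsic.

```cpp
#include <cstdint>

// Average rounding up: widen so that a + b + 1 cannot overflow,
// then halve and narrow back.
inline uint8_t avgUpUint8(uint8_t a, uint8_t b) {
    return (uint8_t)(((uint16_t)a + (uint16_t)b + 1) / 2);
}

// Average rounding down: same widening, without the +1.
inline uint8_t avgDownUint8(uint8_t a, uint8_t b) {
    return (uint8_t)(((uint16_t)a + (uint16_t)b) / 2);
}

// The 16-bit variants widen to 32 bits instead.
inline uint16_t avgUpUint16(uint16_t a, uint16_t b) {
    return (uint16_t)(((uint32_t)a + (uint32_t)b + 1) / 2);
}

inline uint16_t avgDownUint16(uint16_t a, uint16_t b) {
    return (uint16_t)(((uint32_t)a + (uint32_t)b) / 2);
}
```

The widening step is what makes the patterns recognizable: without it, a + b could wrap around before the division.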
Matt Pharr
5b20b06bd9 Add avg_{up,down}_int{8,16} routines to stdlib
These compute the average of two given values, rounding up and down,
respectively, if the result isn't exact.  When possible, these are
mapped to target-specific intrinsics (PAVG[BW] on IA and VH[R]ADD[US]
on NEON).

A subsequent commit will add pattern-matching to generate calls to
these intrinsics when the corresponding patterns are detected in the
IR.
2013-08-06 08:41:12 -07:00
Dmitry Babokin
dff7735af9 Fix for Windows build and making NEON target optional 2013-08-02 19:24:34 -07:00
Dmitry Babokin
fb34fc5a85 Merge pull request #559 from ifilippov/debug_phases
Supporting dumping, switching off and debug printing of optimization phases.
2013-08-01 14:55:07 -07:00
Dmitry Babokin
43423c276f Merge pull request #560 from ifilippov/perf
Supporting perf.py on Mac OS
2013-08-01 13:20:01 -07:00
jbrodman
5ffc3a8f4c Merge pull request #558 from dbabokin/win_examples
Fix for examples to make them work on Windows properly
2013-08-01 08:02:42 -07:00
Ilia Filippov
3c06924a02 Supporting perf.py on Mac OS 2013-08-01 12:47:37 +04:00
Ilia Filippov
a174a90f86 Supporting dumping, switching off and debug printing of optimization phases 2013-08-01 11:37:52 +04:00
Matt Pharr
4f48d3258a Documentation updates for NEON 2013-07-31 20:06:04 -07:00
Matt Pharr
d9c38b5c1f Remove support for using SVML for math lib routines.
This path was poorly maintained and wasn't actually available on most
targets.
2013-07-31 06:56:48 -07:00
Matt Pharr
d3c567503b Remove support for building with LLVM 3.1 2013-07-31 06:46:45 -07:00
Matt Pharr
d7562d3836 Merge branch 'master' into arm 2013-07-31 06:38:17 -07:00
Dmitry Babokin
220f0b0b40 Renaming mandelbrot_tasks files to be different from mandelbrot 2013-07-30 19:53:12 -07:00
Matt Pharr
48ff03112f Remove __pause from stdlib_core() in utils.m4.
It wasn't ever being used, and was breaking compilation on ARM.
2013-07-30 08:44:22 -07:00
Matt Pharr
ab3b633733 Add 8-bit and 16-bit specialized NEON targets.
Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask
elements, respectively, and thus should generate the best code when used
for computation with datatypes of those sizes.
2013-07-30 08:44:16 -07:00
Dmitry Babokin
fa93cb7d0b InterlockedAdd -> InterlockedExchangeAdd for better portability (InterlockedAdd is not always supported) 2013-07-29 22:46:36 -07:00
Matt Pharr
b6df447b55 Add reduce_add() for int8 and int16 types.
This maps to specialized instructions (e.g. PSADBW) when available.
2013-07-25 09:46:01 -07:00
Matt Pharr
2d063925a1 Explicitly call the PBLENDVB intrinsic for i8 blending with sse4-8.
This is slightly cleaner than trunc-ing the i8 mask to i1 and using
a vector select.  (And it is probably safer in terms of generating good code.)
2013-07-25 09:46:01 -07:00
Matt Pharr
bba84f247c Improved optimization of vector select instructions.
Various LLVM optimization passes are turning code like:

%cmp = icmp lt <8 x i32> %foo, %bar
%cmp32 = sext <8 x i1> %cmp to <8 x i32>
. . .
%cmp1 = trunc <8 x i32> %cmp32 to <8 x i1>
%result = select <8 x i1> %cmp1, . . .

Into:

%cmp = icmp lt <8 x i32> %foo, %bar
%cmp32 = zext <8 x i1> %cmp to <8 x i32>   # note: zext
. . .
%cmp1 = icmp ne <8 x i32> %cmp32, zeroinitializer
%result = select <8 x i1> %cmp1, . . .

This form in turn isn't matched well by the LLVM code generators, which
leads to fairly inefficient code (i.e. they don't just emit
a vector compare and blend instruction).

Also, renamed VSelMovmskOptPass to InstructionSimplifyPass to better
describe its functionality.
2013-07-25 09:46:01 -07:00
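One way to see why the two mask forms differ is a per-lane scalar model. This is an illustrative sketch, not ispc or LLVM code, and all function names are hypothetical: a sign-extended mask is all ones or all zeros and can drive a bitwise blend directly, while a zero-extended mask is only 1 or 0 and needs an explicit comparison against zero first.

```cpp
#include <cstdint>

// Lane mask in the form sext produces from an i1: all ones or all zeros.
inline int32_t sextMask(bool cmp) { return cmp ? -1 : 0; }

// Lane mask in the form zext produces from an i1: one or zero.
inline int32_t zextMask(bool cmp) { return cmp ? 1 : 0; }

// Bitwise blend: only correct when mask is all ones or all zeros.
inline int32_t blendBits(int32_t mask, int32_t a, int32_t b) {
    return (a & mask) | (b & ~mask);
}

// Select through a "!= 0" test: correct for both mask forms,
// at the cost of an extra comparison.
inline int32_t selectNonZero(int32_t mask, int32_t a, int32_t b) {
    return mask != 0 ? a : b;
}
```

In this model the sext form feeds the blend as is, whereas the zext form only gives the right answer through the extra comparison.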
Matt Pharr
780b0dfe47 Add SSE4-16 target.
Along the lines of sse4-8, this is an 8-wide target for SSE4, using
16-bit elements for the mask.  It's thus (in principle) the best
target for SIMD computation with 16-bit datatypes.
2013-07-25 09:46:01 -07:00
Matt Pharr
04d61afa23 Fix bug in lEmitVaryingSelect() for targets with i1 mask types.
Commit 53414f12e6 introduced a bug where lEmitVaryingSelect() would
try to truncate a vector of i1s to a vector of i1s, which in turn
made LLVM's IR analyzer unhappy.
2013-07-25 09:45:20 -07:00
Dmitry Babokin
663ebf7857 Merge pull request #551 from mmp/constfold
Improvements to constant folding.
2013-07-24 10:27:04 -07:00
Matt Pharr
53414f12e6 Add SSE4 target optimized for computation with 8-bit datatypes.
This change adds a new 'sse4-8' target, where programCount is 16 and
the mask element size is 8 bits (i.e. the most appropriate sizing of
the mask for SIMD computation with 8-bit datatypes).
2013-07-23 17:30:32 -07:00
Matt Pharr
15a3ef370a Use @llvm.readcyclecounter to implement stdlib clock() function.
Also added a test for the clock builtin.
2013-07-23 17:24:57 -07:00
Matt Pharr
c14659c675 Fix bug in lGetConstantInt() in parse.yy.
Previously, we weren't handling signed/unsigned constant types correctly.
2013-07-23 17:24:57 -07:00
Matt Pharr
f7f281a256 Choose type for integer literals to match the target mask size (if possible).
On a target with a 16-bit mask (for example), we would choose the type
of an integer literal "1024" to be an int16.  Previously, we used an int32,
which is a worse fit and leads to less efficient code than an int16
on a 16-bit mask target.  (However, we'd still give an integer literal
1000000 the type int32, even in a 16-bit target.)

Updated the tests to still pass with 8 and 16-bit targets, given this
change.
2013-07-23 17:24:50 -07:00
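The rule described above can be sketched as a small function. This is a hedged illustration, not the logic in parse.yy: `literalBits` is a hypothetical name, the widening loop is an assumption about what happens when a literal does not fit the mask width, and treating masks narrower than 8 bits as int32 is likewise an assumption.

```cpp
#include <cstdint>

// Returns the bit width (8, 16, 32 or 64) this sketch would assign to a
// signed integer literal: start at the mask width and widen until it fits.
inline int literalBits(int64_t value, int maskBits) {
    // Assumption: targets with an i1 mask keep the previous int32 default.
    int bits = maskBits < 8 ? 32 : maskBits;
    while (bits < 64) {
        int64_t hi = (int64_t(1) << (bits - 1)) - 1;  // largest signed value
        int64_t lo = -hi - 1;                         // smallest signed value
        if (value >= lo && value <= hi)
            break;
        bits *= 2;
    }
    return bits;
}
```

Under these assumptions, 1024 on a 16-bit mask target gets 16 bits, while 1000000 still gets 32 bits, matching the two cases the commit message mentions.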
Matt Pharr
9ba49eabb2 Reduce estimated costs for 8 and 16-bit min() and max() in stdlib.
These actually compile to a single instruction.
2013-07-23 16:52:43 -07:00
Matt Pharr
e7abf3f2ea Add support for mask vectors of 8 and 16-bit element types.
There were a number of places throughout the system that assumed that the
execution mask would only have either 32-bit or 1-bit elements.  This
commit makes it possible to have a target with an 8- or 16-bit mask.
2013-07-23 16:50:11 -07:00
Matt Pharr
83e1630fbc Add support for fast division of varying int values by small constants.
For varying int8/16/32 types, divides by small constants can be
implemented efficiently through multiplies and shifts with integer
types of twice the bit-width; this commit adds this optimization.
    
(Implementation is based on Halide.)
2013-07-23 16:49:56 -07:00
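As a concrete instance of the multiply-and-shift idea, here is a sketch of dividing an unsigned 8-bit value by 3 using 16-bit arithmetic. The constant 171 and the shift of 9 are chosen here for illustration (171 is 2^9 / 3 rounded up); they are not taken from the ispc or Halide sources, and `divBy3` is a hypothetical name.

```cpp
#include <cstdint>

// x / 3 for any uint8_t x, computed as (x * 171) >> 9.
// The largest product, 255 * 171 = 43605, still fits in 16 bits.
inline uint8_t divBy3(uint8_t x) {
    return (uint8_t)(((uint16_t)x * 171u) >> 9);
}
```

Because the input range is only 256 values, the result can be checked exhaustively against ordinary integer division.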
Matt Pharr
0277ba1aaa Improve warnings for right shift by varying amounts.
Fixes:
- Don't issue a warning when the shift is by the same amount in all
  vector lanes.
- Do issue a warning when it's a compile-time constant but the values
  are different in different lanes.

Previously, we warned iff the shift amount wasn't a compile-time constant.
2013-07-23 16:49:07 -07:00
Matt Pharr
753c001e69 Merge branch 'master' of https://github.com/ispc/ispc into constfold 2013-07-23 16:12:04 -07:00
Dmitry Babokin
10c0b42d0d Merge pull request #549 from mmp/fix-tot
Fix build with LLVM top-of-tree.
2013-07-23 09:14:08 -07:00
Matt Pharr
564e61c828 Improvements to constant folding.
We can now do constant folding with all basic datatypes (the previous
implementation handled int32 well, but had limited, if any, coverage
for other datatypes.)

Reduced a bit of repeated code in the constant folding implementation
through template helper functions.
2013-07-22 16:12:02 -07:00
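The idea of sharing folding code across datatypes through template helpers can be sketched as follows. This is an illustration of the general technique only; `foldBinary` is a hypothetical helper, and the real ispc implementation operates on its own constant expression classes rather than on std::vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Applies a binary operation lane by lane to two constant operands of the
// same element type T. Assumes a and b have the same number of lanes.
template <typename T, typename Op>
std::vector<T> foldBinary(const std::vector<T> &a, const std::vector<T> &b,
                          Op op) {
    std::vector<T> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = static_cast<T>(op(a[i], b[i]));
    return out;
}
```

The same helper then serves int8, int32, double and so on, for example `foldBinary(a, b, std::plus<int>())` for addition or `foldBinary(c, d, std::multiplies<double>())` for multiplication.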