Commit Graph

1657 Commits

Author SHA1 Message Date
Matt Pharr
9d4ff1bc06 Fix alignment in usage message 2011-09-12 15:06:41 -07:00
Matt Pharr
83f22f1939 Add experimental --fast-masked-vload flag for SSE. 2011-09-12 12:29:33 -07:00
Matt Pharr
6375ed9224 AVX: Fix bug with misdeclaration of blend intrinsic.
This was preventing the "convert an all-on blend to one of the
  operand values" optimization from kicking on in AVX.
2011-09-12 06:42:38 -07:00
Matt Pharr
cf23cf9ef4 Fix typo in user guide. Issue #96 2011-09-12 05:24:32 -07:00
Matt Pharr
1147b53dcd Add #define with target vector width in emitted headers 2011-09-09 09:33:56 -07:00
Matt Pharr
4cf831a651 When --fast-math is enabled, tell LLVM about it, too. 2011-09-09 09:32:59 -07:00
Matt Pharr
785d8a29d3 Run mem2reg pass even when doing -O0 compiles 2011-09-09 09:24:43 -07:00
Matt Pharr
46d2bad231 Fix malformed program crash 2011-09-09 09:24:43 -07:00
Matt Pharr
32da8e11b4 Fix crash with varying global vector types when emitting header file. 2011-09-09 09:16:59 -07:00
Matt Pharr
5dedb6f836 Add --scale command line argument to mandelbrot and rt examples.
This applies a floating-point scale factor to the image resolution;
it's useful for experiments with many-core systems where the 
base image resolution may not give enough work for good load-balancing
with tasks.
2011-09-07 20:07:51 -07:00
Matt Pharr
2ea6d249d5 Fix mapping to 8, 16 program instances in AO bench example.
With this, we now compute a correct image with AVX.
2011-09-07 11:34:24 -07:00
Matt Pharr
c86128e8ee AVX: go back to using blend (vs. masked store) when possible.
All of the masked store calls were inhibiting putting values into
registers, which in turn led to a lot of unnecessary stack traffic.
This approach seems to give better code in the end.
2011-09-07 11:26:49 -07:00
Matt Pharr
375f1cb8e8 Make octaves and octaves loop uniform in noise example 2011-09-07 10:34:23 -07:00
Matt Pharr
3ca7b6b078 Remove MCJIT stuff from ispc_test (fix Linux build) 2011-09-07 09:44:27 -07:00
Matt Pharr
effe901890 Add task-parallel version of aobench 2011-09-07 05:43:21 -07:00
Matt Pharr
4f451bd041 More AVX fixes
Fix RNG state initialization for 16-wide targets
Fix a number of bugs in reduce_add builtin implementations for AVX.
Fix some tests that had incorrect expected results for the 16-wide
  case.
2011-09-06 15:53:11 -07:00
Matt Pharr
c76ef7b174 Add command-line option to specify position-independent codegen 2011-09-06 11:12:43 -07:00
Matt Pharr
743d82e935 Various documentation updates. 2011-09-06 09:51:02 -07:00
Matt Pharr
18546e9c6d Add option to disable optimizations to test running script 2011-09-04 18:09:00 -07:00
Matt Pharr
f24ab16b91 Release notes, doxygen update for 1.0.7 release. v1.0.7 2011-09-03 07:33:39 -07:00
Matt Pharr
766b34683c Fix Windows build 2011-09-03 07:23:16 -07:00
Matt Pharr
b5bfa43e92 Fix error with float suffixes 2011-09-02 13:09:25 -07:00
Matt Pharr
99221f7d17 Fix a few places in examples where C reference implementaion had a double-precision
fp constant undesirably causing computation to be done in double precision.

Makes C scalar versions of the options pricing models, rt, and aobench 3-5% faster.
Makes scalar version of noise about 15% faster.
Others are unchanged.
2011-09-01 16:31:22 -07:00
Matt Pharr
eb7913f1dd AVX: fix alignment when changing masked load to regular load.
Also added some debugging/tracing stuff (commented out).
Commented out iffy assert that was hitting for avx stuff.
2011-09-01 15:45:49 -07:00
Matt Pharr
08cad7a665 AVX bugfixes 2011-09-01 14:23:10 -07:00
Matt Pharr
9cd92facbd Fix test: was incorrectly failing for 8-wide targets 2011-09-01 05:03:49 -07:00
Matt Pharr
85063f493c Revert attempt to be clever about which LLVM libraries to link in--just
link all of them.  (This was causing build problems for some folks.)
2011-09-01 05:02:44 -07:00
Matt Pharr
f65a20c700 AVX bugfix: when replacing 'all on' masked store with a store,
the rvalue is operand 2, not operand 1 (which is the mask!)
2011-08-31 18:06:29 -07:00
Matt Pharr
e144724979 Improve performance of global atomics, taking advantage of associativity.
For associative atomic ops (add, and, or, xor), we can take advantage of
their associativity to do just a single hardware atomic instruction, 
rather than one for each of the running program instances (as the previous
implementation did.)

The basic approach is to locally compute a reduction across the active
program instances with the given op and to then issue a single HW atomic
with that reduced value as the operand.  We then take the old value that
was stored in the location that is returned from the HW atomic op and
use that to compute the values to return to each of the program instances
(conceptually representing the cumulative effect of each of the preceding
program instances having performed their atomic operation.)

Issue #56.
2011-08-31 05:35:01 -07:00
Matt Pharr
96a297c747 Small improvements to help output 2011-08-30 14:48:22 -07:00
Matt Pharr
67e00b97c6 Fix incorrect assertions in ConstExpr constructors 2011-08-30 11:08:53 -07:00
Matt Pharr
a94cabc692 Modify stencil example to do separate runs with and without task parallelism. 2011-08-30 05:08:21 -07:00
Matt Pharr
ad9e66650d AVX bugfix with alignment for store instructions.
When replacing 'all on' masked store with regular store, set alignment 
to be the vector element alignment, not the alignment for a whole vector. 
(i.e. 4 or 8 byte alignment, not 32 or 64).
2011-08-29 16:58:48 -07:00
Matt Pharr
6de494cfdb Fix AVX bug introduced in 4ab982bc16 2011-08-29 16:50:59 -07:00
Matt Pharr
58e34ba4ae Add new test-driver script, run_tests.py.
Old run_tests.sh still lives (for now).
Changes include:
- Tests are run in parallel across all of the available CPU cores
- Option to create a statically-linked executable for each test
  (rather than using the LLVM JIT).  This is in particular useful
  for AVX, which doesn't have good JIT support yet.
- Static executables also makes it possible to test x86, not
  just x86-64, codegen.
- Fixed a number of tests in failing_tests, which were actually
  failing due to the fact that the expected function signature of
  tests had changed.
2011-08-29 14:15:09 -07:00
Matt Pharr
33feeffe5d Update timing header so it works with C code 2011-08-29 11:23:43 -07:00
Matt Pharr
d0db46aac5 Use logical shift right op for shifts of unsigned ints. Fixes issue #88. 2011-08-29 10:32:26 -07:00
Matt Pharr
da76396c75 Fix typo in SSE2 attributes string. 2011-08-27 08:59:25 -07:00
Matt Pharr
bbf3fb6307 Disable popcnt on SSE4 targets--should only enable if system CPU supports it 2011-08-27 04:09:55 -07:00
Matt Pharr
4ab982bc16 Various AVX fixes (found by inspection).
Emit calls to masked_store, not masked_store_blend, when handling
  masked stores emitted by the frontend.
Fix bug in binary8to16 macro in builtins.m4
Fix bug in 16-wide version of __reduce_add_float
Remove blend function implementations for masked_store_blend for
  AVX; just forward those on to the corresponding real masked store
  functions.
2011-08-26 12:58:02 -07:00
Matt Pharr
34301e09f5 Fix incorrect comment in builtins definitions files.
(And all of the places it was cut and pasted to. :-( ).
2011-08-26 10:44:46 -07:00
Matt Pharr
84e586e767 Commit correct atomics tests 2011-08-26 10:43:30 -07:00
Matt Pharr
72a2f5d2f4 Make SSE2 __popcnt_int64 return i64 to be consistent with other targets 2011-08-26 10:42:12 -07:00
Matt Pharr
606cbab0d4 Performance improvements for global min/max atomics. Issue #57.
Compute a "local" min/max across the active program instances and 
  then do a single atomic memory op.
Added a few tests to exercise global min/max atomics (which were
  previously untested!)
2011-08-26 10:35:24 -07:00
Matt Pharr
54ec56c81d Clean up and centralize LLVM target initialization 2011-08-26 10:15:33 -07:00
Matt Pharr
a322398c62 When emitting header files, put 'extern' declarations of globals used
in ispc code outside of the ispc namespace.  Fixes issue #64.
2011-08-26 10:03:06 -07:00
Matt Pharr
f22b3a25bd Update command-line processing and usage string now that we have a preprocessor on Windows.
We had been prohibiting Windows users from providing #definitions on the command
  line, which is the wrong thing to do ever since we switched to using the
  clang preprocessor.
2011-08-26 09:58:08 -07:00
Matt Pharr
b67498766e Big rewrite / improvement of target handling.
If no CPU is specified, use the host CPU type, not just a default of "nehalem".
Provide better features strings to the LLVM target machinery.
 -> Thus ensuring that LLVM doesn't generate SSE>2 instructions for the SSE2
    target (Fixes issue #82).
 -> Slight code improvements from using cmovs in generated code now
Use the llvm popcnt intrinsic for the SSE2 target now (it now generates code
  that doesn't call the popcnt instruction now that we properly tell LLVM
  which instructions are and aren't available for SSE2.)
2011-08-26 09:54:45 -07:00
Matt Pharr
c340ff3893 Fixes to build with LLVM ToT 2011-08-25 08:53:56 +01:00
Matt Pharr
b0f59777d4 Silly bug: don't pass NULL to the print() stmt when we want a llvm::Value * that has the value NULL.
(This was causing crashes with print() statements with no additional values to
  be printed.)
2011-08-25 07:48:13 +01:00