Commit Graph

550 Commits

Author SHA1 Message Date
Matt Pharr
0575b1f38d Update run_tests and examples makefile for scalar target.
Fixed a number of tests that didn't handle the programCount == 1
case correctly.
2012-01-29 16:22:25 -08:00
Matt Pharr
f6cd01f7cf Windows build support for scalar target. 2012-01-29 13:48:01 -08:00
Matt Pharr
f2fbc168af Scalar target builtins bugfixes.
Typo in __max_varying_double.
Add declarations for half functions.
Use the gen_scatter macro to get the scatter functions.
2012-01-29 13:47:44 -08:00
Matt Pharr
b50f6f1730 Fix RNG seed code in stdlib for scalar target. 2012-01-29 13:46:57 -08:00
Matt Pharr
f8a7120d9c Detect division by 0 during constant folding and issue a sensible error. 2012-01-29 13:46:38 -08:00
Matt Pharr
20dbf59420 Don't lose source position when returning values of constant symbols. 2012-01-29 13:46:17 -08:00
Gabe Weisz
c67a286aa6 Add support for 1-wide scalar target.
Issue #40.
2012-01-29 06:36:07 -08:00
Matt Pharr
c96fef6bc8 Fix silly error in generic-16.h example C++ bindings. 2012-01-27 17:04:57 -08:00
Matt Pharr
bba02f87ea Improve implementations of unsigned <=, >= in sse4 intrinsics file. 2012-01-27 16:49:41 -08:00
Matt Pharr
12dc3f5c28 Fixes to c++ backend for new and delete
Don't include declarations of malloc/free in the generated code (get
the standard ones from system headers instead).

Add a cast to (uint8_t *) before calls to malloc, which C++ requires,
since proper malloc returns a void *.
2012-01-27 16:49:09 -08:00
Matt Pharr
0f01a5dcbe Handle undef values in LLVMVectorValuesAllEqual() 2012-01-27 16:48:14 -08:00
Matt Pharr
664dc3bdda Add support for "new" and "delete" to the language.
Issue #139.
2012-01-27 14:47:06 -08:00
Matt Pharr
bdba3cd97d Bugfix: add per-lane offsets when accessing varying data through a pointer! 2012-01-27 14:44:52 -08:00
Matt Pharr
d9c0f9315a Fix generic targets: half conversion functions weren't declared.
(Broken by 1867b5b31).
2012-01-27 14:44:43 -08:00
Matt Pharr
b7f17d435f Fix crash in gather/scatter optimization pass. 2012-01-27 14:44:35 -08:00
Matt Pharr
37cdc18639 Issue error instead of crashing given attempted function call through non-function.
Fixes issue #163.
2012-01-27 10:01:06 -08:00
Matt Pharr
5893a9c49d Remove incorrect assert 2012-01-27 09:14:45 -08:00
Matt Pharr
24f58fa16a Update per_lane macro to not use ID for lane number in macro expansion
This was leading to unintended consequences if WIDTH was used in macro code,
which was undesirable.
2012-01-27 09:12:13 -08:00
Matt Pharr
56ffc78fa4 Require semicolons after sync, assert, and print statements.
(Silly parser oversight.)
2012-01-27 09:12:13 -08:00
Matt Pharr
061e68bc77 Fix compiler crash from malformed program. 2012-01-27 09:12:13 -08:00
Matt Pharr
177e6312b4 Fix build with LLVM ToT (ConstantVector::getVectorElements() is gone now). 2012-01-27 09:07:58 -08:00
Matt Pharr
1acf4032c2 Merge branch 'master' of https://github.com/jduprat/ispc 2012-01-26 14:18:25 -08:00
Jean-Luc Duprat
9c5444698e run_tests.py fixes:
- Python 3 fixes (can't use print)
 - Fixed for running tests on Windows
2012-01-26 13:39:54 -08:00
Matt Pharr
65f3252760 Various fixes to test running script for Windows.
Also, removed the --valgrind option and replaced it with a more
general --wrap-exe option, which can be used both for running
Valgrind and SDE.
2012-01-26 10:56:29 -08:00
Matt Pharr
e612abe4ba Fix parsing of 64-bit integer constants on Windows.
(i.e., use the 64-bit unsigned integer parsing function,
not the 64-bit signed one.)

Fixes bug #68.
2012-01-26 10:56:28 -08:00
Jean-Luc Duprat
34352e4e0e beefed up stdin.h on Windows so it compiles ispc 1.1.3 2012-01-25 15:04:19 -08:00
Matt Pharr
1867b5b317 Use native float/half conversion instructions with the AVX2 target. 2012-01-24 15:33:38 -08:00
Matt Pharr
a5b7fca7e0 Extract constant offsets from gather/scatter base+offsets offset vectors.
When we're able to turn a general gather/scatter into the "base + offsets"
form, we now try to extract out any constant components of the offsets and
then pass them as a separate parameter to the gather/scatter function
implementation.

We then in turn carefully emit code for the addressing calculation so that
these constant offsets match LLVM's patterns to detect this case, such that
we get the constant offsets directly encoded in the instruction's addressing
calculation in many cases, saving arithmetic instructions to do these
calculations.

Improves performance of stencil by ~15%.  Other workloads unchanged.
2012-01-24 14:41:15 -08:00
Matt Pharr
7be2c399b1 Rename various optimization passes to have more descriptive names.
No functionality change.
2012-01-23 14:49:48 -08:00
Matt Pharr
d6337b3b22 Code cleanups in opt.cpp; no functional change 2012-01-23 14:36:32 -08:00
Matt Pharr
d2f8b0ace5 Add __clock to list of symbols to make internal from builtins. 2012-01-23 06:19:16 -08:00
Matt Pharr
d805e8b183 Add clock() function to standard library.
Also corrected the declaration of num_cores() to return a
uniform value.
2012-01-22 13:05:27 -08:00
Matt Pharr
1f0f2ec05f Include AVX2 in supported ISAs 2012-01-22 07:05:47 -08:00
Matt Pharr
91ac3b9d7c Back out WIP changes to opt.cpp that were inadvertently checked in. 2012-01-21 07:34:53 -08:00
Matt Pharr
d65bf2eb2f Doxygen number bump and release notes for 1.1.3 v1.1.3 2012-01-20 17:04:16 -08:00
Matt Pharr
1bba9d4307 Improve atomic_swap_global() to take advantage of associativity.
We now do a single atomic hardware swap and then effectively do 
swaps between the running program instances such that the result
is the same as if they had happened to run a particular ordering
of hardware swaps themselves.

Also cleaned up __atomic_swap_uniform_* built-in implementations
to not take the mask, which they weren't using anyway.

Finishes Issue #56.
2012-01-20 10:37:33 -08:00
Matt Pharr
4388338dad Fix performance regression introduced in be0c77d556
Effectively, the patterns that detected when given a gather or
scatter in base+offsets form, the offsets were actually a multiple
of 2/4/8, were no longer working.

This change not only fixes this, but also expands the set of
patterns that are matched by this.  For example, given offsets of
the form 4*v1 + 16*v2, it identifies a scale of 4 and new offsets
of v1 + 4*v2.

This fix makes the volume renderer run 1.19x faster, and noise 1.54x
faster.
2012-01-19 17:57:59 -08:00
Matt Pharr
2fb59c90cf Fix C++ backend bug introduced in d14a2de168.
(This was causing a number of tests to fail with the generic
targets.)
2012-01-19 11:35:02 -07:00
Matt Pharr
68f6ea8def For << and >> with C++, detect when all instances are shifting by the same amount.
In this case, we now emit calls to potentially-specialized functions for the
left/right shifts that take a single integer value for the shift amount.  These
in turn can be matched to the corresponding intrinsics for the SSE target.

Issue #145.
2012-01-19 10:04:32 -07:00
Matt Pharr
3f89295d10 Update RNG code in stdlib to use -> operator where appropriate. 2012-01-19 10:02:47 -07:00
Matt Pharr
748b292e77 Improve code for uniform switches with a 'break' under varying control flow.
Previously, when we had a switch statement with a uniform switch condition
but a 'break' statement that was under varying control flow inside the
switch, we'd promote the switch condition to be varying so that the
break would work correctly.

Now, we leave the condition as uniform and are thus able to use the
more-efficient LLVM switch instruction in this case.

Issue #156.
2012-01-19 08:41:19 -07:00
Matt Pharr
6451c3d99d Fix bug with code for initializers for static arrays in generated C++ code.
(This was preventing aobench from compiling successfully with the generic
target.)
2012-01-18 16:55:09 -07:00
Matt Pharr
d14a2de168 Fix generic code emission when building with LLVM3.0/2.9.
Specifically, don't use vector select for masked store blend there,
but emit a call to a undefined __masked_store_blend_*() functions.

Added implementations of these functions to the sse4.h and generic-16.h
in examples/instrinsics.  (Calls to these will never be generated with
LLVM 3.1).
2012-01-17 23:42:22 -07:00
Matt Pharr
642150095d Include LLVM version used to build in version info printed out. 2012-01-17 23:42:22 -07:00
Matt Pharr
3bf3ac7922 Be more conservative about using blending in place of masked store.
More specifically, we do a proper masked store (rather than a load-
blend-store) unless we can determine that we're accessing a stack-allocated
"varying" variable.  This fixes a number of nefarious bugs where given
code like:

    uniform float a[21];
    foreach (i = 0 … 21)
        a[i] = 0;

We'd use a blend and in turn read past the end of a[] in the last
iteration.

Also made slight changes to inlining in aobench; this keeps compiles
to ~5s, versus ~45s without them (with this change).

Fixes issue #160.
2012-01-17 23:42:22 -07:00
Matt Pharr
c6d1cebad4 Update masked_load/store implementations for generic targets to take void *s
(Fixes compile errors when we try to actually use these!)
2012-01-17 23:42:22 -07:00
Matt Pharr
08189ce08c Update "inline" qualifiers in a few examples. 2012-01-17 23:42:22 -07:00
Matt Pharr
7013d7d52f Small documentation updates and cleanups 2012-01-17 23:42:21 -07:00
Matt Pharr
7045b76f84 Improvements to code generation for "foreach"
Specialize the code for the innermost loop to not do any masking
computations for the innermost dimension for the iterations where
we are certainly working on a full vector's worth of data.

This fix improves performance/code quality of "foreach" such that
it's essentially the same as the equivalent "for" loop.

Fixes issue #151.
2012-01-17 11:34:00 -08:00
Matt Pharr
58a0b4a20d Add separate set of builtins for AVX2.
(i.e., stop just reusing the ones for AVX1).

For now the only difference is that the int/uint min/max
functions call the new intrinsic for that.  Once gather is
available from LLVM, that will go here as well.
2012-01-13 14:40:01 -08:00