Commit Graph

521 Commits

Author SHA1 Message Date
Matt Pharr
d6337b3b22 Code cleanups in opt.cpp; no functional change 2012-01-23 14:36:32 -08:00
Matt Pharr
d2f8b0ace5 Add __clock to list of symbols to make internal from builtins. 2012-01-23 06:19:16 -08:00
Matt Pharr
d805e8b183 Add clock() function to standard library.
Also corrected the declaration of num_cores() to return a
uniform value.
2012-01-22 13:05:27 -08:00
Matt Pharr
1f0f2ec05f Include AVX2 in supported ISAs 2012-01-22 07:05:47 -08:00
Matt Pharr
91ac3b9d7c Back out WIP changes to opt.cpp that were inadvertently checked in. 2012-01-21 07:34:53 -08:00
Matt Pharr
d65bf2eb2f Doxygen number bump and release notes for 1.1.3 v1.1.3 2012-01-20 17:04:16 -08:00
Matt Pharr
1bba9d4307 Improve atomic_swap_global() to take advantage of associativity.
We now do a single atomic hardware swap and then effectively do 
swaps between the running program instances such that the result
is the same as if they had happened to run a particular ordering
of hardware swaps themselves.

Also cleaned up __atomic_swap_uniform_* built-in implementations
to not take the mask, which they weren't using anyway.

Finishes Issue #56.
2012-01-20 10:37:33 -08:00
Matt Pharr
4388338dad Fix performance regression introduced in be0c77d556
Effectively, the patterns that detected when given a gather or
scatter in base+offsets form, the offsets were actually a multiple
of 2/4/8, were no longer working.

This change not only fixes this, but also expands the set of
patterns that are matched by this.  For example, given offsets of
the form 4*v1 + 16*v2, it identifies a scale of 4 and new offsets
of v1 + 4*v2.

This fix makes the volume renderer run 1.19x faster, and noise 1.54x
faster.
2012-01-19 17:57:59 -08:00
Matt Pharr
2fb59c90cf Fix C++ backend bug introduced in d14a2de168.
(This was causing a number of tests to fail with the generic
targets.)
2012-01-19 11:35:02 -07:00
Matt Pharr
68f6ea8def For << and >> with C++, detect when all instances are shifting by the same amount.
In this case, we now emit calls to potentially-specialized functions for the
left/right shifts that take a single integer value for the shift amount.  These
in turn can be matched to the corresponding intrinsics for the SSE target.

Issue #145.
2012-01-19 10:04:32 -07:00
Matt Pharr
3f89295d10 Update RNG code in stdlib to use -> operator where appropriate. 2012-01-19 10:02:47 -07:00
Matt Pharr
748b292e77 Improve code for uniform switches with a 'break' under varying control flow.
Previously, when we had a switch statement with a uniform switch condition
but a 'break' statement that was under varying control flow inside the
switch, we'd promote the switch condition to be varying so that the
break would work correctly.

Now, we leave the condition as uniform and are thus able to use the
more-efficient LLVM switch instruction in this case.

Issue #156.
2012-01-19 08:41:19 -07:00
Matt Pharr
6451c3d99d Fix bug with code for initializers for static arrays in generated C++ code.
(This was preventing aobench from compiling successfully with the generic
target.)
2012-01-18 16:55:09 -07:00
Matt Pharr
d14a2de168 Fix generic code emission when building with LLVM3.0/2.9.
Specifically, don't use vector select for masked store blend there,
but emit a call to a undefined __masked_store_blend_*() functions.

Added implementations of these functions to the sse4.h and generic-16.h
in examples/instrinsics.  (Calls to these will never be generated with
LLVM 3.1).
2012-01-17 23:42:22 -07:00
Matt Pharr
642150095d Include LLVM version used to build in version info printed out. 2012-01-17 23:42:22 -07:00
Matt Pharr
3bf3ac7922 Be more conservative about using blending in place of masked store.
More specifically, we do a proper masked store (rather than a load-
blend-store) unless we can determine that we're accessing a stack-allocated
"varying" variable.  This fixes a number of nefarious bugs where given
code like:

    uniform float a[21];
    foreach (i = 0 … 21)
        a[i] = 0;

We'd use a blend and in turn read past the end of a[] in the last
iteration.

Also made slight changes to inlining in aobench; this keeps compiles
to ~5s, versus ~45s without them (with this change).

Fixes issue #160.
2012-01-17 23:42:22 -07:00
Matt Pharr
c6d1cebad4 Update masked_load/store implementations for generic targets to take void *s
(Fixes compile errors when we try to actually use these!)
2012-01-17 23:42:22 -07:00
Matt Pharr
08189ce08c Update "inline" qualifiers in a few examples. 2012-01-17 23:42:22 -07:00
Matt Pharr
7013d7d52f Small documentation updates and cleanups 2012-01-17 23:42:21 -07:00
Matt Pharr
7045b76f84 Improvements to code generation for "foreach"
Specialize the code for the innermost loop to not do any masking
computations for the innermost dimension for the iterations where
we are certainly working on a full vector's worth of data.

This fix improves performance/code quality of "foreach" such that
it's essentially the same as the equivalent "for" loop.

Fixes issue #151.
2012-01-17 11:34:00 -08:00
Matt Pharr
58a0b4a20d Add separate set of builtins for AVX2.
(i.e., stop just reusing the ones for AVX1).

For now the only difference is that the int/uint min/max
functions call the new intrinsic for that.  Once gather is
available from LLVM, that will go here as well.
2012-01-13 14:40:01 -08:00
Matt Pharr
0f8eee9809 Fix cases in optimization code to not inadvertently match calls to func ptrs.
If we call a function pointer, CallInst::getCalledFunction() returns NULL; we
need to be careful about this case when we're matching various function calls
in optimization passes.

(Fixes a crash.)
2012-01-12 10:33:06 -08:00
Matt Pharr
0740299860 Fix switch test 2012-01-12 09:45:31 -08:00
Matt Pharr
652215861e Update dynamic target dispatch code to support AVX2. 2012-01-12 08:37:18 -08:00
Matt Pharr
602209e5a8 Tiny updates to documentation, comment for switch stuff. 2012-01-12 05:55:42 -08:00
Matt Pharr
b60f8b4f70 Fix merge conflicts 2012-01-11 17:13:51 -08:00
Matt Pharr
b67446d998 Add support for "switch" statements.
Switches with both uniform and varying "switch" expressions are
supported.  Switch statements with varying expressions and very
large numbers of labels may not perform well; some issues to be
filed shortly will track opportunities for improving these.
2012-01-11 09:16:31 -08:00
Matt Pharr
9670ab0887 Add missing cases to watch out for in lCheckAllOffSafety()
Previously, we weren't checking for member expressions that dereferenced
a pointer or pointer dereference expressions--only array indexing!
2012-01-11 09:16:31 -08:00
Matt Pharr
0223bb85ee Fix bug in StmtList::EmitCode()
Previously, we would return immediately if the current basic block
was NULL; however, this is the wrong thing to do in that goto labels
and case/default labels in switch statements will establish a new
current basic block even if the current one is NULL.
2012-01-11 09:14:39 -08:00
Jean-Luc Duprat
fd81255db1 Removed mutex support for OSX 10.5
Allow to run from the build directory even if it is not on the path
properly decode subprocess stdout/stderr as UTF-8
Added newlines that were mistakenly left out of print->sys.stdout.wriote() conversion in previous CL
Python 3:
 - fixed error message comparison
 - explicit list creation
Windows:
 - forward/back slash annoyances
 - added stdint.h with definitions for int32_t, int64_t
 - compile_error_files and run_error_files were being appended to improperly
2012-01-10 16:55:00 -08:00
Matt Pharr
8a8e1a7f73 Fix bug with multiple EmitCode() calls due to missing braces.
In short, we were inadvertently trying to emit each function's
code a second time if the function had a mask check at the start
of it.  StmtList::EmitCode() was covering this error up by
not emitting code if the current basic block is NULL.
2012-01-10 16:50:13 -08:00
Jean-Luc Duprat
ef05fbf424 run_tests.py more compatible with python 3.x
except for the mutex class...
2012-01-10 13:12:38 -08:00
Jean-Luc Duprat
fa01b63fa5 Remove assumption that . is in the PATH in run_tests.py 2012-01-10 11:41:08 -08:00
Jean-Luc Duprat
63d3d25030 Fixed off by one error in array size generated by bitcode2cpp.py 2012-01-10 11:22:13 -08:00
Jean-Luc Duprat
a8db866228 Python build compatible on both python 2 and 3 2012-01-10 10:42:15 -08:00
Jean-Luc Duprat
0519eea951 Makefile does not hardcode link paths on Linux
Link statically for both x86 and x86-64
2012-01-10 10:34:57 -08:00
Matt Pharr
f4653ecd11 Release notes for 1.1.2 and doxygen version number bump v1.1.2 2012-01-09 16:05:40 -08:00
Jean-Luc Duprat
5d67252ed0 Python scripts now compatible with both 2.x and 3.x releases of python 2012-01-09 13:56:05 -08:00
Matt Pharr
5134de71c0 Fix Windows build (inttypes.h not available) 2012-01-09 09:05:20 -08:00
Matt Pharr
2be1251c70 Fix Makefile on OSX (uname -o not supported) 2012-01-09 07:40:47 -08:00
Matt Pharr
c0161aa17f Merge pull request #154 from palacaze/mingw
Mingw support
2012-01-09 07:37:02 -08:00
Pierre-Antoine Lacaze
b683aa11b1 Fix linking under mingw, libdl is Linux only. 2012-01-09 10:52:46 +01:00
Pierre-Antoine Lacaze
2654bb0112 Handle python installations in non-standards locations. 2012-01-09 10:29:54 +01:00
Pierre-Antoine Lacaze
d8728104b4 Handle the case whereby BUILD_DATE is already defined. 2012-01-09 10:29:16 +01:00
Pierre-Antoine Lacaze
0be1b70fba Mingw has strtoull, make use of it. 2012-01-09 10:28:52 +01:00
Pierre-Antoine Lacaze
a0e9793de3 Shut up warning wrt CONSOLE_SCREEN_BUFFER_INFO initialization 2012-01-09 10:19:46 +01:00
Pierre-Antoine Lacaze
da9200fcee Fix alloca use on mingw. 2012-01-09 10:19:09 +01:00
Pierre-Antoine Lacaze
54e8e8022b suppress warnings about long long arguments 2012-01-09 10:18:39 +01:00
Pierre-Antoine Lacaze
d84cf781da Mingw does not have sysconf, use the msc way of finding processors. 2012-01-09 09:45:40 +01:00
Pierre-Antoine Lacaze
002f27a30f Implement vasprintf and asprintf for platforms lacking them. 2012-01-09 09:44:58 +01:00