Specifically, don't use vector select for the masked store blend there,
but emit calls to undefined __masked_store_blend_*() functions.
Added implementations of these functions to sse4.h and generic-16.h
in examples/intrinsics. (Calls to these will never be generated with
LLVM 3.1.)
More specifically, we do a proper masked store (rather than a load-
blend-store) unless we can determine that we're accessing a stack-allocated
"varying" variable. This fixes a number of nefarious bugs where given
code like:
    uniform float a[21];
    foreach (i = 0 ... 21)
        a[i] = 0;
We'd use a blend and in turn read past the end of a[] in the last
iteration (e.g., on a 4-wide target, the final iteration loads a full
vector starting at a[20], reading three elements past the end of the
array before blending and storing the result back).
Also made slight changes to inlining in aobench; with this change, these
keep compiles to ~5s, versus ~45s without them.
Fixes issue #160.
Specialize the code generated for the innermost dimension of "foreach"
loops so that no masking computations are done for the iterations where
we are certainly working on a full vector's worth of data.
This fix improves performance/code quality of "foreach" such that
it's essentially the same as the equivalent "for" loop.
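As an illustrative sketch (the array and its size here are hypothetical),
a loop like:
    uniform float a[100];
    foreach (i = 0 ... 100)
        a[i] = 2 * a[i];
now compiles to essentially the same code as the manual equivalent:
    for (uniform int i = 0; i < 100; i += programCount) {
        int idx = i + programIndex;
        if (idx < 100)
            a[idx] = 2 * a[idx];
    }
with the mask check effectively elided for all but any final partial
vector of iterations.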
Fixes issue #151.
(i.e., stop just reusing the ones for AVX1).
For now the only difference is that the int/uint min/max
functions call the new intrinsic for that. Once gather is
available from LLVM, that will go here as well.
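For instance (a sketch; the values here are illustrative), stdlib calls
like the following now lower to the AVX2 packed integer min/max
intrinsics on that target:
    int a = programIndex;
    int b = 7 - programIndex;
    int lo = min(a, b);   // packed signed int min
    unsigned int hi = max((unsigned int)a, (unsigned int)b);   // unsigned max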
If we call a function pointer, CallInst::getCalledFunction() returns NULL; we
need to be careful about this case when we're matching various function calls
in optimization passes.
(Fixes a crash.)
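For example (a hypothetical reproduction sketch), ISPC code along these
lines makes an indirect call, for which getCalledFunction() returns NULL:
    static float square(float x) { return x * x; }
    float apply(float (*f)(float), float x) {
        return f(x);   // indirect call: no statically-known callee
    }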
Switches with both uniform and varying "switch" expressions are
supported. Switch statements with varying expressions and very
large numbers of labels may not perform well; some issues to be
filed shortly will track opportunities for improving these.
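For example (a sketch; the values are illustrative), with a varying
"switch" expression, different program instances can take different
cases, each case body executing under the corresponding mask:
    int x = programIndex % 3;   // varying; lanes hold different values
    int result;
    switch (x) {
    case 0:  result = 10; break;
    case 1:  result = 20; break;
    default: result = -1; break;
    }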
Previously, we would return immediately if the current basic block
was NULL; however, this is the wrong thing to do, because goto labels
and case/default labels in switch statements will establish a new
current basic block even if the current one is NULL.
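As an illustrative sketch (names hypothetical), in code like:
    switch (x) {
    case 0:
        return a;   // the current basic block is NULL after this return
    case 1:         // this label must establish a fresh basic block
        return b;
    }
code generation must continue at the "case 1:" label even though no
basic block is live when it is reached.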
Allow running from the build directory even if it is not on the path
Properly decode subprocess stdout/stderr as UTF-8
Added newlines that were mistakenly left out of the print->sys.stdout.write() conversion in a previous CL
Python 3:
- fixed error message comparison
- explicit list creation
Windows:
- forward/back slash annoyances
- added stdint.h with definitions for int32_t, int64_t
- compile_error_files and run_error_files were being appended to improperly
In short, we were inadvertently trying to emit each function's
code a second time if the function had a mask check at the start
of it. StmtList::EmitCode() was covering this error up by
not emitting code when the current basic block was NULL.
We now recognize patterns like (ptr + offset1 + offset2) as being
cases we can handle with the base_offsets variants of the gather/scatter
functions. (This can come up with multidimensional array indexing,
for example.)
Issue #150.
Really, we only have to be careful about the case where there is a vector of bools
(i.e. a mask) involved, since the size of that isn't known at compile-time.
(Currently, at least.)
As part of this, function declarations are no longer scoped (this is permitted
by the C standard, as it turns out). So code like:
    void foo() { void bar(); }
    void bat() { bar(); }
compiles correctly; the declaration of bar() in foo() is still available in the
definition of bat().
Fixes issue #129.
Now, when a type is declared without an explicit "uniform" or "varying"
qualifier, its variability is unbound; depending on the context of the
declaration, the variability is later finalized.
Currently, in almost all cases, types with unbound variability are
resolved to varying types; the one exception is typecasts like
"(int)1": here, the fact that (int) has unbound variability
carries through to the TypeCastExpr, which notices that the
expression being cast has uniform type and therefore resolves
(int) to (uniform int).
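For example (an illustrative sketch):
    uniform int a = 1;
    uniform int b = (int)a;   // (int) resolves to (uniform int), since a is uniform
    int c = a;                // unbound "int" here resolves to varying int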
Fixes issue #127.