Previously, we uniqued AtomicTypes so that they could be compared
by pointer equality, but with the forthcoming SOA variability changes,
this would become too unwieldy (lacking a more general / ubiquitous
type-uniquing implementation).
We were only resolving unbound variability for the top-level type,
which isn't enough if we have e.g. an unbound-variability pointer
pointing to some type with unbound variability.
Now, the pointed-to type is always uniform by default (if an explicit
rate qualifier isn't provided). This rule is easier to remember and
seems to work well in more cases than the previous rule from 6d7ff7eba2.
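For example, under this rule (a sketch, with hypothetical names):

float * foo;          // (varying) pointer to uniform float
varying float * foo;  // pointer to varying float, via an explicit qualifier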
Now, if a struct member has an explicit 'uniform' or 'varying'
qualifier, then that member has that variability, regardless of
the struct's own variability. Members without 'uniform' or
'varying' have unbound variability and in turn inherit the
variability of the struct.
As a result of this, now structs can properly be 'varying' by default,
just like all the other types, while still having sensible semantics.
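A sketch of these rules (hypothetical struct and names):

struct Foo {
    uniform int a;  // always uniform, regardless of the struct's variability
    varying int b;  // always varying
    int c;          // unbound; inherits the struct's variability
};

uniform Foo u;  // u.a uniform, u.b varying, u.c uniform
varying Foo v;  // v.a uniform, v.b varying, v.c varying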
Now, if rate qualifiers aren't used to specify otherwise, varying
pointers point to uniform types by default. As before, uniform
pointers point to varying types by default.
float *foo; // varying pointer to uniform float
float * uniform foo; // uniform pointer to varying float
These defaults seem to require the fewest explicit uniform/varying
qualifiers for the most common cases, though it's TBD whether it
would be easier to have a single rule that e.g. the pointed-to type
is always uniform by default.
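Explicit qualifiers on either side of the '*' override these defaults;
a sketch (hypothetical declarations):

varying float * foo;          // varying pointer to varying float
uniform float * uniform foo;  // uniform pointer to uniform float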
If the initializer is a compile-time constant (or at least part of it
is), then store the constant value in a module-local constant global
and memcpy it into the variable. This in turn leads to much better
assembly in the end.
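For example (a hypothetical case):

uniform float table[4] = { 0, 0.25, 0.5, 0.75 };

The element values can now be stored in a module-local constant global
and copied into 'table' with a single memcpy, rather than with a
sequence of per-element stores.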
Issue #176.
We now match C's behavior, where if we have an initializer list with
too few values for the underlying type, any additional elements are
initialized to zero.
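For example (hypothetical declaration):

uniform int a[4] = { 1, 2 };  // a[2] and a[3] are zero-initialized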
Fixes issue #123.
If given an initializer list with too many elements for the actual array
size, in some cases we would incorrectly resize the explicitly sized array
to be the size implied by the initializer list.
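A sketch of the kind of declaration affected (hypothetical):

uniform int a[2] = { 1, 2, 3 };  // 'a' now keeps its declared size of 2,
                                 // rather than silently becoming size 3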
The main issue is that the coalescing optimizations end up
generating a number of smaller vector ops (e.g. 4-wide and 8-wide on
the 16-wide generic target), which the examples/intrinsics
implementations don't currently support.
This fixes a number of failing tests for now; it may be worth
generalizing the code in examples/intrinsics at some point, since
the coalescing optimizations are still desirable in general (e.g.
when generating LLVM IR output).
Issue #175.
There are two related optimizations that happen now. (These
currently apply only to gathers where the mask is known to be all
on and that are accessing 32-bit sized elements, but both of these
restrictions may be generalized in the future.)
First, for any single gather, we are now more flexible in mapping it
to individual memory operations. Previously, we would only either map
it to a general gather (one scalar load per SIMD lane) or an
unaligned vector load (if the program instances could be determined
to be accessing a sequential set of locations in memory).
Now, we are able to break gathers into scalar, 2-wide (i.e. 64-bit),
4-wide, or 8-wide loads, and we generate the shuffles needed to
reassemble the loaded values. Doing fewer, larger loads in this
manner, when possible, can be more efficient.
Second, we can coalesce memory accesses across multiple gathers. If
we have a series of gathers without any memory writes in the middle,
then we try to analyze their reads collectively and choose an efficient
set of loads for them. Not only does this help if different gathers
reuse values from the same location in memory, but it's specifically
helpful when data with AOS layout is being accessed; in this case,
we're often able to generate wide vector loads and appropriate shuffles
automatically.
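As a sketch of the AOS case (hypothetical types and names):

struct Point { float x, y, z; };

float component_sum(uniform Point pts[], int i) {
    // Three gathers from adjacent AOS members; when the mask is all
    // on, coalescing can turn these into wide vector loads plus the
    // appropriate shuffles.
    return pts[i].x + pts[i].y + pts[i].z;
}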
Specifically, if both of the expressions are compile-time constants
and the condition is a varying compile-time constant (even if its
value is not all true or all false), then we can assemble a
compile-time constant result.
Add a number of additional error cases in the grammar.
Enable bison's extended error reporting, to get better messages about the
context of errors and the expected (but not found) tokens at errors.
Improve the printing of these by providing an implementation of yytnamerr
that rewrites things like "TOKEN_MUL_ASSIGN" to "*=" in error messages.
Print the source location (using Error()) when yyerror() is called;
wiring this up seems to require no longer building a 'pure parser'
but having yylloc as a global, which in turn led to having to update
all of the uses of it (which previously accessed it as a pointer).
Updated a number of tests_errors for the resulting changes in error
text.