When we have an "extern" global, now we no longer inadvertently define
storage for it. Further, we now successfully do define storage when we
encounter a definition following one or more extern declarations.
Issues #215 and #217.
This was causing functions like round() to fail on SSE2, since it has code
that does:
x += 0x1.0p23f;
x -= 0x1.0p23f;
which was in turn being undesirably optimized away.
Fixes issue #211.
We were incorrectly trying to type convert the varying offset to a
uniform value, which in turn led to an incorrect compile-time error.
Fixes issue #201.
Rather than explicitly building a DAG and doing a topological sort,
just traverse structs recursively and emit declarations for all of
their dependent structs before emitting the original struct declaration.
Not only is this simpler than the previous implementation, but it
fixes a bug where we'd hit an assert if we had a struct with multiple
contained members of another struct type.
Because they reestablish an 'all on' mask inside their body, it doesn't
make sense to include their cost when evaluating whether it's worth
re-establishing an 'all on' mask dynamically. (This does mean that
EstimateCost()'s return value isn't the most obvious thing, but currently
in all the cases where we need it, this is the more appropriate value to
return.)
Rewrite things to be able to do a float MINPS, for slightly
better code on SSE2 (which has that but not an signed int
min). SSE2 code now 23 instructions (vs 21 intrinsics).
We now have separate Expr implementations for dereferencing pointers
and automatically dereferencing references. This is in particular
necessary so that we can detect attempts to dereference references
with the '*' operator in programs and issue an error in that case.
Fixes issue #192.
safe: indicates that the function can safely be called with an "all off"
execution mask.
costN: (N an integer) overrides the cost estimate for the function with
the given value.
For now this has the __ prefix, as an experimental feature currently only
used in the standard library implementation. It's probably worth making
something along these lines an official feature, but I'm not sure if this
in its current form is quite the right thing.
With AOS data, we can often coalesce the accesses into gathers for the main
part of foreach loops but only fail on the last bits where the mask is not
all on (since the coalescing code doesn't handle mixed masks, yet.) Before,
we'd report success with coalescing and then also report that gathers were needed
for the same accesses that were coalesced, which was a) confusing, and b)
didn't accurately represent what was going on for the majority of the loop
iterations.
In particular, we 1. weren't setting the function mask to 'all on', such that
any mixed function mask would in turn apply inside the foreach loop, and 2.
weren't always setting the internal mask to 'all on' before doing any additional
masking based on the iteration variables.