These compute the average of two given values, rounding up and down,
respectively, if the result isn't exact. When possible, these are
mapped to target-specific intrinsics (PAVG[BW] on IA and VH[R]ADD[US]
on NEON).
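A scalar C++ sketch of the two operations for unsigned 8-bit values
(the function names here are just illustrative, not the actual stdlib
declarations):

    #include <cstdint>

    // Average rounding down: (a + b) / 2, computed without overflowing
    // the 8-bit type.
    static inline uint8_t avg_down(uint8_t a, uint8_t b) {
        return (uint8_t)((a & b) + ((a ^ b) >> 1));
    }

    // Average rounding up: (a + b + 1) / 2, also overflow-safe.
    static inline uint8_t avg_up(uint8_t a, uint8_t b) {
        return (uint8_t)((a | b) - ((a ^ b) >> 1));
    }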
A subsequent commit will add pattern-matching to generate calls to
these intrinsics when the corresponding patterns are detected in the
IR.
On a target with a 16-bit mask, for example, we now choose the type
of an integer literal like "1024" to be int16. Previously, we used int32,
which is a worse fit and leads to less efficient code on a 16-bit mask
target. (However, an integer literal like 1000000 still gets the type
int32, even on a 16-bit target.)
Updated the tests to still pass with 8- and 16-bit targets, given this
change.
For varying int8/16/32 types, division by small constants can be
implemented efficiently via a multiply and a shift using integer
types of twice the bit width; this commit adds that optimization.
(The implementation is based on Halide.)
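For instance, a C++ sketch of the idea for unsigned 8-bit division by 3
(the magic constant here is the textbook one, not necessarily what the
compiler emits):

    #include <cstdint>

    // x / 3 for any uint8_t x: widen to 16 bits, multiply by the
    // precomputed constant 171 (= (2^9 + 1) / 3), then shift right by 9.
    // The rounding error never reaches the integer part for 8-bit
    // inputs, so the result is exact.
    static inline uint8_t div3_u8(uint8_t x) {
        return (uint8_t)(((uint16_t)x * 171u) >> 9);
    }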
All of these pass the current mask to FunctionEmitContext::SetBlockEntryMask()
so that when a break/continue/return is encountered, it can test to see if all
lanes have followed that path and then return; this in turn ensures that we never
run statements with an all-off execution mask.
These functions were passing the function-internal mask, not the full mask, and
thus could end up executing code with the mask all off if some lanes were
disabled by an outer function. (The new tests cover this case.)
In particular, this gives us the desired behavior for NaNs (all compares
involving a NaN evaluate to true). This in turn allows writing the
canonical isnan() function as "v != v".
Added isnan() to the standard library as well.
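In C++ terms, the same idiom works because an equality test involving a
NaN is false, so v != v is true exactly when v is a NaN:

    // Canonical isnan() idiom (illustrative name).
    static inline bool is_nan(float v) { return v != v; }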
Previously, we were trying to take a uniform seed and then shuffle that
around to initialize the state for each of the program instances. This
was becoming increasingly untenable and brittle.
Now a varying seed is expected and used.
These tests all currently fail with generic-16/c++ output; however, the
output indicates that the failures are just small floating-point
differences. (Though the question remains why those differences show up
at all.)
Now a declaration like 'struct Foo;' can be used to establish the
name of a struct type, without providing a definition. One can
pass pointers to such types around the system, but can't do much
else with them (as in C/C++).
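The same pattern, illustrated in C/C++ terms:

    struct Foo;                          // declaration only; no definition

    static void consume(struct Foo *p);  // fine: pointers to the incomplete
                                         // type can be declared and passed
    // struct Foo f;                     // error: type isn't defined yet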
Issue #125.
The decl.* code no longer interacts with Symbols, but just returns
names, types, initializer expressions, etc., as needed. This makes the
code a bit more understandable.
Fixes issues #171 and #130.
Once we're down to something that's not another nested expr list, use
TypeConvertExpr() to convert the expression to the type we need. This should
allow simplifying a number of the GetConstant() implementations, removing the
partial reimplementation of type conversion there.
For now, this change finishes off issue #220.
Previously, the compiler would crash if, for example, the program passed a
temporary value to a function taking a const reference. This change
fixes ReferenceExpr::GetValue() to handle this case, allocating
storage for the temporary so that a pointer to that storage can be
used as the reference value.
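The shape of the case that used to crash, in C++-like syntax (names
hypothetical):

    // The result of x + 1.0f is a temporary; the compiler has to
    // materialize storage for it so the const reference has an
    // address to bind to.
    static float scale(const float &f) { return f * 2.0f; }

    float use(float x) { return scale(x + 1.0f); }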
Closer compatibility with C++: given an argument of non-reference type,
matching it to a non-const reference to that type is now treated as a
better match than matching it to a const reference to that type (rather
than both being equal cost).
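A C++ analogy of the ranking (names hypothetical):

    static void touch(int &x)       { x += 1; }   // non-const reference
    static void touch(const int &x) { (void)x; }  // const reference

    void demo() {
        int a = 0;
        touch(a);   // the non-const reference overload is the better match
    }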
Issue #224.
Implicit conversion to function types is now a more standard part of
the type conversion infrastructure, rather than being handled by special
cases such as FunctionSymbolExpr immediately returning a pointer type.
Improved AddressOfExpr::TypeCheck() to actually issue errors in cases
where it's illegal to take the address of an expression.
Added AddressOfExpr::GetConstant() implementation that handles taking
the address of functions.
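In C/C++-like terms, the cases involved look roughly like this (names
hypothetical):

    static int square(int x) { return x * x; }

    // A function name converts implicitly to a pointer to function,
    // and taking its address explicitly is also legal:
    int (*p1)(int) = square;
    int (*p2)(int) = &square;

    // ...but taking the address of, say, a call result should be
    // rejected by the type checker:
    // int *bad = &square(3);   // error: can't take the address of an rvalue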
Issue #223.
We were incorrectly trying to type convert the varying offset to a
uniform value, which in turn led to an incorrect compile-time error.
Fixes issue #201.
In particular:
1. We weren't setting the function mask to 'all on', so any mixed function
   mask would in turn apply inside the foreach loop.
2. We weren't always setting the internal mask to 'all on' before doing any
   additional masking based on the iteration variables.