We now try harder to keep the names of instructions related to the
initial names of variables they're derived from and so forth. This
is useful for making both LLVM IR and generated C++ code
easier to correlate back to the original ispc source code.
Issue #244.
Now, when we're printing out a constant vector value, we check to see
if it's a splat and call out to one of the __splat_* functions in
the generated code if so.
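A minimal sketch of the splat check described above, using plain
std::vector<int> lanes rather than LLVM constants. The helper names
IsSplat and PrintConstantVector, and the concrete __splat_i32 name,
are assumptions made for illustration; the actual code in the C++
backend may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if every lane holds the same value.
bool IsSplat(const std::vector<int> &lanes) {
    if (lanes.empty())
        return false;
    for (int v : lanes)
        if (v != lanes[0])
            return false;
    return true;
}

// Print a constant vector, preferring a (hypothetical) __splat_i32()
// call over an explicit element list when all lanes are equal.
std::string PrintConstantVector(const std::vector<int> &lanes) {
    assert(!lanes.empty());
    if (IsSplat(lanes))
        return "__splat_i32(" + std::to_string(lanes[0]) + ")";
    std::string s = "{ ";
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        if (i > 0)
            s += ", ";
        s += std::to_string(lanes[i]);
    }
    return s + " }";
}
```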
These tests all fail with generic-16/c++ output currently; however, the
output indicates that it's just small floating-point differences.
(Though the question remains, why are those differences popping up?)
When we're manually scalarizing the extraction of the first element
of a vector value, we need to be careful about handling constant values
and about where new instructions are inserted. The old code was
sloppy about this, which in turn led to invalid IR in some cases.
For example, the two bugs below were essentially due to generating
an extractelement inst from a zeroinitializer value and then inserting
it in the wrong bblock such that a phi node that used that value was
malformed.
Fixes issues #240 and #229.
Various optimization passes depend on turning a compile-time constant
mask into a bit vector. It turns out that in LLVM 3.1, constant vectors
of ints/floats are represented with llvm::ConstantDataVector, but
constant vectors of bools use llvm::ConstantVector (which is what LLVM
3.0 uses for all constant vectors). Now lGetMask() always takes the
llvm::ConstantVector path, to cover this case.
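As an illustration of what "turning a constant mask into a bit
vector" means, here is a small sketch that packs per-lane booleans
into an integer. It is not lGetMask() itself: the MaskToBits name and
the convention that lane i maps to bit i are assumptions made for
this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pack a compile-time-known mask into a bit
// vector, with lane i stored in bit i (an assumed convention).
std::uint64_t MaskToBits(const std::vector<bool> &lanes) {
    assert(lanes.size() <= 64);  // must fit in a 64-bit word
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < lanes.size(); ++i)
        if (lanes[i])
            bits |= std::uint64_t(1) << i;
    return bits;
}
```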
This improves generated C++ code by eliminating things like select
with an all on/off mask, turning movmask calls with constants into
constant values, etc.
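The select case can be sketched as follows: when the mask is known
to be all on or all off, the select collapses to one of its operands.
MaskKind and FoldSelect are hypothetical names for this example, and
the operand order (first operand chosen where the mask is on) is an
assumption.

```cpp
#include <cassert>

// What is known about a mask at compile time.
enum class MaskKind { AllOn, AllOff, Mixed };

// Hypothetical folding rule for select(mask, ifOn, ifOff). Returns
// the operand that can replace the select, or nullptr if the select
// has to stay because the mask is not uniformly on or off.
const int *FoldSelect(MaskKind mask, const int *ifOn, const int *ifOff) {
    switch (mask) {
    case MaskKind::AllOn:
        return ifOn;   // every lane takes the "on" operand
    case MaskKind::AllOff:
        return ifOff;  // every lane takes the "off" operand
    case MaskKind::Mixed:
        return nullptr;
    }
    return nullptr;
}

void ExampleUsage() {
    int a = 1, b = 2;
    assert(FoldSelect(MaskKind::AllOff, &a, &b) == &b);
}
```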
When the mask was all off, we'd choose the incorrect operand!
(This bug was masked since this optimization wasn't triggering as
intended, due to other issues to be fixed in a forthcoming commit.)
Issue an error, rather than crashing, if the user has declared a
struct type but not defined it and subsequently tries to:
- dynamically allocate an instance of the struct type
- do pointer math with a pointer to the struct type
- compute the size of the struct type
Now a declaration like 'struct Foo;' can be used to establish the
name of a struct type, without providing a definition. One can
pass pointers to such types around the system, but can't do much
else with them (as in C/C++).
Issue #125.
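For comparison, this is how the same idea looks in C++, which the
text above cites as the model. The example is plain C++, not ispc;
PassThrough and Holder are names invented for the illustration.

```cpp
#include <cassert>

struct Foo;  // declared but not defined here: an incomplete type

// Passing and returning pointers to an incomplete type is fine.
Foo *PassThrough(Foo *p) { return p; }

// Another struct may hold a pointer to the incomplete type.
struct Holder {
    Foo *foo;
};

// A C++ compiler rejects each of the following, since they need the
// size or layout of Foo:
//   new Foo;        // allocate an instance
//   p + 1;          // pointer math on a Foo *
//   sizeof(Foo);    // compute the size of the type

void ExampleUsage() {
    Holder h{nullptr};
    assert(PassThrough(h.foo) == nullptr);
}
```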
The decl.* code no longer interacts with Symbols, but just returns
names, types, initializer expressions, etc., as needed. This makes the
code a bit more understandable.
Fixes issues #171 and #130.
We still need to call ResolveUnboundVariability even if the
type returns false from HasUnboundVariability; we may have,
for example, a pointer type where the pointer is resolved,
but the pointed-to type is unresolved.
Fixes issue #228.
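A toy model of the pointer case, to show why a top-level check is
not enough. ToyType, HasUnboundTopLevel, and Resolve are hypothetical
stand-ins and do not mirror the real ispc type classes; the only
point is that a resolved pointer can still have an unresolved
pointed-to type.

```cpp
#include <cassert>
#include <memory>

enum class Variability { Uniform, Varying, Unbound };

// Toy type: its own variability plus an optional pointed-to type.
struct ToyType {
    Variability variability;
    std::shared_ptr<ToyType> pointee;  // null for non-pointer types
};

// Looks only at the type itself, not at what it points to.
bool HasUnboundTopLevel(const ToyType &t) {
    return t.variability == Variability::Unbound;
}

// Resolves unbound variability at every level of the type.
ToyType Resolve(const ToyType &t, Variability v) {
    ToyType r = t;
    if (r.variability == Variability::Unbound)
        r.variability = v;
    if (t.pointee)
        r.pointee = std::make_shared<ToyType>(Resolve(*t.pointee, v));
    return r;
}

void ExampleUsage() {
    // A uniform pointer to an int with unbound variability.
    ToyType ptr{Variability::Uniform,
                std::make_shared<ToyType>(
                    ToyType{Variability::Unbound, nullptr})};
    assert(!HasUnboundTopLevel(ptr));  // top level looks resolved
    ToyType r = Resolve(ptr, Variability::Varying);
    assert(r.pointee->variability == Variability::Varying);
}
```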
Once we're down to something that's not another nested expr list, use
TypeConvertExpr() to convert the expression to the type we need. This should
allow simplifying a number of the GetConstant() implementations, to remove
partial reimplementation of type conversion there.
For now, this change finishes off issue #220.
Previously, the compiler would crash if e.g. the program passed a
temporary value to a function taking a const reference. This change
fixes ReferenceExpr::GetValue() to handle this case and allocate
temporary storage for the temporary so that the pointer to that
storage can be used for the reference value.
This was unnecessary overhead to impose on all callers; the user
should handle these as needed on their own.
Also added some explanatory text to the documentation that highlights
that memory_barrier() is only needed across HW threads/cores, not
across program instances in a gang.
Not only was this quite verbose, it was unnecessary since we do type
equality by name. This also needed to be fixed before we could
handle structs declared like "struct Foo;" when, for example, other
structs have Foo * members.
In InitSymbol(), we try to be smart and emit a memcpy when there
are a number of values to store (e.g. for arrays, structs, etc.).
Unfortunately, this wasn't working as desired for bools (i.e. i1 types),
since the SizeOf() call that tried to figure out how many bytes to
copy would return 0 bytes, due to dividing the number of bits to copy
by 8.
Fixes issue #234.
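The arithmetic behind the bug, as a small sketch. BytesTruncating
shows the failure mode described above; BytesRoundedUp shows one
common remedy (rounding up to whole bytes). Both names are
hypothetical, and the text above does not say which fix was
actually applied.

```cpp
#include <cassert>
#include <cstdint>

// Integer division truncates, so a 1-bit type reports 0 bytes.
std::uint64_t BytesTruncating(std::uint64_t bits) { return bits / 8; }

// Rounding up gives every non-empty type at least one byte.
std::uint64_t BytesRoundedUp(std::uint64_t bits) { return (bits + 7) / 8; }

void ExampleUsage() {
    assert(BytesTruncating(1) == 0);  // i1: nothing would be copied
    assert(BytesRoundedUp(1) == 1);   // i1: one byte
}
```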