Fixes include adding "_float" and "_double" suffixes as appropriate as well
as providing a number of missing implementations.
This fixes a number of failures in the half* tests.
e.g. "__equal()" -> "__equal_float()", etc.
No functional change; this is necessary groundwork for a forthcoming
peephole optimization that eliminates ANDs of masks in some cases.
If we have a vector of all zeros, a __setzero_* function call is emitted,
permitting calling specialized intrinsics for this. Undefined values
are reflected with an __undef_* call, which similarly allows passing that
information along.
This change also includes a cleanup to the signature of the __smear_*
functions; since they already have different names depending on the
scalar value type, we don't need to use the trick of passing an
undefined value of the return vector type as the first parameter as
an indirect way to overload by return value.
Issue #317.
This should help with performance of the generated code.
Updated the relevant header files (sse4.h, generic-16.h, generic-32.h, generic-64.h)
Updated generic-32.h and generic-64.h to the new memory API