Now, the Linker::LinkModules() call doesn't link in any functions
marked as 'internal', which is problematic, since we'd like to have
just about all of the builtins marked as internal so that they are
eliminated after they've been inlined when they are in fact used.
This change removes all of the internal qualifiers in the builtins
and adds a lSetInternalFunctions() routine to builtins.cpp that
sets this property on the functions that need it after they've
been linked in by LinkModules().
All of the masked store calls were inhibiting putting values into
registers, which in turn led to a lot of unnecessary stack traffic.
This approach seems to give better code in the end.
Fix RNG state initialization for 16-wide targets
Fix a number of bugs in reduce_add builtin implementations for AVX.
Fix some tests that had incorrect expected results for the 16-wide
case.
Emit calls to masked_store, not masked_store_blend, when handling
masked stores emitted by the frontend.
Fix bug in binary8to16 macro in builtins.m4
Fix bug in 16-wide version of __reduce_add_float
Remove blend function implementations for masked_store_blend for
AVX; just forward those on to the corresponding real masked store
functions.