This allows us to scale up to 64-wide execution.
Fix typo in __max_varying_double. Add declarations for the half functions. Use the gen_scatter macro to generate the scatter functions.
Issue #40.