This should help with performance of the generated code.

Updated the relevant header files (sse4.h, generic-16.h, generic-32.h, generic-64.h). Also updated generic-32.h and generic-64.h to the new memory API.
Roughly 100 tests currently fail with this change; all of the tests need to be audited for assumptions that 16 is the widest possible width…