More naming consistency: _i32 rather than i32, now. Also improved the m4 macros to generate these sequences to not require as many parameters.
(i.e., stop just reusing the ones for AVX1). For now the only difference is that the int/uint min/max functions call the new intrinsic for that. Once gather is available from LLVM, that will go here as well.