The main issue is that they end up generating a number of smaller vector ops (e.g. 4-wide and 8-wide ops on the 16-wide generic target), which the examples/intrinsics implementations don't currently support. This change fixes a number of failing tests for now. It may be worth generalizing the code in examples/intrinsics at some point, since as a general principle the coalescing optimizations are still desirable (e.g. when generating LLVM IR output). Issue #175.