This gets the deferred example closer to working with the scalar target, but some issues remain (partly in the gamma correction / final clamping, it seems). The fix costs roughly 0.5% in performance with, e.g., the AVX target. It's not clear that a separate code path is worth having just to avoid losing that small amount of performance. (Partially addresses issue #167.)