These compute the average of two given values, rounding up and down,
respectively, if the result isn't exact. When possible, these are
mapped to target-specific intrinsics (PADD[BW] on IA and VH[R]ADD[US]
on NEON.)
A subsequent commit will add pattern-matching to generate calls to
these intrinsincs when the corresponding patterns are detected in the
IR.)
Along the lines of sse4-8, this is an 8-wide target for SSE4, using
16-bit elements for the mask. It's thus (in principle) the best
target for SIMD computation with 16-bit datatypes.