For varying int8/16/32 types, division by a small constant can be implemented efficiently as a multiply and shift using an integer type of twice the bit-width; this commit adds that optimization. (The implementation is based on Halide's.)
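As a rough illustration of the idea (not the code added by this commit), the unsigned case can be sketched in scalar C as below. The helper names div3_u8, div10_u32 and check_div3_u8 are hypothetical. The constants are standard reciprocal approximations for those two divisors: 171 = ceil(2^9 / 3) and 0xCCCCCCCD = ceil(2^35 / 10). Signed types and divisors whose multiplier does not fit in the wider type need extra handling that is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x / 3 for an 8-bit unsigned value.
 * The largest product is 255 * 171 = 43605, which fits in 16 bits. */
static inline uint8_t div3_u8(uint8_t x) {
    uint16_t wide = (uint16_t)((uint16_t)x * 171u); /* widen, then multiply */
    return (uint8_t)(wide >> 9);                    /* shift back down */
}

/* Hypothetical helper: x / 10 for a 32-bit unsigned value,
 * using a 64-bit intermediate product. */
static inline uint32_t div10_u32(uint32_t x) {
    uint64_t wide = (uint64_t)x * 0xCCCCCCCDull;
    return (uint32_t)(wide >> 35);
}

/* Exhaustive check of the 8-bit case against ordinary division. */
static void check_div3_u8(void) {
    for (unsigned x = 0; x <= 255; ++x)
        assert((unsigned)div3_u8((uint8_t)x) == x / 3);
}
```

The 8-bit case is small enough to verify against every input, which is a cheap way to validate a multiplier/shift pair before relying on it.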