Specifically, launch all of the tasks in a single statement, instead of looping over spans in y and launching a collection of tasks across x for each span. This seems to give a few percent better performance.
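As a rough illustration of the two launch patterns, here is a minimal sketch in Python. It is not the code from this change: `concurrent.futures` stands in for the actual task system, and `render_tile`, `launch_per_span`, and `launch_all` are hypothetical names. The per-span version waits after every span in y, while the single-launch version submits every (x, y) task up front and waits once.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def render_tile(x, y):
    # Hypothetical stand-in for the real per-task work.
    return (x, y, x * y)


def launch_per_span(pool, nx, ny):
    # Old pattern: loop over spans in y, launch tasks across x for each
    # span, and wait for that span to finish before starting the next one.
    results = []
    for y in range(ny):
        futures = [pool.submit(render_tile, x, y) for x in range(nx)]
        wait(futures)  # one synchronization point per span
        results.extend(f.result() for f in futures)
    return results


def launch_all(pool, nx, ny):
    # New pattern: launch every (x, y) task at once, then wait a single time.
    futures = [
        pool.submit(render_tile, x, y)
        for y in range(ny)
        for x in range(nx)
    ]
    wait(futures)  # single synchronization point
    return [f.result() for f in futures]


with ThreadPoolExecutor(max_workers=4) as pool:
    # Both patterns produce the same results in the same order.
    assert launch_per_span(pool, 8, 6) == launch_all(pool, 8, 6)
```

Removing the per-span wait means workers are not left idle at the end of each span, which may be where a gain of this kind comes from; that explanation is an assumption, not something measured here.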