Remove memory_barrier() calls from atomics.
The barriers imposed unnecessary overhead on all callers; users should insert them on their own where needed. Also add explanatory text to the documentation noting that memory_barrier() is only needed across HW threads/cores, not across program instances in a gang.
@@ -3880,6 +3880,11 @@ code.

void memory_barrier();

Note that this barrier is *not* needed for coordinating reads and writes
among the program instances in a gang; it's only needed for coordinating
between multiple hardware threads running on different cores. See the
section `Data Races Within a Gang`_ for the guarantees provided about
memory read/write ordering across a gang.
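
As an illustration of the distinction drawn in the note, here is a minimal sketch in ispc. The function name ``publish_results`` and its parameters are hypothetical, and the sketch assumes the uniform overload of ``atomic_swap_global()`` from the ispc standard library. Since the atomics no longer call ``memory_barrier()`` themselves, the writer calls it explicitly before setting a flag that a task on another core might poll. The writes inside the ``foreach`` loop need no barrier for coordination among the program instances of the gang.

```ispc
// Hypothetical example: the function name and parameters are illustrative only.
export void publish_results(uniform int32 results[],
                            uniform int32 count,
                            uniform int32 * uniform ready_flag) {
    // Writes issued by the program instances in this gang need no
    // memory_barrier() to coordinate among themselves.
    foreach (i = 0 ... count) {
        results[i] = 2 * i;
    }

    // Atomics do not issue a barrier on their own, so explicitly order the
    // writes above before the flag update, for the benefit of hardware
    // threads running on other cores.
    memory_barrier();

    // Assumes the uniform overload of atomic_swap_global() from the ispc
    // standard library; sets the flag to 1.
    atomic_swap_global(ready_flag, 1);
}
```

A reader on another core would presumably pair this with its own ``memory_barrier()`` after observing the flag and before reading ``results``; that side is omitted here.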

Prefetches
----------