Proposal:
c#
static public class System.Threading.Interlocked
{
public static void ProcessWideMemoryBarrier();
}
Asymmetric lock-free algorithms and data structures, where frequent operations are very cheap (not even an interlocked operation) and rare operations are very expensive, are often used to achieve high scalability and performance. They cannot be implemented in portable .NET code today because there is no process-wide memory barrier API, which is the key primitive they depend on.
The proposal is to add a process-wide memory barrier API. The implementation is a simple wrapper around the FlushProcessWriteBuffers API on Windows, and around the equivalent on other operating systems (e.g. sys_membarrier on recent Linux kernels).
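To make the "simple wrapper" idea concrete, here is a minimal sketch of what the Windows half could look like. The class name `ProcessWideBarrierSketch` and its members `IsSupported` and `Flush` are hypothetical names chosen for illustration; only the `FlushProcessWriteBuffers` export from kernel32.dll is a real API. The non-Windows branch is deliberately left unimplemented rather than guessing at a sys_membarrier binding.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical illustration only; not the runtime's actual implementation.
public static class ProcessWideBarrierSketch
{
    // Win32 signature: VOID FlushProcessWriteBuffers(void), exported by kernel32.dll.
    [DllImport("kernel32.dll")]
    private static extern void FlushProcessWriteBuffers();

    // True only where this sketch has a native call to forward to.
    public static bool IsSupported =>
        RuntimeInformation.IsOSPlatform(OSPlatform.Windows);

    public static void Flush()
    {
        if (!IsSupported)
        {
            // A real implementation would call the platform equivalent here
            // (e.g. sys_membarrier on Linux); that binding is not shown.
            throw new PlatformNotSupportedException(
                "This sketch only forwards to FlushProcessWriteBuffers on Windows.");
        }

        FlushProcessWriteBuffers();
    }
}
```

Callers of this sketch would check `IsSupported` first; the proposed API removes that burden by choosing the right primitive per operating system.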
What is this thing in terms of barriers?
This is a system-wide barrier. See what the FlushProcessWriteBuffers API does on Windows, or sys_membarrier on Linux. This API is just a thin portable .NET wrapper for these.
Is the local effect the same as that of Interlocked.MemoryBarrier?
Yes. You would not call this API because of the local effect. You would call it because of the global effect.
Here is a sample implementation of asymmetric lock using this API: https://gist.github.com/jkotas/aff2ca3774414807be7312fa00d3d521
For very advanced scenarios / building low-level primitives, this seems like a reasonable thing to add. We already have support in Windows, and Linux recently added first-class support due to recognizing the need for something like this.
I wonder about a different name, though. It's closely tied to the Windows naming, currently, and another name may be more understandable. Interlocked.ProcessWideMemoryBarrier()?
We should also put in the documentation the difference between this and a normal memory barrier. Namely, a normal memory barrier only ensures that reads and writes from the current CPU can't move across the barrier, whereas this barrier ensures that no read or write from any CPU being used by the process can move across the barrier.
Normal interlocked operations and barriers allow reasonable shared access if EVERY thread accessing the data uses barriers. Being able to force OTHER CPUs to synchronize with process memory (e.g. flush write buffers, synchronize read buffers), allows you to use non-interlocked operations on some threads and still have reasonable shared access. This is the value of this API.
The cost of this API is that it is a very expensive call. It has to force every CPU in the process to do something, which is likely to take 1000s of cycles. (A normal interlocked operation is probably < 100.)
Thus this API is useful when you believe it will be very rare that you actually need to call it.
It also suffers from all the subtlety of lock-free programming (it is very easy to get it wrong).
Nevertheless, when you need it, it is super useful, and if used with a GREAT amount of care, it can be used to good effect. As long as we put this kind of warning in the docs, it is a good addition.
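The asymmetry described above can be sketched as a two-flag gate in the style of Dekker's algorithm. This is a simplified illustration, not the implementation from the linked gist. The type `AsymmetricGate` and its members are hypothetical names. It uses the `Interlocked.MemoryBarrierProcessWide()` spelling suggested later in this thread, which is the name under which the method is available in .NET Core 2.0 and later. The sketch assumes exactly one designated fast thread and at most one slow thread at a time (slow callers would need an ordinary lock among themselves).

```csharp
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical illustration of an asymmetric gate; easy to get wrong, use with care.
public sealed class AsymmetricGate
{
    private int _fastInside; // written only by the single fast thread
    private int _slowWants;  // written only by the (rare) slow thread

    // Fast path: no interlocked operation and no full fence.
    public bool TryEnterFast()
    {
        Volatile.Write(ref _fastInside, 1);
        // The CPU may still reorder the store above with the load below.
        // The slow side's process-wide barrier compensates for that.
        if (ReadNoInline(ref _slowWants) == 0)
            return true;

        Volatile.Write(ref _fastInside, 0); // slow side is active: back off
        return false;
    }

    public void ExitFast() => Volatile.Write(ref _fastInside, 0);

    // Slow path: pays for the expensive barrier on behalf of the fast thread.
    public void EnterSlow()
    {
        Volatile.Write(ref _slowWants, 1);
        // After this returns, the fast thread's earlier stores are visible here,
        // and its later loads observe _slowWants == 1.
        Interlocked.MemoryBarrierProcessWide();

        var spinner = new SpinWait();
        while (Volatile.Read(ref _fastInside) != 0)
            spinner.SpinOnce();
    }

    public void ExitSlow() => Volatile.Write(ref _slowWants, 0);

    // Kept out of line so the JIT cannot move this read above the preceding write.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int ReadNoInline(ref int location) => Volatile.Read(ref location);
}
```

The fast thread retries or falls back when `TryEnterFast` returns false. The design only pays off when `EnterSlow` is called very rarely, since each call costs a process-wide barrier.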
and another name may be more understandable.
Interlocked.ProcessWideMemoryBarrier()?
Agree - updated the name.
Makes sense. Only suggestion would be to make it a suffix, rather than a prefix so both APIs show up side-by-side in IntelliSense and docs, so:
c#
static public class System.Threading.Interlocked
{
public static void MemoryBarrierProcessWide();
}
@jkotas did you plan to do this one? If not, do you believe we should keep this in 2.0?
The cost of this API is that it is a very expensive call. It has to force every CPU in the process to do something, which is likely to take 1000s of cycles. (A normal interlocked operation is probably < 100.)
Isn't it better to do a dummy interlocked operation instead of this? I think I need to investigate what it does more closely.
It depends on the ratio of reads to writes for your data structure. E.g. if you are doing a million reads for every write, using this API is a win: a million dummy interlocked operations are more expensive than a single call to this API.
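The trade-off can be put into rough numbers. The sketch below is a back-of-the-envelope cost model, not a benchmark. `BarrierCostModel` and its members are hypothetical names, and the cycle counts in the usage note are illustrative picks from the ranges mentioned earlier in the thread ("< 100" for an interlocked operation, "1000s" for the process-wide barrier). It treats plain reads as free and ignores cache effects.

```csharp
// Hypothetical cost model; all cycle counts are assumptions supplied by the caller.
public static class BarrierCostModel
{
    // Cycles spent per write if every read pays for a dummy interlocked operation.
    public static long InterlockedReadsCost(long readsPerWrite, long interlockedCycles)
        => readsPerWrite * interlockedCycles;

    // Number of reads per write above which one process-wide barrier is cheaper.
    public static long BreakEvenReadsPerWrite(long interlockedCycles, long barrierCycles)
        => barrierCycles / interlockedCycles;

    public static bool ProcessWideBarrierWins(
        long readsPerWrite, long interlockedCycles, long barrierCycles)
        => InterlockedReadsCost(readsPerWrite, interlockedCycles) > barrierCycles;
}
```

For example, assuming 100 cycles per interlocked operation and 10,000 cycles per process-wide barrier, `BreakEvenReadsPerWrite(100, 10_000)` gives 100. At a million reads per write, the interlocked approach costs about 100,000,000 cycles against roughly 10,000 for a single barrier call.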