It uses either compare_and_swap or load_linked/store_conditional. I thought compare_and_swap had a horrible (i.e. very slow) implementation on x86. Any idea whether load_linked/store_conditional is any better?
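CAS is slow because on x86 it is a LOCK-prefixed CMPXCHG, which acts as a full memory barrier: earlier loads and stores have to complete before it does. (Strictly speaking it is not a "serializing instruction" in the CPUID sense, but the ordering cost is what you feel.) If later instructions depend on whether it succeeded or failed, they have to wait until it has resolved. The core also needs exclusive ownership of the cache line, so the update has to be pushed out to the cache at least.
So yes, it is slow, but without those guarantees it would be of no use.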
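load_linked and store_conditional are not much better, and x86 does not offer them anyway; you find them on architectures such as ARM, POWER and MIPS. Depending on the implementation, store_conditional can fail spuriously because something else touched the cache line, there was unrelated memory traffic, etc., so you always end up retrying in a loop.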
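To make the retry behaviour concrete, here is a minimal sketch of a CAS loop in C++. It is not taken from the linked paper; `fetch_increment` and `demo` are hypothetical names used only for illustration. `std::atomic<int>::compare_exchange_weak` is the standard call that is allowed to fail spuriously, which is what lets compilers map it onto LL/SC on architectures that have it, while on x86 it typically becomes a LOCK CMPXCHG.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: atomically add 1 and return the previous value,
// built from a CAS retry loop instead of fetch_add, purely for illustration.
int fetch_increment(std::atomic<int>& counter) {
    int observed = counter.load(std::memory_order_relaxed);
    // compare_exchange_weak may fail even when counter == observed
    // (a spurious failure), so it has to sit in a loop. On any failure
    // it writes the current value of counter back into `observed`.
    while (!counter.compare_exchange_weak(observed, observed + 1,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed)) {
        // Nothing to do here: `observed` was refreshed, just try again.
    }
    return observed;
}

// Single-threaded sanity check; real use would call fetch_increment
// concurrently from several threads.
int demo() {
    std::atomic<int> counter{0};
    int first = fetch_increment(counter);
    int second = fetch_increment(counter);
    assert(first == 0 && second == 1);
    return counter.load();
}
```

Whether the loop spins once or many times depends on contention and, on LL/SC machines, on the spurious failures mentioned above. Either way the cost is dominated by getting the cache line in exclusive state, not by the instruction itself.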
http://www.cs.rochester.edu/u/scott/papers/1996_PODC_queues....