Threaded execution speed of LOCK CMPXCHG

Iam*_*mIC (score 8) · Tags: parallel-processing, performance, x86, assembly, locking

I wrote a multithreaded app to benchmark the speed of LOCK CMPXCHG (x86 assembly).

On my machine (a dual-core Core 2), with 2 threads running and both accessing the same variable, I get about 40M ops/second.

Then I gave each thread its own variable to operate on. Obviously this means there's no lock contention between the threads, so I expected a speedup. However, the speed didn't change. Why?
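The original benchmark code was not posted, so here is a minimal sketch of that kind of test, assuming C11 atomics and POSIX threads (compile with `-pthread`). With GCC or Clang on x86, `atomic_compare_exchange_weak` on an `atomic_long` is typically compiled to `LOCK CMPXCHG`. The names `cas_worker`, `time_two_threads` and `struct packed_counters` are hypothetical. Giving each thread one counter from `struct packed_counters` reproduces the "own variable per thread" case, where both 8-byte counters very likely end up in the same cache line.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical worker argument: which counter to bump, and how often. */
struct worker_arg {
    atomic_long *target;
    long iters;
};

/* Increment *target `iters` times with a CAS retry loop.
   On x86 with GCC/Clang the compare-exchange becomes LOCK CMPXCHG. */
static void *cas_worker(void *p) {
    struct worker_arg *a = p;
    for (long i = 0; i < a->iters; i++) {
        long expected = atomic_load_explicit(a->target, memory_order_relaxed);
        /* On failure, `expected` is refreshed with the current value; retry. */
        while (!atomic_compare_exchange_weak(a->target, &expected, expected + 1)) {
        }
    }
    return NULL;
}

/* Run one thread per counter; return elapsed wall time in seconds,
   or -1.0 if a thread could not be started. */
static double time_two_threads(atomic_long *a, atomic_long *b, long iters) {
    struct worker_arg args[2] = {{a, iters}, {b, iters}};
    pthread_t t[2];
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    if (pthread_create(&t[0], NULL, cas_worker, &args[0]) != 0)
        return -1.0;
    if (pthread_create(&t[1], NULL, cas_worker, &args[1]) != 0) {
        pthread_join(t[0], NULL);
        return -1.0;
    }
    pthread_join(t[0], NULL);
    pthread_join(t[1], NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Two per-thread counters packed side by side (16 bytes total):
   they most likely share one 64-byte cache line, i.e. false sharing. */
struct packed_counters {
    atomic_long c[2];
};
```

Calling `time_two_threads(&pc.c[0], &pc.c[1], n)` on a `struct packed_counters pc` (after `atomic_init` on both counters) times the "separate variables" case. Comparing it against the same call on two counters placed far apart in memory is what isolates the cache-line effect.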

Gab*_*abe (score 14)

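If you have 2 threads simultaneously accessing data that sits on the same cache line, you get false sharing: each core has to keep reloading that line into its cache, because the same part of the cache has just been modified by the other core.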

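Make sure the per-thread variables are allocated in different blocks of memory (say, at least 128 bytes apart) to rule this out as the problem you're running into.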
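One way to follow that advice is sketched below, assuming C11 (`_Alignas`, `aligned_alloc`, `<stdatomic.h>`). The names `struct padded_counter`, `PAD_BYTES`, `same_block` and `alloc_counters` are hypothetical. Each counter is forced onto its own 128-byte-aligned slot, so neighbouring counters in an array can never share a cache line. A commonly cited reason for 128 rather than the 64-byte line size is that Intel's adjacent-line prefetcher tends to fetch lines in 128-byte pairs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Spacing suggested in the answer: at least 128 bytes between variables. */
#define PAD_BYTES 128

/* _Alignas makes the struct 128-byte aligned, so its size rounds up to
   128 and consecutive array elements are exactly 128 bytes apart. */
struct padded_counter {
    _Alignas(PAD_BYTES) atomic_long value;
};

/* Nonzero if a and b fall inside the same `block`-byte region of memory. */
static int same_block(const void *a, const void *b, size_t block) {
    return (uintptr_t)a / block == (uintptr_t)b / block;
}

/* Heap allocation of n padded counters, all initialised to 0.
   n * sizeof(struct padded_counter) is always a multiple of PAD_BYTES,
   as C11 aligned_alloc requires. Release with free(). */
static struct padded_counter *alloc_counters(size_t n) {
    struct padded_counter *p = aligned_alloc(PAD_BYTES, n * sizeof *p);
    if (p != NULL)
        for (size_t i = 0; i < n; i++)
            atomic_init(&p[i].value, 0);
    return p;
}
```

Each thread would then operate on `&counters[i].value` for its own `i`. The extra memory (about 120 unused bytes per counter) is the price of keeping every thread's variable on its own cache line.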

DDJ has a good article describing the dreadful effects of false sharing: http://www.drdobbs.com/go-parallel/article/showArticle.jhtml?articleID=2170000206

Here's the Wikipedia entry: http://en.wikipedia.org/wiki/False_sharing