St.*_*rio 5 c++ assembly cpu-architecture lock-free compare-and-swap
I'm looking at the spinlock implementation in JVM HotSpot from OpenJDK12. Here is how it is implemented (comments preserved):
// Polite TATAS spinlock with exponential backoff - bounded spin.
// Ideally we'd use processor cycles, time or vtime to control
// the loop, but we currently use iterations.
// All the constants within were derived empirically but work over
// over the spectrum of J2SE reference platforms.
// On Niagara-class systems the back-off is unnecessary but
// is relatively harmless. (At worst it'll slightly retard
// acquisition times). The back-off is critical for older SMP systems
// where constant fetching of the LockWord would otherwise impair
// scalability.
//
// Clamp spinning at approximately 1/2 of a context-switch round-trip.
// See synchronizer.cpp for details and rationale.
int Monitor::TrySpin(Thread * const Self) {
  if (TryLock())    return 1;
  if (!os::is_MP()) return 0;

  int Probes  = 0;
  int Delay   = 0;
  int SpinMax = 20;
  for (;;) {
    intptr_t v = _LockWord.FullWord;
    if ((v & _LBIT) == 0) {
      if (Atomic::cmpxchg (v|_LBIT, &_LockWord.FullWord, v) == v) {
        return 1;
      }
      continue;
    }

    SpinPause();

    // Periodically increase Delay -- variable Delay form
    // conceptually: delay *= 1 + 1/Exponent
    ++Probes;
    if (Probes > SpinMax) return 0;

    if ((Probes & 0x7) == 0) {
      Delay = ((Delay << 1)|1) & 0x7FF;
      // CONSIDER: Delay += 1 + (Delay/4); Delay &= 0x7FF ;
    }

    // Stall for "Delay" time units - iterations in the current implementation.
    // Avoid generating coherency traffic while stalled.
    // Possible ways to delay:
    //   PAUSE, SLEEP, MEMBAR #sync, MEMBAR #halt,
    //   wr %g0,%asi, gethrtime, rdstick, rdtick, rdtsc, etc. ...
    // Note that on Niagara-class systems we want to minimize STs in the
    // spin loop.  N1 and brethren write-around the L1$ over the xbar into the L2$.
    // Furthermore, they don't have a W$ like traditional SPARC processors.
    // We currently use a Marsaglia Shift-Xor RNG loop.
    if (Self != NULL) {
      jint rv = Self->rng[0];
      for (int k = Delay; --k >= 0;) {
        rv = MarsagliaXORV(rv);
        if (SafepointMechanism::should_block(Self)) return 0;
      }
      Self->rng[0] = rv;
    } else {
      Stall(Delay);
    }
  }
}
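To make the back-off schedule concrete, here is a minimal sketch that replays only the `Probes`/`Delay` bookkeeping from the loop above, assuming the lock never becomes free. `delay_schedule` is a hypothetical helper written for illustration, not HotSpot code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of HotSpot): replays the Probes/Delay
// arithmetic of Monitor::TrySpin for a lock that stays held, and returns
// the Delay value used for the stall after each probe.
std::vector<int> delay_schedule(int spin_max) {
    assert(spin_max >= 0);
    std::vector<int> delays;
    int probes = 0;
    int delay = 0;
    for (;;) {
        ++probes;
        if (probes > spin_max) break;            // TrySpin returns 0 at this point
        if ((probes & 0x7) == 0) {
            // Every 8th probe: 0 -> 1 -> 3 -> 7 -> ..., capped at 0x7FF
            delay = ((delay << 1) | 1) & 0x7FF;
        }
        delays.push_back(delay);                 // stall length for this probe
    }
    return delays;
}
```

With `SpinMax = 20` as in the source, `Delay` is 0 for probes 1 to 7, 1 for probes 8 to 15, and 3 for probes 16 to 20, so within this function the back-off never gets anywhere near the `0x7FF` cap.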
where Atomic::cmpxchg is implemented on x86 as
template<>
template<typename T>
inline T Atomic::PlatformCmpxchg<8>::operator()(T exchange_value,
                                                T volatile* dest,
                                                T compare_value,
                                                atomic_memory_order /* order */) const {
  STATIC_ASSERT(8 == sizeof(T));
  __asm__ __volatile__ ("lock cmpxchgq %1,(%3)"
                        : "=a" (exchange_value)
                        : "r" (exchange_value), "a" (compare_value), "r" (dest)
                        : "cc", "memory");
  return exchange_value;
}
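For readers less familiar with the inline assembly: the function returns the value that was in `*dest` before the operation, which is why `TrySpin` compares the result against `v`. A rough portable equivalent can be sketched with `std::atomic`; `cmpxchg_sketch` is a hypothetical name, and the default sequentially consistent ordering is my assumption rather than something taken from HotSpot.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Atomic::cmpxchg with the same argument order
// (exchange_value, dest, compare_value). Returns the value observed in *dest.
inline std::intptr_t cmpxchg_sketch(std::intptr_t exchange_value,
                                    std::atomic<std::intptr_t>* dest,
                                    std::intptr_t compare_value) {
    assert(dest != nullptr);
    std::intptr_t expected = compare_value;
    // On success *dest becomes exchange_value and expected is left unchanged.
    // On failure expected is overwritten with the current contents of *dest.
    dest->compare_exchange_strong(expected, exchange_value);
    return expected;
}
```

A return value equal to `compare_value` means the swap happened; anything else means another thread changed the word first.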
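What I don't understand is the reason behind the back-off for "older SMP" systems. The comments say:
The back-off is critical for older SMP systems where constant fetching of the LockWord would otherwise impair scalability.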
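For reference, the test-and-test-and-set (TATAS) shape that the comment refers to can be sketched as below. This is a simplified illustration, not the HotSpot code: `try_spin_sketch` and `kLockBit` are hypothetical names, the value of the lock bit is assumed, the back-off is left out, and unlike `TrySpin` a failed CAS here also counts as a probe.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::intptr_t kLockBit = 1;  // stand-in for _LBIT (assumed value)

// Returns true if the lock bit was acquired within max_probes attempts.
bool try_spin_sketch(std::atomic<std::intptr_t>& word, int max_probes) {
    assert(max_probes >= 0);
    for (int probes = 0; probes < max_probes; ++probes) {
        // First "test": an ordinary load, no locked instruction involved.
        std::intptr_t v = word.load(std::memory_order_relaxed);
        if ((v & kLockBit) == 0) {
            // The lock looks free, so only now attempt the locked CAS.
            if (word.compare_exchange_strong(v, v | kLockBit,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
                return true;
            }
        }
    }
    return false;
}
```

While the lock is held, each iteration performs only the load, which is the "constant fetching of the LockWord" the comment mentions; the CAS is issued only after the lock bit has been observed clear.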
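The reason I can imagine is that on older SMP systems a bus lock (as opposed to a cache lock) is always asserted when fetching and then CASing the LockWord. As the Intel manual, Vol. 3, section 8.1.4 says:
For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a LOCK operation, even if the area of memory being locked is cached in the processor.
For the P6 and more recent processor families, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus.
Is this the actual reason? If not, what is?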