Reason behind the spinlock back-off strategy

St.*_*rio 5 c++ assembly cpu-architecture lock-free compare-and-swap

I'm looking at the spinlock implementation in the JVM HotSpot from OpenJDK 12. Here is how it is implemented (comments preserved):

// Polite TATAS spinlock with exponential backoff - bounded spin.
// Ideally we'd use processor cycles, time or vtime to control
// the loop, but we currently use iterations.
// All the constants within were derived empirically but work over
// over the spectrum of J2SE reference platforms.
// On Niagara-class systems the back-off is unnecessary but
// is relatively harmless.  (At worst it'll slightly retard
// acquisition times).  The back-off is critical for older SMP systems
// where constant fetching of the LockWord would otherwise impair
// scalability.
//
// Clamp spinning at approximately 1/2 of a context-switch round-trip.
// See synchronizer.cpp for details and rationale.

int Monitor::TrySpin(Thread * const Self) {
  if (TryLock())    return 1;
  if (!os::is_MP()) return 0;

  int Probes  = 0;
  int Delay   = 0;
  int SpinMax = 20;
  for (;;) {
    intptr_t v = _LockWord.FullWord;
    if ((v & _LBIT) == 0) {
      if (Atomic::cmpxchg (v|_LBIT, &_LockWord.FullWord, v) == v) {
        return 1;
      }
      continue;
    }

    SpinPause();

    // Periodically increase Delay -- variable Delay form
    // conceptually: delay *= 1 + 1/Exponent
    ++Probes;
    if (Probes > SpinMax) return 0;

    if ((Probes & 0x7) == 0) {
      Delay = ((Delay << 1)|1) & 0x7FF;
      // CONSIDER: Delay += 1 + (Delay/4); Delay &= 0x7FF ;
    }

    // Stall for "Delay" time units - iterations in the current implementation.
    // Avoid generating coherency traffic while stalled.
    // Possible ways to delay:
    //   PAUSE, SLEEP, MEMBAR #sync, MEMBAR #halt,
    //   wr %g0,%asi, gethrtime, rdstick, rdtick, rdtsc, etc. ...
    // Note that on Niagara-class systems we want to minimize STs in the
    // spin loop.  N1 and brethren write-around the L1$ over the xbar into the L2$.
    // Furthermore, they don't have a W$ like traditional SPARC processors.
    // We currently use a Marsaglia Shift-Xor RNG loop.
    if (Self != NULL) {
      jint rv = Self->rng[0];
      for (int k = Delay; --k >= 0;) {
        rv = MarsagliaXORV(rv);
        if (SafepointMechanism::should_block(Self)) return 0;
      }
      Self->rng[0] = rv;
    } else {
      Stall(Delay);
    }
  }
}


The Atomic::cmpxchg called in TrySpin is implemented on x86 as:

template<>
template<typename T>
inline T Atomic::PlatformCmpxchg<8>::operator()(T exchange_value,
                                                T volatile* dest,
                                                T compare_value,
                                                atomic_memory_order /* order */) const {
  STATIC_ASSERT(8 == sizeof(T));
  __asm__ __volatile__ ("lock cmpxchgq %1,(%3)"
                        : "=a" (exchange_value)
                        : "r" (exchange_value), "a" (compare_value), "r" (dest)
                        : "cc", "memory");
  return exchange_value;
}


What I don't understand is the reason for the back-off on "older SMP" systems. The comments say:

The back-off is critical for older SMP systems where constant fetching of the LockWord would otherwise impair scalability.

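The reason I can imagine is that on older SMP systems, fetching and then CASing the LockWord always asserted a bus lock (rather than a cache lock). As the Intel manual, Volume 3, section 8.1.4, says: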

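For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a LOCK operation, even if the area of memory being locked is cached in the processor. For the P6 and more recent processor families, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus.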

Is this the actual reason, or is it something else?

Archived: 6 years, 3 months ago
Viewed: 424 times
Last activity: 6 years, 3 months ago