在使用原子和futexes锁定代码时无法找到竞争条件

R..*_*R.. 2 c linux multithreading locking atomic

我正在尝试在我的Linux-futex和基于原子操作的锁定原语中调试导致竞争条件的死锁时非常糟糕.这是我正在使用的代码(与真实代码完全相同的逻辑,只是拉出了与问题无关的数据结构的依赖性):

int readers, writer, waiting;

void wrlock()
{
    int cur;
    while (atomic_swap(&writer, 1)) spin();
    while ((cur=readers)) futex_wait(&readers, cur);
}

void wrunlock()
{
    atomic_store(&writer, 0);
    if (waiting) futex_wake(&writer, ALL);
}

void rdlock()
{
    atomic_inc(&waiting);
    for (;;) {
        atomic_inc(&readers);
        if (!writer) return;
        atomic_dec(&readers);
        futex_wait(&writer, 1);
    }
}

void rdunlock()
{
    atomic_dec(&waiting);
    atomic_dec(&readers);
    if (writer) futex_wake(&readers, ALL);
}
Run Code Online (Sandbox Code Playgroud)

这些atomic_*spin功能非常明显.Linux futex ops是futex_wait(int *mem, int val)futex_wake(int *mem, int how_many_to_wake).

我运行到死锁状态是3个线程,readers==0,writer==1,waiting==2,和所有线程都在等待futex_wait.我不知道这是怎么发生的.

对于每个想要因为不使用pthread原语而对我大喊大叫的人,请将其保存为另一个问题.这是低级代码,它不依赖于glibc/libpthread而运行.在任何情况下,我认为这个问题对于其他人来说可能对学习低级并发黑魔法有用,或者可能会吓跑其他任何人试图编写这样的代码... ;-)

顺便说一下,硬件是x86,所以即使代码存在内存排序问题,我也不认为它们会表现为错误.我猜这里只是对我失踪的futex的一种微妙的误用,特别是因为当所有的等待都被作为旋转时,代码工作正常.

这是生成的asm wrlock(基本上与我发布的版本相同,除了它lock为第一个spinlock 调用一个单独的函数).开头的附加条件返回是"如果我们没有运行多个线程,则纾困".0x3380x33c对应于readerswriter.call 1af实际上是一个futex_wait外部调用的重定位.

00000184 <wrlock>:
 184:   a1 18 00 00 00          mov    0x18,%eax
 189:   55                      push   %ebp
 18a:   85 c0                   test   %eax,%eax
 18c:   89 e5                   mov    %esp,%ebp
 18e:   75 02                   jne    192 <wrlock+0xe>
 190:   c9                      leave
 191:   c3                      ret
 192:   68 3c 03 00 00          push   $0x33c
 197:   e8 7e fe ff ff          call   1a <lock>
 19c:   58                      pop    %eax
 19d:   a1 38 03 00 00          mov    0x338,%eax
 1a2:   85 c0                   test   %eax,%eax
 1a4:   74 ea                   je     190 <wrlock+0xc>
 1a6:   6a 01                   push   $0x1
 1a8:   50                      push   %eax
 1a9:   68 38 03 00 00          push   $0x338
 1ae:   e8 fc ff ff ff          call   1af <wrlock+0x2b>
 1b3:   a1 38 03 00 00          mov    0x338,%eax
 1b8:   83 c4 0c                add    $0xc,%esp
 1bb:   85 c0                   test   %eax,%eax
 1bd:   75 e7                   jne    1a6 <wrlock+0x22>
 1bf:   eb cf                   jmp    190 <wrlock+0xc>
Run Code Online (Sandbox Code Playgroud)

Mic*_*urr 5

我认为这说明了你潜在的僵局.假设一个处理器按以下顺序执行3个线程:

// to start, 
//   readers == 0, writer == 0, waiting == 0

Reader1
===================================

// in rdlock()
    atomic_inc(&waiting);
    for (;;) {
        atomic_inc(&readers);

    // if (!writer) has not been executed yet

    //   readers == 1, writer == 0, waiting == 1

writer 
===================================
// in wrlock()
while (atomic_swap(&writer, 1)) spin();
    while ((cur=readers)) futex_wait(&readers, cur)

    // writer thread is waiting

    // readers == 1, writer == 1, waiting == 1
    //   cur == 1



Reader1
===================================

// back to where it was in rdlock()
        if (!writer) return;
        atomic_dec(&readers);
        futex_wait(&writer, 1);

    // Reader1 is waiting
    //   readers == 0, writer == 1, waiting == 1

Reader2
===================================

// in rdlock()
    atomic_inc(&waiting);
    for (;;) {
        atomic_inc(&readers);
        if (!writer) return;
        atomic_dec(&readers);
        futex_wait(&writer, 1);

    // Reader2 is waiting
    //  readers == 0, writer == 1, waiting == 2
Run Code Online (Sandbox Code Playgroud)

现在你陷入僵局.

当然,我可能不了解futex API是如何工作的(我从来没有使用它们),所以如果我不在这里,请告诉我.我假设一个futex_wait()阻塞(因为期望值是正确的)将不会解除阻塞,直到futex_wake()调用该地址.

如果atomic_xxx()操作可以取消阻止a futex_wait(),则此分析不正确.

最后,如果发生了这种情况,我还没有机会考虑可能的解决方案......