How can barriers be destroyable as soon as pthread_barrier_wait returns?

R..*_*R.. 11 c posix pthreads race-condition barrier

This question is based on:

When is it safe to destroy a pthread barrier?

and the recent glibc bug report:

http://sourceware.org/bugzilla/show_bug.cgi?id=12674

I'm not sure about the semaphore issue reported against glibc, but presumably it should be safe to destroy the barrier as soon as pthread_barrier_wait returns, per the related question linked above. (Normally, the thread that got PTHREAD_BARRIER_SERIAL_THREAD, or a "special" thread that already considered itself "responsible" for the barrier object, would be the one to destroy it.) The main use case I can think of is when a barrier is used to synchronize a new thread's use of data on the creating thread's stack, preventing the creating thread from returning until the new thread gets to use the data; other barriers probably have a lifetime equal to that of the whole program, or one controlled by some other synchronization object.
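
For concreteness, here is a minimal sketch of that use case (the `start_args`/`worker` names and the details are mine, not from the question, and error checking is omitted): the data and the barrier both live on the creating thread's stack, and the creator destroys the barrier as soon as its own wait returns, which is exactly where the question becomes important.

#include <pthread.h>

struct start_args {
    pthread_barrier_t barrier;
    int value;                  /* data living in the creating thread's stack frame */
};

static void *worker(void *p)
{
    struct start_args *args = p;
    int value = args->value;              /* use the creator's stack data... */
    pthread_barrier_wait(&args->barrier); /* ...then release the creator */
    /* from here on, `args` must not be touched - the creator may have
     * already destroyed the barrier and returned */
    (void)value;
    return NULL;
}

void spawn_worker(void)
{
    struct start_args args = { .value = 42 };
    pthread_t tid;

    pthread_barrier_init(&args.barrier, NULL, 2);
    pthread_create(&tid, NULL, worker, &args);
    pthread_barrier_wait(&args.barrier);   /* block until the worker has read `args` */

    /* this is the step the question is about: destroying the barrier as soon
     * as our own wait returns, while the worker may still be leaving its wait */
    pthread_barrier_destroy(&args.barrier);
    pthread_detach(tid);
}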

In any case, how can an implementation ensure that destruction of the barrier (and possibly even unmapping of the memory it resides in) is safe as soon as pthread_barrier_wait returns in any thread? It seems that the other threads that have not yet returned would need to examine at least some part of the barrier object to finish their work and return, much like how, in the glibc bug report cited above, sem_post has to examine the waiter count after having adjusted the semaphore value.

Mic*_*urr 7

I'm going to take another crack at this with an example implementation of pthread_barrier_wait() that uses the mutex and condition variable functionality a pthreads implementation would provide. Note that this example doesn't try to deal with performance considerations (specifically, when the waiting threads are unblocked, they are all re-serialized on the way out of the wait). I think that something like Linux futex objects could help with the performance issues, but futexes are still largely outside my experience.

Also, I doubt that this example handles signals or errors correctly (if at all in the case of signals). But I think proper support for those things can be added as an exercise for the reader.

My main fear is that the example may have a race condition or a deadlock (the mutex handling is more complex than I'd like). Also note that this is an example that hasn't even been compiled; treat it as pseudo-code. Keep in mind, too, that my experience is mainly on Windows - I'm tackling this more as an educational opportunity than anything else. So the quality of the pseudo-code may well be pretty low.

However, disclaimers aside, I think it may give an idea of how the problem raised in the question could be handled (i.e., how the pthread_barrier_wait() function can allow the pthread_barrier_t object it uses to be destroyed by any of the released threads without danger of one or more threads still using the barrier object on their way out).

Here goes:

/* 
 *  Since this is a part of the implementation of the pthread API, it uses
 *  reserved names that start with "__" for internal structures and functions
 *
 *  Functions such as __mutex_lock() and __cond_wait() perform the same function
 *  as the corresponding pthread API.
 */

// struct __barrier_waitdata is intended to hold all the data
//  that `pthread_barrier_wait()` will need after releasing
//  waiting threads.  This will allow the function to avoid
//  touching the passed-in pthread_barrier_t object after
//  the wait is satisfied (since any of the released threads
//  can destroy it)

struct __barrier_waitdata {
    struct __mutex cond_mutex;
    struct __cond cond;

    unsigned waiter_count;
    int wait_complete;
};

struct __barrier {
    unsigned count;

    struct __mutex waitdata_mutex;
    struct __barrier_waitdata* pwaitdata;
};

typedef struct __barrier pthread_barrier_t;



int __barrier_waitdata_init( struct __barrier_waitdata* pwaitdata)
{
    int rc;

    pwaitdata->waiter_count = 0;
    pwaitdata->wait_complete = 0;

    rc = __mutex_init( &pwaitdata->cond_mutex, NULL);
    if (rc) {
        return rc;
    }

    rc = __cond_init( &pwaitdata->cond, NULL);
    if (rc) {
        __mutex_destroy( &pwaitdata->cond_mutex);
        return rc;
    }

    return 0;
}




int pthread_barrier_init(pthread_barrier_t *barrier, const pthread_barrierattr_t *attr, unsigned int count)
{
    int rc;

    rc = __mutex_init( &barrier->waitdata_mutex, NULL);
    if (rc) return rc;

    barrier->pwaitdata = NULL;
    barrier->count = count;

    //TODO: deal with attr

    return 0;
}



int pthread_barrier_wait(pthread_barrier_t *barrier)
{
    int rc;
    struct __barrier_waitdata* pwaitdata;
    unsigned target_count;

    // potential waitdata block (only one thread's will actually be used)
    struct __barrier_waitdata waitdata; 

    // nothing to do if we only need to wait for one thread...
    if (barrier->count == 1) return PTHREAD_BARRIER_SERIAL_THREAD;

    rc = __mutex_lock( &barrier->waitdata_mutex);
    if (rc) return rc;

    if (!barrier->pwaitdata) {
        // no other thread has claimed the waitdata block yet - 
        //  we'll use this thread's

        rc = __barrier_waitdata_init( &waitdata);
        if (rc) {
            __mutex_unlock( &barrier->waitdata_mutex);
            return rc;
        }

        barrier->pwaitdata = &waitdata;
    }

    pwaitdata = barrier->pwaitdata;
    target_count = barrier->count;

    //  all data necessary for handling the return from a wait is pointed to
    //  by `pwaitdata`, and `pwaitdata` points to a block of data on the stack of
    //  one of the waiting threads.  We have to make sure that the thread that owns
    //  that block waits until all others have finished with the information
    //  pointed to by `pwaitdata` before it returns.  However, after the 'big' wait
    //  is completed, the `pthread_barrier_t` object that's passed into this 
    //  function isn't used. The last operation done to `*barrier` is to set 
    //  `barrier->pwaitdata = NULL` to satisfy the requirement that this function
    //  leaves `*barrier` in a state as if `pthread_barrier_init()` had been called - and
    //  that operation is done by the thread that signals the wait condition 
    //  completion before the completion is signaled.

    // note: we're still holding  `barrier->waitdata_mutex`;

    rc = __mutex_lock( &pwaitdata->cond_mutex);
    pwaitdata->waiter_count += 1;

    if (pwaitdata->waiter_count < target_count) {
        // need to wait for other threads

        __mutex_unlock( &barrier->waitdata_mutex);
        do {
            // TODO:  handle the return code from `__cond_wait()` to break out of this
            //          if a signal makes that necessary
            __cond_wait( &pwaitdata->cond,  &pwaitdata->cond_mutex);
        } while (!pwaitdata->wait_complete);
    }
    else {
        // this thread satisfies the wait - unblock all the other waiters
        pwaitdata->wait_complete = 1;

        // 'release' our use of the passed in pthread_barrier_t object
        barrier->pwaitdata = NULL;

        // unlock the barrier's waitdata_mutex - the barrier is  
        //  ready for use by another set of threads
        __mutex_unlock( &barrier->waitdata_mutex);

        // finally, unblock the waiting threads
        __cond_broadcast( &pwaitdata->cond);
    }

    // at this point, barrier->waitdata_mutex is unlocked, the 
    //  barrier->pwaitdata pointer has been cleared, and no further 
    //  use of `*barrier` is permitted...

    // however, each thread still has a valid `pwaitdata` pointer - the 
    // thread that owns that block needs to wait until all others have 
    // dropped the pwaitdata->waiter_count

    // also, at this point the `pwaitdata->cond_mutex` is locked, so
    //  we're in a critical section

    rc = 0;
    pwaitdata->waiter_count--;

    if (pwaitdata == &waitdata) {
        // this thread owns the waitdata block - it needs to hang around until 
        //  all other threads are done

        // as a convenience, this thread will be the one that returns 
        //  PTHREAD_BARRIER_SERIAL_THREAD
        rc = PTHREAD_BARRIER_SERIAL_THREAD;

        while (pwaitdata->waiter_count != 0) {
            __cond_wait( &pwaitdata->cond, &pwaitdata->cond_mutex);
        };

        __mutex_unlock( &pwaitdata->cond_mutex);
        __cond_destroy( &pwaitdata->cond);
        __mutex_destroy( &pwaitdata->cond_mutex);
    }
    else {
        // this thread doesn't own the waitdata block; if it's the last one
        //  out, wake the owner so it can tear the block down, and in any
        //  case drop the mutex before returning
        if (pwaitdata->waiter_count == 0) {
            __cond_signal( &pwaitdata->cond);
        }
        __mutex_unlock( &pwaitdata->cond_mutex);
    }

    return rc;
}
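
As a usage note (my own example, not part of the answer): this is the situation the implementation above is meant to make safe. The wrapper name and the assumption that the barrier was heap-allocated by the caller are illustrative only.

#include <pthread.h>
#include <stdlib.h>

/* Any of the released threads may run this; because the implementation above
 * never touches `*barrier` once the wait is satisfied, the serial thread can
 * destroy (and even free) it while the other waiters are still returning. */
void rendezvous_and_release(pthread_barrier_t *barrier)
{
    if (pthread_barrier_wait(barrier) == PTHREAD_BARRIER_SERIAL_THREAD) {
        pthread_barrier_destroy(barrier);
        free(barrier);    /* assumes the caller allocated the barrier with malloc() */
    }
}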

Update, 17 July 2011: in response to a comment/question about process-shared barriers

I had completely forgotten about the situation with barriers that are shared between processes. As you mention, the idea I outlined would fail horribly in that case. I don't really have experience with POSIX shared memory, so any suggestions I make should be taken with skepticism.

To summarize (for my benefit, if no one else's):

When any thread gets control after pthread_barrier_wait() returns, the barrier object needs to be in the 'init' state (however the most recent pthread_barrier_init() on that object set it). The API also implies that once any thread returns, one or more of the following things could occur:

  • another call to pthread_barrier_wait() to start a new round of thread synchronization
  • pthread_barrier_destroy() on the barrier object
  • the memory allocated for the barrier object could be freed or unshared if it's in a shared memory region

These things mean that before the pthread_barrier_wait() call allows any thread to return, it pretty much needs to ensure that all waiting threads are no longer using the barrier object in the context of that call. My first answer addressed this by creating a 'local' set of synchronization objects (a mutex and an associated condition variable) outside of the barrier object that would block all the threads. Those local synchronization objects are allocated on the stack of the thread that happens to call pthread_barrier_wait() first.

I think something similar would need to be done for process-shared barriers. However, in that case simply allocating those sync objects on a thread's stack isn't adequate (since the other processes would have no access to them). For a process-shared barrier, those objects would have to be allocated in process-shared memory. I think the technique listed above could be applied similarly:

  • the waitdata_mutex that controls the 'allocation' of the local sync variables (the waitdata block) would already be in process-shared memory by virtue of its being in the barrier struct. Of course, when the barrier is set to PTHREAD_PROCESS_SHARED, that attribute would also need to be applied to the waitdata_mutex
  • when __barrier_waitdata_init() is called to initialize the local mutex and condition variable, it would have to allocate those objects in shared memory instead of simply using the stack-based waitdata variable (a rough sketch of such an allocation follows this list)
  • when the 'cleanup' thread destroys the mutex and the condition variable in the waitdata block, it would also need to clean up the process-shared memory allocation for the block
  • in the case where shared memory is used, there needs to be some mechanism to ensure that the shared memory object is opened at least once in each process, and closed the correct number of times in each process (but not closed entirely before every thread in the process is finished using it). I haven't thought through exactly how that would be done...
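
Purely as an illustration of the second bullet above, here is a rough sketch of how the waitdata block might be allocated in POSIX shared memory. It is not part of the answer's implementation: it uses the public pthread types rather than the internal __mutex/__cond names, assumes the shm name has already been constructed (see the naming discussion below), and reduces error handling to early returns.

#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* same layout as the answer's __barrier_waitdata, but with the public pthread
 * types so the sketch stands on its own */
struct shared_waitdata {
    pthread_mutex_t cond_mutex;
    pthread_cond_t  cond;
    unsigned        waiter_count;
    int             wait_complete;
};

/* map a waitdata block into process-shared memory; returns NULL on failure */
static struct shared_waitdata *waitdata_alloc_shared(const char *shm_name)
{
    int fd = shm_open(shm_name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd == -1)
        return NULL;
    if (ftruncate(fd, sizeof(struct shared_waitdata)) == -1) {
        close(fd);
        shm_unlink(shm_name);
        return NULL;
    }

    struct shared_waitdata *wd = mmap(NULL, sizeof(struct shared_waitdata),
                                      PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping keeps the object alive */
    if (wd == MAP_FAILED) {
        shm_unlink(shm_name);
        return NULL;
    }

    /* the mutex and condition variable must themselves be process-shared */
    pthread_mutexattr_t ma;
    pthread_condattr_t  ca;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);

    wd->waiter_count = 0;
    wd->wait_complete = 0;
    pthread_mutex_init(&wd->cond_mutex, &ma);
    pthread_cond_init(&wd->cond, &ca);

    pthread_mutexattr_destroy(&ma);
    pthread_condattr_destroy(&ca);
    return wd;
}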

I think these changes would allow the scheme to operate with process-shared barriers. The last bullet point above is a key item to figure out. Another is how to construct a name for the shared memory object that will hold the 'local' process-shared waitdata. There are certain attributes you'd want for that name:

  • you'd want the storage for the name to reside in the pthread_barrier_t structure itself so all processes have access to it; that means a known limit to the length of the name
  • you'd want the name to be unique to each 'instance' of a set of calls to pthread_barrier_wait(), because it might be possible for a second round of waiting to begin before all threads have gotten all the way out of the first round (so the process-shared memory block set up for the waitdata might not have been freed yet). So the name probably has to be based on things like the process id, thread id, the address of the barrier object, and an atomic counter (a rough sketch of such a name builder follows this list)
  • I don't know whether there are security implications to the name being 'guessable'. If so, some randomization needs to be added - no idea how much. Maybe you'd also need to hash the data mentioned above along with the random bits. Like I said, I really don't know if this matters.
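
As a hypothetical sketch only (WAITDATA_NAME_LEN and make_waitdata_name are made-up names, treating pthread_self() as an integer is not portable, and the randomization question from the last bullet is ignored), a name along those lines might be assembled like this:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

/* fixed-size name field that would be reserved inside pthread_barrier_t;
 * 64 bytes is an arbitrary choice for this sketch */
#define WAITDATA_NAME_LEN 64

static void make_waitdata_name(char name[WAITDATA_NAME_LEN], void *barrier)
{
    static _Atomic unsigned long counter;   /* distinguishes successive rounds */
    unsigned long serial = atomic_fetch_add(&counter, 1);

    /* mix the process id, a per-thread value, the barrier's address, and the
     * counter into a single shm object name */
    snprintf(name, WAITDATA_NAME_LEN, "/barrier.%ld.%lu.%p.%lu",
             (long)getpid(), (unsigned long)pthread_self(), barrier, serial);
}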