The only thread servicing an io_service is waiting even though asynchronous I/O operations are pending

Dav*_*rtz 6 linux multithreading boost-asio

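Boost's ASIO dispatcher seems to have a serious problem, and I can't find a workaround. The symptom is that the only thread waiting to dispatch is left in pthread_cond_wait, even though there are pending I/O operations that require it to be blocked in epoll_wait.

I can most easily reproduce the problem by having one thread call poll_one in a loop until it returns zero. This can leave the thread calling run stuck in pthread_cond_wait while the thread calling poll_one breaks out of its loop. Presumably, the io_service expects that thread to come back and block in epoll_wait, but it is under no obligation to do so, and that expectation seems to be fatal.

Is there a requirement that threads be statically associated with io_services?

Here is an example showing the deadlock. This is the only thread servicing this io_service, because the others have moved on. There are definitely socket operations pending: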

#0 pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > (...) at /usr/include/boost/asio/detail/posix_event.hpp:80
#2 boost::asio::detail::task_io_service::do_run_one (...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:405
#3 boost::asio::detail::task_io_service::run (...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:146

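I believe the bug is as follows: when the thread that was blocking on the socket readiness check is also the thread that calls the dispatch functions, it must signal another thread if any are blocked on the io_service. Currently it signals only if there are handlers ready to run at that moment. That leaves no thread checking for socket readiness.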

Tan*_*ury 6

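This is a bug. I have been able to reproduce it by adding a delay in the non-critical section of task_io_service::do_poll_one. Below is a snippet of the modified task_io_service::do_poll_one() in boost/asio/detail/impl/task_io_service.ipp. The only line added is the sleep.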

std::size_t task_io_service::do_poll_one(mutex::scoped_lock& lock,
    task_io_service::thread_info& this_thread,
    const boost::system::error_code& ec)
{
  if (stopped_)
    return 0;

  operation* o = op_queue_.front();
  if (o == &task_operation_)
  {
    op_queue_.pop();
    lock.unlock();

    {
      task_cleanup c = { this, &lock, &this_thread };
      (void)c;

      // Run the task. May throw an exception. Only block if the operation
      // queue is empty and we're not polling, otherwise we want to return
      // as soon as possible.
      task_->run(false, this_thread.private_op_queue);
      boost::this_thread::sleep_for(boost::chrono::seconds(3));
    }

    o = op_queue_.front();
    if (o == &task_operation_)
      return 0;
  }

...

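My test driver is fairly basic:

  • An asynchronous work loop driven by a timer that prints "." once every 3 seconds.
  • Spawn a single thread that will poll the io_service.
  • Delay to give the new thread time to poll the io_service, then have main call io_service::run() while the poll thread sleeps in task_io_service::do_poll_one().

Test code: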

#include <iostream>

#include <boost/asio/io_service.hpp>
#include <boost/asio/steady_timer.hpp>
#include <boost/bind.hpp>
#include <boost/chrono.hpp>
#include <boost/thread.hpp>

boost::asio::io_service io_service;
boost::asio::steady_timer timer(io_service);

void arm_timer()
{
  std::cout << ".";
  std::cout.flush();
  timer.expires_from_now(boost::chrono::seconds(3));
  timer.async_wait(boost::bind(&arm_timer));
}

int main()
{
  // Add asynchronous work loop.
  arm_timer();

  // Spawn poll thread.
  boost::thread poll_thread(
    boost::bind(&boost::asio::io_service::poll, boost::ref(io_service)));

  // Give the poll thread time to service the reactor.
  boost::this_thread::sleep_for(boost::chrono::seconds(1));

  io_service.run();
}

Debugging it:

[twsansbury@localhost bug]$ gdb a.out 
...
(gdb) r
Starting program: /home/twsansbury/dev/bug/a.out 

[Thread debugging using libthread_db enabled]
.[New Thread 0xb7feeb90 (LWP 31892)]
[Thread 0xb7feeb90 (LWP 31892) exited]

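At this point, arm_timer() has printed "." once (when it was initially armed). The poll thread serviced the reactor in a non-blocking manner and slept for 3 seconds while op_queue_ was empty (task_operation_ is added back to op_queue_ when task_cleanup c goes out of scope). While op_queue_ was empty, the main thread called io_service::run(), saw that op_queue_ was empty, and made itself the first_idle_thread_, where it waits on its wakeup_event. The poll thread finishes sleeping and returns 0, leaving the main thread waiting on wakeup_event.

After waiting about 10 seconds, plenty of time for arm_timer() to become ready, I interrupt the debugger: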

Program received signal SIGINT, Interrupt.
0x00919402 in __kernel_vsyscall ()
(gdb) bt
#0  0x00919402 in __kernel_vsyscall ()
#1  0x0081bbc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x00763b3d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x08059dc2 in void boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> >(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&) ()
#4  0x0805a009 in boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service_thread_info&, boost::system::error_code const&) ()
#5  0x0805a11c in boost::asio::detail::task_io_service::run(boost::system::error_code&) ()
#6  0x0805a1e2 in boost::asio::io_service::run() ()
#7  0x0804db78 in main ()

A side-by-side timeline is as follows:

          poll thread                  |          main thread
---------------------------------------+---------------------------------------
  lock()                               | 
  do_poll_one()                        |                          
  |-- pop task_operation_ from         |
  |   op_queue_                        |
  |-- unlock()                         |  lock()
  |-- create task_cleanup              |  do_run_one()
  |-- service reactor (non-block)      |  `-- op_queue_ is empty
  |-- ~task_cleanup()                  |      |-- set thread as idle
  |   |-- lock()                       |      `-- unlock()
  |   `-- op_queue_.push(              |
  |       task_operation_)             |
  `-- task_operation_ is               |
      op_queue_.front()                |
      `-- return 0                     |  // still waiting on wakeup_event
  unlock()                             |

As best as I can tell, there are no side effects from patching:

if (o == &task_operation_)
  return 0;

to:

if (o == &task_operation_)
{
  if (!one_thread_)
    wake_one_thread_and_unlock(lock);
  return 0;
}

Regardless, I have submitted a bug report and a fix. Consider keeping an eye on the ticket for an official response.