Nat*_*ong 8 ruby multithreading deadlock fork signals
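I am reading Jesse Storimer's excellent book "Working With Unix Processes". In a section about trapping signals from child processes that have exited, he gives a code sample.
I modified that code slightly (see below) to get a clearer picture of what is happening (made visible with puts), for example wait running for several children within a single execution of the trap block (sometimes I get "Received a CHLD signal" once, followed by several "child pid ... exited" lines). Usually the output of the code below looks something like this: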
parent is working hard
Received a CHLD signal
child pid 73408 exited
parent is working hard
parent is working hard
parent is working hard
Received a CHLD signal
child pid 73410 exited
child pid 73409 exited
All children exited - parent exiting too.
But every once in a while I get an error like this:
trapping_signals.rb:17:in `write': deadlock; recursive locking (ThreadError)
from trapping_signals.rb:17:in `puts'
from trapping_signals.rb:17:in `puts'
from trapping_signals.rb:17:in `block in <main>'
from trapping_signals.rb:17:in `call'
from trapping_signals.rb:17:in `write'
from trapping_signals.rb:17:in `puts'
from trapping_signals.rb:17:in `puts'
from trapping_signals.rb:17:in `block in <main>'
from trapping_signals.rb:40:in `call'
from trapping_signals.rb:40:in `sleep'
from trapping_signals.rb:40:in `block in <main>'
from trapping_signals.rb:38:in `loop'
from trapping_signals.rb:38:in `<main>'
Can anyone explain to me what is going wrong here?
child_processes = 3
dead_processes = 0

# We fork 3 child processes.
child_processes.times do
  fork do
    # Each sleeps between 0 and 5 seconds
    sleep rand(5)
  end
end

# Our parent process will be busy doing some work.
# But still wants to know when one of its children exits.
# By trapping the :CHLD signal our process will be notified by the kernel
# when one of its children exits.
trap(:CHLD) do
  puts "Received a CHLD signal"
  # Since Process.wait queues up any data that it has for us we can ask for it
  # here, since we know that one of our child processes has exited.
  # We loop over a non-blocking Process.wait to ensure that any dead child
  # processes are accounted for.
  # Here we wait without blocking.
  while pid = Process.wait(-1, Process::WNOHANG)
    puts "child pid #{pid} exited"
    dead_processes += 1
    # We exit ourselves once all the child processes are accounted for.
    if dead_processes == child_processes
      puts "All children exited - parent exiting too."
      exit
    end
  end
end

# Work it.
loop do
  puts "parent is working hard"
  sleep 1
end
emb*_*oss 13
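I looked through the Ruby source to see where that particular error is raised. It is only ever raised when the current thread tries to acquire a lock that it already holds. This means the lock is not re-entrant: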
m = Mutex.new
m.lock
m.lock #=> same error as yours
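As a small runnable variant of the snippet above, the following sketch rescues the error so the message can be inspected, and contrasts Mutex with the standard library's Monitor, which is re-entrant. The variable names (`message`, `nested`) are only illustrative and not part of the original code.

```ruby
require "monitor"

# A Mutex cannot be locked twice by the same thread.
m = Mutex.new
m.lock
message = nil
begin
  m.lock            # second lock attempt by the thread that already holds it
rescue ThreadError => e
  message = e.message
ensure
  m.unlock          # release the lock taken by the first m.lock
end
puts message

# A Monitor, by contrast, may be re-entered by the thread that holds it.
mon = Monitor.new
nested = mon.synchronize { mon.synchronize { :ok } }
puts nested
```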
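Now at least we know what happens, but not yet why and where. The error message indicates that it happens during a call to puts. When puts is called, it ultimately ends up in io_binwrite. stdout is not synchronized, but it is buffered, so the condition checked there is met on the first call, and a buffer plus a write lock for that buffer will be set up. The write lock is important to guarantee the atomicity of writes to stdout: it should not happen that two threads writing to stdout at the same time mix up each other's output. To demonstrate what I mean: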
t1 = Thread.new { 100.times { print "aaaaa" } }
t2 = Thread.new { 100.times { print "bbbbb" } }
t1.join
t2.join
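Although both threads take turns writing to stdout, it never happens that a single write is broken up: you will always see complete groups of five a's or five b's. That is what the write lock is for.
What goes wrong now is a race condition on the write lock. The parent process loops and writes to stdout every second ("parent is working hard"). But the same thread eventually also executes the trap block and tries to write to stdout again ("Received a CHLD signal"). You can verify that it really is the same thread by adding #{Thread.current} to your puts statements. If those two events happen closely enough together, you end up in the same situation as in the first example: the same thread tries to acquire the same lock twice, which ultimately triggers the error.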
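One possible way to sidestep the problem, which is not part of the original answer, is to keep the trap block free of any IO and do the reaping and printing in the main loop. The sketch below assumes a Unix system with fork; the names `chld_pending`, `reaped`, and `handler_thread` are illustrative, and the sleep times are shortened so the script finishes quickly. It also records Thread.current inside the handler to check the same-thread claim.

```ruby
child_processes = 3
dead_processes  = 0
chld_pending    = false  # set by the handler, cleared by the main loop
handler_thread  = nil    # remembers which thread ran the trap block
reaped          = []

# Fork the children; each sleeps for less than a second.
child_processes.times do
  fork { sleep rand }
end

# The handler only assigns local variables: no puts, so no write lock on stdout.
trap(:CHLD) do
  handler_thread = Thread.current
  chld_pending = true
end

while dead_processes < child_processes
  puts "parent is working hard"
  sleep 0.2
  next unless chld_pending

  chld_pending = false
  puts "Received a CHLD signal"
  # Reap every child that has exited so far, without blocking.
  # The count check avoids calling Process.wait once no children remain.
  while dead_processes < child_processes &&
        (pid = Process.wait(-1, Process::WNOHANG))
    puts "child pid #{pid} exited"
    reaped << pid
    dead_processes += 1
  end
end

puts "All children exited - parent exiting too."
puts "handler ran on main thread: #{handler_thread.equal?(Thread.main)}"
```

Because all writes to stdout now happen in the ordinary flow of the main loop, a signal arriving in the middle of a puts can no longer cause a second puts to start inside the first one. The price is that exits are noticed only on the next loop iteration rather than immediately.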