Nic*_* A. 10 ruby parallel-processing asynchronous httprequest
I have an array of URLs, and I want to open each one and fetch a specific tag.
But I want to do this in parallel.
Here is pseudocode for what I want to do:
urls = [...]
tags = []
urls.each do |url|
  fetch_tag_asynchronously(url) do |tag|
    tags << tag
  end
end
wait_for_all_requests_to_finish()
It would be great if this could be done in a clean, thread-safe way.
I could use threads, but it doesn't look like arrays are thread-safe in Ruby.
Nik*_* B. 35
You can achieve thread safety with a Mutex:
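require 'thread' # for Mutex

urls = %w(
  http://test1.example.org/
  http://test2.example.org/
  ...
)

threads = []
tags = []
tags_mutex = Mutex.new

urls.each do |url|
  # One thread per URL; the URL is handed to the thread as an argument.
  threads << Thread.new(url) do |thread_url|
    tag = fetch_tag(thread_url)
    # Guard the shared array so only one thread appends at a time.
    tags_mutex.synchronize { tags << tag }
  end
end

threads.each(&:join) # wait for every request to finish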
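The snippet above calls `fetch_tag(url)`, which the answer never defines. Below is a minimal sketch of one possible version, assuming (hypothetically) that "the tag" is the page's `<title>` element and that a plain `Net::HTTP.get` without redirect handling is good enough. The regex-based `extract_tag` helper is also hypothetical and is kept separate so the parsing can be checked without network access.

```ruby
require 'net/http'
require 'uri'

# Hypothetical: treat the page's <title> element as "the tag" to fetch.
TITLE_PATTERN = %r{<title[^>]*>(.*?)</title>}im

# Pure parsing step: returns the stripped title text, or nil if there is none.
def extract_tag(html)
  match = TITLE_PATTERN.match(html)
  match && match[1].strip
end

# Blocking GET (no redirect following). MRI releases its global VM lock
# during blocking socket I/O, so several threads can wait on requests at once.
def fetch_tag(url)
  extract_tag(Net::HTTP.get(URI(url)))
end
```

A regex is only a rough stand-in; for anything beyond a simple element, an HTML parser is the more robust choice.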
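However, spawning a new thread for every URL can be counterproductive when there are many URLs, so capping the number of threads like this may be more efficient: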
THREAD_COUNT = 8 # tweak this number for maximum performance.
tags = []
mutex = Mutex.new
pending = urls.dup # work on a copy so the caller's urls array is not emptied

THREAD_COUNT.times.map {
  Thread.new do
    # Take the next URL under the lock; pop returns nil once the list is empty.
    while (url = mutex.synchronize { pending.pop })
      tag = fetch_tag(url)
      mutex.synchronize { tags << tag }
    end
  end
}.each(&:join)
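A variant of the same fixed-size worker pool can use Ruby's built-in `Queue` (which is thread-safe) instead of a mutex-guarded array, and collect results through `Thread#value` so no array is shared between threads. This is a swapped-in technique, not the answer's code: `fetch_all_tags` is a hypothetical name, it assumes Ruby 2.3+ for `Queue#close`, and `fetch_tag` here is a stub that only simulates latency. Unlike both snippets above, it also returns tags in the same order as `urls`.

```ruby
# Hypothetical stand-in for a real fetch_tag(url); replace with an HTTP version.
def fetch_tag(url)
  sleep 0.01 # simulate network latency
  "tag for #{url}"
end

def fetch_all_tags(urls, thread_count = 8)
  queue = Queue.new
  urls.each_with_index { |url, index| queue << [url, index] }
  queue.close # pop on a closed, empty queue returns nil instead of blocking

  workers = Array.new(thread_count) do
    Thread.new do
      results = [] # thread-local, so no locking is needed
      while (job = queue.pop)
        url, index = job
        results << [index, fetch_tag(url)]
      end
      results # becomes this thread's #value
    end
  end

  tags = Array.new(urls.size)
  # Thread#value joins the thread and re-raises any exception it hit.
  workers.each { |worker| worker.value.each { |index, tag| tags[index] = tag } }
  tags
end
```

For example, `fetch_all_tags(%w[a b c], 2)` returns `["tag for a", "tag for b", "tag for c"]`, regardless of which worker finished first.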