Hen*_*hiu 9 ruby ruby-on-rails
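I am using Net::HTTP with Ruby to crawl URLs.
I don't want to crawl streaming audio, for example: http://listen2.openstream.co/334
In fact I only want to crawl HTML content, so no PDFs, video, txt, and so on.
Right now I set both open_timeout and read_timeout to 10, so even if I do crawl one of these streaming audio pages, the request times out.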
require 'net/http'
require 'addressable/uri'

url = 'http://listen2.openstream.co/334'
uri = Addressable::URI.parse(url) # parse first so that uri exists before uri.path is read
path = uri.path
req = Net::HTTP::Get.new(path, {'Accept' => '*/*', 'Content-Type' => 'text/plain; charset=utf-8', 'Connection' => 'keep-alive', 'Accept-Encoding' => 'Identity'})
resp = Net::HTTP.start(uri.host, uri.inferred_port) do |http|
  http.open_timeout = 10
  http.read_timeout = 10
  # how can I read the headers here before it's streaming the body and then exit b/c the content type is audio?
  http.request(req)
end
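However, is there a way to check the headers before I read the body of the HTTP response, to see whether it is audio? I want to do this without sending a separate HEAD request.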
net/http supports streaming, so you can use it to read the headers before the body is read.
Code example:
require 'net/http'

url = URI('http://stackoverflow.com/questions/41306082/ruby-nethttp-read-the-header-before-the-body-without-head-request')
Net::HTTP.start(url.host, url.port) do |http|
  request = Net::HTTP::Get.new(url)
  http.request(request) do |response|
    # check headers here, body has not yet been read
    # then call read_body or just body to read the body
    if true # placeholder: replace with your own header check
      response.read_body do |chunk|
        # process body chunks here
      end
    end
  end
end
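Applied to the question, the header check could look roughly like the sketch below. The helper names `html_content_type?` and `fetch_html` are hypothetical, and the list of accepted media types is an assumption about what counts as "HTML content". As far as I can tell, net/http reads any unread body once the block finishes normally, so the sketch leaves the block with an early `return`, which lets `Net::HTTP.start` close the connection without consuming the stream.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: true only when the Content-Type header names an HTML media type.
# The accepted types below are an assumption; adjust them to your needs.
def html_content_type?(content_type)
  return false if content_type.nil?

  # Drop parameters such as "; charset=utf-8" and normalise case.
  media_type = content_type.split(';').first.to_s.strip.downcase
  ['text/html', 'application/xhtml+xml'].include?(media_type)
end

# Hypothetical helper: returns the body string for HTML responses, nil otherwise.
def fetch_html(url, timeout: 10)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == 'https',
                  open_timeout: timeout,
                  read_timeout: timeout) do |http|
    http.request(Net::HTTP::Get.new(uri)) do |response|
      # Only the status line and headers have been read at this point.
      # Returning here skips the body entirely (e.g. an endless audio stream).
      return nil unless html_content_type?(response['Content-Type'])

      return response.read_body
    end
  end
end
```

Calling `fetch_html('http://listen2.openstream.co/334')` should then return nil without waiting for a timeout, provided the server actually sends an audio Content-Type for that URL.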