Fast Ruby HTTP library for large XML downloads

Vla*_*anu 3 ruby ruby-on-rails http download

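I am consuming various XML-over-HTTP web services that return large XML files (> 2MB). What is the fastest Ruby HTTP library for reducing the 'download' time?

Required features:

  • GET and POST requests

  • gzip/deflate downloads (Accept-Encoding: deflate, gzip) - very important

I am considering:

  • open-uri

  • Net::HTTP

  • curb

But you are welcome to suggest others.

P.S. To parse the response I use Nokogiri's pull parser, so I don't need an integrated solution like rest-client or hpricot.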

The*_*heo 17

You can use EventMachine and em-http to stream the XML:

require 'rubygems'
require 'eventmachine'
require 'em-http'
require 'nokogiri'

# this is your SAX handler, I'm not very familiar with
# Nokogiri, so I just took an example from the RDoc
class StreamingDocument < Nokogiri::XML::SAX::Document
  def start_element(name, attrs=[])
    puts "starting: #{name}"
  end

  def end_element(name)
    puts "ending: #{name}"
  end
end

document = StreamingDocument.new
url = 'http://stackoverflow.com/feeds/question/2833829'

# run the EventMachine reactor, this call will block until 
# EventMachine.stop is called
EventMachine.run do
  # Nokogiri wants an IO to read from, so create a pipe that it
  # can read from, and we can write to
  io_read, io_write = IO.pipe

  # run the parser in its own thread so that it can block while
  # reading from the pipe; stop EventMachine once parsing has finished
  parse = proc {
    parser = Nokogiri::XML::SAX::Parser.new(document)
    parser.parse_io(io_read)
  }
  EventMachine.defer(parse, proc { EventMachine.stop })

  # use em-http to stream the XML document, feeding the pipe with
  # each chunk as it becomes available
  http = EventMachine::HttpRequest.new(url).get
  http.stream { |chunk| io_write << chunk }

  # when the HTTP request is done (or has failed), close the write end
  # of the pipe so that the parser sees end-of-file and returns
  http.callback { io_write.close }
  http.errback  { io_write.close }
end

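It may be a bit low-level, but it is probably the highest-performing option for any document size. Feed it hundreds of megabytes and it will not fill up your memory the way any non-streaming solution would (as long as you don't hold on to too much of the document you are loading, but that part is up to you).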
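
The pipe-and-thread pattern above does not depend on EventMachine, and it combines naturally with the gzip/deflate requirement from the question. As a minimal sketch using only the standard library: compressed chunks are inflated one at a time with Zlib::Inflate and written into an IO.pipe, while a consumer thread reads from the other end. The helper name stream_inflated and the idea of passing the chunks as an enumerable are assumptions made for illustration; in a real client the chunks would arrive from the HTTP library's streaming callback, and this step is only needed if that library hands you the body still compressed.

```ruby
require 'zlib'

# Hypothetical helper (not part of any library): inflates compressed
# chunks one at a time, writes the result into a pipe, and runs the
# given block in its own thread with the read end of that pipe.
def stream_inflated(compressed_chunks, &consumer_block)
  io_read, io_write = IO.pipe
  io_read.binmode
  io_write.binmode

  # the consumer blocks on the pipe, just like the SAX parser would
  consumer = Thread.new do
    begin
      consumer_block.call(io_read)
    ensure
      io_read.close
    end
  end

  # MAX_WBITS + 32 lets zlib auto-detect a gzip or zlib header
  inflater = Zlib::Inflate.new(Zlib::MAX_WBITS + 32)
  begin
    compressed_chunks.each { |chunk| io_write << inflater.inflate(chunk) }
    io_write << inflater.finish
  ensure
    inflater.close
    io_write.close # signals end-of-file to the consumer
  end

  consumer.value # wait for the consumer and return the block's result
end
```

Calling stream_inflated(chunks) { |io| io.read } returns the whole decompressed document, which is handy for checking the plumbing. To keep memory use bounded, the block would instead hand io to a streaming parser, for example Nokogiri::XML::SAX::Parser.new(document).parse_io(io) as in the answer.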