Ruby - net/http - following redirects

r3n*_*rut 36 ruby curl httpclient http-headers net-http

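I have a URL and I'm using HTTP GET to pass a query along to a page. With the most recent flavor (in net/http), the script doesn't get past the 302 response. I've tried several different solutions: HTTPClient, net/http, Rest-Client, Patron...

I need a way to continue on to the final page so I can validate an attribute tag in that page's HTML. The redirect happens because a mobile user agent hits a page that redirects to a mobile view, hence the mobile user agent in the header. Here is my code as it stands today: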

require 'uri'
require 'net/http'

class Check_Get_Page

    def more_http
        url = URI.parse('my_url')
        # request_uri keeps the query string, which url.path would drop.
        req = Net::HTTP::Get.new(url.request_uri, {
            'User-Agent' => 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_2 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5'
        })
        res = Net::HTTP.start(url.host, url.port) { |http|
            http.request(req)
        }
        cookie = res['set-cookie']
        puts 'Body = ' + res.body
        puts 'Message = ' + res.message
        puts 'Code = ' + res.code
        # Set-Cookie may be absent, so avoid concatenating nil.
        puts "Cookie \n" + cookie.to_s
    end

end

m = Check_Get_Page.new
m.more_http

Any suggestions would be greatly appreciated!

emb*_*oss 60

To follow redirects, you can do something like this (taken from ruby-doc):

Following Redirection

require 'net/http'
require 'uri'

def fetch(uri_str, limit = 10)
  # You should choose better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  url = URI.parse(uri_str)
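  # request_uri keeps the query string and is never empty, unlike path.
  req = Net::HTTP::Get.new(url.request_uri, { 'User-Agent' => 'Mozilla/5.0 (etc...)' })
  # Enable TLS only for https URLs; a hard-coded use_ssl: true fails on plain http.
  response = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == 'https') { |http| http.request(req) }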
  case response
  when Net::HTTPSuccess     then response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else
    response.error!
  end
end

print fetch('http://www.ruby-lang.org/')

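  • This doesn't work for links that redirect to themselves with a trailing slash added, for example `fetch('http://epn.dk/okonomi2/dk/ECE5373277/chefoekonom-corydon-skyder-langt-over-mal')`: the first iteration produces `#<Net::HTTPMovedPermanently 301 Moved Permanently readbody=true>`, and then an exception... (3 upvotes)
  • This doesn't work when `response['Location']` is a relative path, e.g. '/inbox'. In that case you need to set the path on the original uri, e.g. `url.path = response['Location']`. (3 upvotes)
  • @DavidMoles - For example, `http://www.puzzledragonx.com/en/monster.asp?n=9999`: curl shows a 302 redirect with a `Location: /` header, and the code pattern above chokes without @MattHuggins's suggestion. Or rather, with a slight tweak: create `new_uri = URI.parse(response['Location'])`, then if `new_uri.relative?`, set `new_uri.scheme = uri.scheme` and `new_uri.host = uri.host`. Otherwise, if you only update the path on the original uri, any query or fragment part of the original uri is carried over. (2 upvotes)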
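
The relative `Location` problem raised in these comments could be handled along the following lines. This is only a sketch: `resolve_location` is a hypothetical helper name, and it uses `URI.join` from the standard library instead of copying scheme and host by hand, which is a swapped-in technique rather than what the commenters wrote.

```ruby
require 'uri'

# Hypothetical helper (not part of the answer above): resolve a Location
# header value against the URL that produced the redirect.
# URI.join leaves absolute locations untouched and resolves relative ones
# against the base, dropping the base's query and fragment.
def resolve_location(base_uri_str, location)
  URI.join(base_uri_str, location).to_s
end

puts resolve_location('http://www.puzzledragonx.com/en/monster.asp?n=9999', '/')
# prints http://www.puzzledragonx.com/
```

Inside `fetch`, the redirect branch would then presumably call `fetch(resolve_location(uri_str, response['location']), limit - 1)`.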

Pan*_*nic 8

Given a URL that redirects:

url = 'http://httpbin.org/redirect-to?url=http%3A%2F%2Fhttpbin.org%2Fredirect-to%3Furl%3Dhttp%3A%2F%2Fexample.org'

A. Net::HTTP

begin
  response = Net::HTTP.get_response(URI.parse(url))
  url = response['location']
end while response.is_a?(Net::HTTPRedirection)

Make sure you handle the case where there are too many redirects.
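
One way to bound the loop is sketched below. The names `final_response`, `TooManyRedirects`, `max_redirects` and the injectable `fetcher` are all assumptions made for illustration, not part of the answer; the default fetcher is the same `Net::HTTP.get_response` call used above.

```ruby
require 'net/http'
require 'uri'

# Hypothetical error class for this sketch.
class TooManyRedirects < StandardError; end

# Follows at most max_redirects redirects and returns the first
# non-redirect response. The fetcher is injectable so the loop can be
# exercised without touching the network.
def final_response(url, max_redirects: 5,
                   fetcher: ->(u) { Net::HTTP.get_response(URI.parse(u)) })
  (max_redirects + 1).times do
    response = fetcher.call(url)
    return response unless response.is_a?(Net::HTTPRedirection)

    # Resolve relative Location values against the current URL.
    url = URI.join(url, response['location']).to_s
  end
  raise TooManyRedirects, "gave up after #{max_redirects} redirects"
end
```

Calling `final_response(url)` with the `url` defined above should return the last response in the chain, or raise `TooManyRedirects` if the chain is longer than the limit.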

B. OpenURI

require 'open-uri'

# Kernel#open no longer opens URLs as of Ruby 3.0; use URI.open instead.
URI.open(url).read

`OpenURI::OpenRead#open` follows redirects by default, but it doesn't limit the number of redirects.


sek*_*ett 5

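I wrote another class for this based on the examples given here; many thanks to everyone. I added cookies, parameters and exceptions, and finally got what I needed: https://gist.github.com/sekrett/7dd4177d6c87cf8265cd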

require 'uri'
require 'net/http'
require 'openssl'

class UrlResolver
  def self.resolve(uri_str, agent = 'curl/7.43.0', max_attempts = 10, timeout = 10)
    attempts = 0
    cookie = nil

    until attempts >= max_attempts
      attempts += 1

      url = URI.parse(uri_str)
      http = Net::HTTP.new(url.host, url.port)
      http.open_timeout = timeout
      http.read_timeout = timeout
      path = url.path
      path = '/' if path == ''
      path += '?' + url.query unless url.query.nil?

      params = { 'User-Agent' => agent, 'Accept' => '*/*' }
      params['Cookie'] = cookie unless cookie.nil?
      request = Net::HTTP::Get.new(path, params)

      if url.instance_of?(URI::HTTPS)
        http.use_ssl = true
        # Note: VERIFY_NONE disables certificate verification, which is insecure.
        http.verify_mode = OpenSSL::SSL::VERIFY_NONE
      end
      response = http.request(request)

      case response
        when Net::HTTPSuccess then
          break
        when Net::HTTPRedirection then
          location = response['Location']
          cookie = response['Set-Cookie']
          new_uri = URI.parse(location)
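          # URI#+ returns a URI object, so convert it back to a string
          # before the next iteration passes it to URI.parse.
          uri_str = if new_uri.relative?
                      (url + location).to_s
                    else
                      new_uri.to_s
                    end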
        else
          raise 'Unexpected response: ' + response.inspect
      end

    end
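    # Checking the last response avoids a false failure when the final
    # allowed attempt succeeds.
    raise 'Too many http redirects' unless response.is_a?(Net::HTTPSuccess)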

    uri_str
    # response.body
  end
end

puts UrlResolver.resolve('http://www.ruby-lang.org')