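How do I seek to a particular position in a remote (HTTP) file so that I can download only that part?
Let's say the bytes in the remote file are: 1234567890
I want to seek to 4 and download 3 bytes from there, so I would get: 456
Also, how do I check whether a remote file exists? I tried os.path.isfile(), but it returns False when I pass it a remote file URL.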
jbo*_*chi 16
If you are downloading the remote file over HTTP, you need to set the Range header.
Check this example to see how it can be done. It looks like this:
myUrlclass.addheader("Range","bytes=%s-" % (existSize))
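For readers on Python 3, the same idea can be sketched with the standard library's urllib.request. The helper names range_header and fetch_range are hypothetical (they are not part of the original answer), and the sketch assumes the server supports byte ranges. HTTP byte ranges are zero-based and inclusive at both ends, so "seek to 4 and read 3 bytes" in the question's 1-based counting is offset 3, length 3, i.e. bytes=3-5.

```python
import urllib.request


def range_header(offset, length):
    """Build a Range header value for `length` bytes starting at zero-based `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return "bytes=%d-%d" % (offset, offset + length - 1)


def fetch_range(url, offset, length):
    """Return the requested bytes, or raise if the server ignored the Range header."""
    req = urllib.request.Request(url, headers={"Range": range_header(offset, length)})
    with urllib.request.urlopen(req) as resp:
        # 206 means the server honoured the range; 200 means it sent the whole file.
        if resp.status != 206:
            raise RuntimeError(
                "server did not return partial content (status %d)" % resp.status
            )
        return resp.read()
```

Calling fetch_range(url, 3, 3) against a file whose content is 1234567890 should yield b"456", provided the server honours the header. A server that ignores Range replies with 200 and the full body, which is why the status is checked.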
EDIT: I just found a better implementation. This class is very simple to use, as can be seen in its docstring.
# Python 2 (urllib2). RangeError is defined here so the snippet stands alone.
import urllib
import urllib2


class RangeError(IOError):
    """Raised when the server cannot satisfy the requested byte range."""


class HTTPRangeHandler(urllib2.BaseHandler):
    """Handler that enables HTTP Range headers.

    This was extremely simple. The Range header is a HTTP feature to
    begin with so all this class does is tell urllib2 that the
    "206 Partial Content" response from the HTTP server is what we
    expected.

    Example:
        import urllib2
        import byterange

        range_handler = byterange.HTTPRangeHandler()
        opener = urllib2.build_opener(range_handler)

        # install it
        urllib2.install_opener(opener)

        # create Request and set Range header
        req = urllib2.Request('http://www.python.org/')
        req.add_header('Range', 'bytes=30-50')
        f = urllib2.urlopen(req)
    """

    def http_error_206(self, req, fp, code, msg, hdrs):
        # 206 Partial Content Response
        r = urllib.addinfourl(fp, hdrs, req.get_full_url())
        r.code = code
        r.msg = msg
        return r

    def http_error_416(self, req, fp, code, msg, hdrs):
        # HTTP's Range Not Satisfiable error
        raise RangeError('Requested Range Not Satisfiable')
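Update: The "better implementation" has moved to GitHub: excid3/urlgrabber, in the byterange.py file.
I highly recommend using the requests library. It is easily the best HTTP library I have ever used. In particular, to accomplish what you have described, you would do something like: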
import requests
url = "http://www.sffaudio.com/podcasts/ShellGameByPhilipK.Dick.pdf"
# Retrieve bytes between offsets 3 and 5 (inclusive).
r = requests.get(url, headers={"range": "bytes=3-5"})
# If a 4XX client error or a 5XX server error is encountered, we raise it.
r.raise_for_status()
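Two loose ends remain: confirming that the server actually honoured the range, and the question's second part about checking whether a remote file exists. The sketch below is one possible approach using only the Python 3 standard library. The names parse_content_range and remote_file_exists are hypothetical helpers, and treating only a 404 as "does not exist" is an assumption; some servers reject HEAD requests or answer with other status codes.

```python
import urllib.error
import urllib.request


def parse_content_range(value):
    """Parse a 206 response's Content-Range, e.g. 'bytes 3-5/10' -> (3, 5, 10).

    The total is returned as None when the server sends it as '*'.
    """
    unit, _, spec = value.partition(" ")
    if unit != "bytes":
        raise ValueError("unsupported range unit: %r" % unit)
    span, _, total = spec.partition("/")
    first, _, last = span.partition("-")
    return int(first), int(last), (None if total == "*" else int(total))


def remote_file_exists(url, timeout=10):
    """Return True if a HEAD request for `url` succeeds, False on a 404."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other statuses (403, 500, ...) do not settle the question
```

With the requests snippet above, the equivalent check would be r.status_code == 206 followed by parse_content_range(r.headers["Content-Range"]); the downloaded bytes are then available as r.content. If the status is 200 instead, the server ignored the Range header and sent the whole file.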