Jus*_*ith 5 python file urllib2
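Let me start by saying that I'm not new to programming, but I am new to Python.
I wrote a program using urllib2 that requests a web page, which I then want to save to a file. The page is about 300 KB, which doesn't strike me as particularly large, but it seems to be enough to cause trouble, so I'm calling it 'big'. I'm using a simple call to copy directly from the object returned by urlopen into the file: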
file.write(webpage.read())
But it just sits there for several minutes trying to write to the file, and eventually I get the following:
Traceback (most recent call last):
  File "program.py", line 51, in <module>
    main()
  File "program.py", line 43, in main
    f.write(webpage.read())
  File "/usr/lib/python2.7/socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib/python2.7/httplib.py", line 541, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.7/httplib.py", line 592, in _read_chunked
    value.append(self._safe_read(amt))
  File "/usr/lib/python2.7/httplib.py", line 649, in _safe_read
    raise IncompleteRead(''.join(s), amt)
httplib.IncompleteRead: IncompleteRead(6384 bytes read, 1808 more expected)
I don't know why this would give the program so much grief.
Here is how I retrieve the page:
import cookielib
import urllib
import urllib2

# Opener that stores cookies so the login session carries over to later requests
jar = cookielib.CookieJar()
cookie_processor = urllib2.HTTPCookieProcessor(jar)
opener = urllib2.build_opener(cookie_processor)
urllib2.install_opener(opener)
requ_login = urllib2.Request(LOGIN_PAGE,
    data=urllib.urlencode({'destination': "", 'username': USERNAME, 'password': PASSWORD}))
requ_page = urllib2.Request(WEBPAGE)
try:
    # login
    urllib2.urlopen(requ_login)
    # get desired page
    portfolio = urllib2.urlopen(requ_page)
except urllib2.URLError as e:
    # only HTTPError carries .code; a plain URLError has just .reason
    print getattr(e, 'code', None), ": ", e.reason
I'd use the handy file-object copy function provided by the shutil module. It worked on my machine :)
>>> import urllib2
>>> import shutil
>>> remote_fo = urllib2.urlopen('http://docs.python.org/library/shutil.html')
>>> with open('bigfile', 'wb') as local_fo:
...     shutil.copyfileobj(remote_fo, local_fo)
...
>>>
Update: you may want to pass a third argument to copyfileobj to control the size of the internal buffer used to transfer bytes.
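As a minimal sketch of that third argument, the snippet below copies an in-memory stream with an explicit buffer size. The `io.BytesIO` objects are stand-ins for the HTTP response and the output file, and the 64 KB value is only an illustrative choice, not something this answer recommends.

```python
import io
import shutil

# Stand-ins for the real objects (assumptions for illustration):
# `source` plays the role of the urlopen() response, `target` the local file.
source = io.BytesIO(b"x" * 300000)   # roughly the 300 KB page from the question
target = io.BytesIO()

# The third positional argument is the size, in bytes, of each read() call.
shutil.copyfileobj(source, target, 64 * 1024)

copied = target.getvalue()
print(len(copied))
```

A larger buffer means fewer read/write calls at the cost of more memory held at once; the right value depends on the connection and is best found by experiment.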
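Update 2: there is nothing special about shutil.copyfileobj. It simply reads a chunk of bytes from the source file object and writes it to the destination file object, repeating until there is nothing left to read. Here is its actual source code, which I grabbed from the Python standard library: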
def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
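Since the traceback in the question ends in `IncompleteRead`, one possible variation on the loop above is to catch that exception and keep whatever bytes did arrive. This is only a sketch under assumptions: `copy_tolerant` and `FlakyStream` are hypothetical names, the fake stream merely simulates a connection that drops mid-transfer, and whether a truncated page is acceptable depends on your use case.

```python
import io

try:                                   # Python 3
    from http.client import IncompleteRead
except ImportError:                    # Python 2
    from httplib import IncompleteRead


def copy_tolerant(fsrc, fdst, length=16 * 1024):
    """Copy fsrc to fdst in chunks; return True if the copy completed.

    On IncompleteRead, the bytes received so far (exc.partial) are
    written out and False is returned.
    """
    while True:
        try:
            buf = fsrc.read(length)
        except IncompleteRead as exc:
            fdst.write(exc.partial)
            return False
        if not buf:
            return True
        fdst.write(buf)


class FlakyStream(object):
    """Hypothetical test double: returns one full chunk, then fails."""

    def __init__(self):
        self.calls = 0

    def read(self, length):
        self.calls += 1
        if self.calls == 1:
            return b"a" * length
        # Simulate the server closing the connection mid-chunk.
        raise IncompleteRead(b"bcd", 5)


out = io.BytesIO()
complete = copy_tolerant(FlakyStream(), out, length=4)
result = out.getvalue()
print(complete, result)
```

The return value lets the caller decide whether to retry the request or accept the partial file.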