Ric*_*ano 8 python urllib python-3.x
This simple Python 3 script:
import urllib.request
host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link
filename = "cite0.bib"
print(url)
urllib.request.urlretrieve(url, filename)
raises this exception:
Traceback (most recent call last):
  File "C:\Users\ricardo\Desktop\Google-Scholar\BibTex\test2.py", line 8, in <module>
    urllib.request.urlretrieve(url, filename)
  File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python32\lib\urllib\request.py", line 1597, in retrieve
    block = fp.read(bs)
ValueError: read of closed file
I thought this might be a temporary problem, so I added some simple exception handling, like this:
import random
import time
import urllib.request
host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link
filename = "cite0.bib"
print(url)
while True:
    try:
        print("Downloading...")
        time.sleep(random.randint(0, 5))
        urllib.request.urlretrieve(url, filename)
        break
    except ValueError:
        pass
But this just prints Downloading... forever.
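Your URL returns a 403 error, and apparently urllib.request.urlretrieve is not good at detecting all HTTP errors: it uses urllib.request.FancyURLopener, which tries to swallow the error by returning a urlinfo object instead of raising an exception.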
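One way to confirm the status code is to call urllib.request.urlopen and catch urllib.error.HTTPError. The following is only a minimal sketch: probe_status is a hypothetical helper name, not part of urllib, and the timeout value is an arbitrary assumption.

```python
import urllib.error
import urllib.request


def probe_status(url, timeout=10):
    """Return the HTTP status code for url, including error codes such as 403.

    probe_status is a hypothetical helper for illustration, not a urllib API.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is in err.code.
        return err.code
```

Calling probe_status(url) with the URL from the question should then report the error code rather than failing later with "read of closed file".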
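As for a fix, if you still want to use urlretrieve, you can override FancyURLopener like this (the code also surfaces the error). Note that this relies on the Python 3.2 implementation seen in the traceback; FancyURLopener has been deprecated since Python 3.3.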
import urllib.request
from urllib.request import FancyURLopener
class FixFancyURLOpener(FancyURLopener):
    def http_error_default(self, url, fp, errcode, errmsg, headers):
        # Raise on 403 instead of silently returning a response for the error page.
        if errcode == 403:
            raise ValueError("403")
        return super(FixFancyURLOpener, self).http_error_default(
            url, fp, errcode, errmsg, headers
        )

# Monkey patch so that urlretrieve picks up the subclass
urllib.request.FancyURLopener = FixFancyURLOpener
url = "http://scholar.google.com/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
urllib.request.urlretrieve(url, "cite0.bib")
Alternatively, and this is what I would suggest, you can use urllib.request.urlopen like this:
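import urllib.request
# urlopen raises urllib.error.HTTPError on a 403 instead of swallowing it
fp = urllib.request.urlopen('http://scholar.google.com/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0')
# read() returns bytes in Python 3, so the file must be opened in binary mode
with open("cite0.bib", "wb") as fo:
    fo.write(fp.read())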
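If retrying is still wanted, the loop from the question can be bounded and made to stop on errors that will not go away. This is a rough sketch under assumptions, not a definitive implementation: the download function, its attempts, delay and opener parameters, and the rule "do not retry on codes below 500" are all choices made here for illustration.

```python
import time
import urllib.error
import urllib.request


def download(url, filename, attempts=3, delay=1.0, opener=urllib.request.urlopen):
    """Save url to filename, retrying transient failures at most `attempts` times.

    download and its parameters are hypothetical names for this sketch.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # The response is opened first, so no file is created on failure.
            with opener(url) as response, open(filename, "wb") as out:
                out.write(response.read())
            return filename
        except urllib.error.HTTPError as err:
            # HTTPError must be caught before URLError (it is a subclass).
            if err.code < 500:
                raise  # assumption: client errors such as 403 are not transient
            last_error = err
        except urllib.error.URLError as err:
            last_error = err  # e.g. connection problems, worth another try
        if attempt < attempts:
            time.sleep(delay)
    raise last_error
```

With the URL from the question, download(url, "cite0.bib") would be expected to raise HTTPError on the first attempt instead of printing Downloading... indefinitely.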