Joe*_*sed 5 python urllib urllib3 http-status-code-403 python-3.x
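Hi. Not every time, but sometimes when I try to fetch the LSE page, I get the annoying HTTP Error 403: Forbidden message.
Does anyone know how I can get around this using only standard Python modules (so, sadly, no Beautiful Soup)?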
import urllib.request
url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"
infile = urllib.request.urlopen(url) # Open the URL
data = infile.read().decode('ISO-8859-1') # Read the content as string decoded with ISO-8859-1
print(data) # Print the data to the screen
However, every now and then this is the error I get:
Traceback (most recent call last):
  File "/home/ubuntu/workspace/programming_practice/Assessment/Summative/removingThe403Error.py", line 5, in <module>
    webpage = urlopen(req).read().decode('ISO-8859-1')
  File "/usr/lib/python3.4/urllib/request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 469, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 507, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Process exited with code: 1
Link to the list of all standard modules: https://docs.python.org/3.4/py-modindex.html
Thanks in advance.
Kar*_*omo 12
This is probably due to mod_security. You need to spoof it by opening the URL as a browser rather than as Python urllib.
Here is your code, corrected:
import urllib.request
url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"
# Open the URL as Browser, not as python urllib
page=urllib.request.Request(url,headers={'User-Agent': 'Mozilla/5.0'})
infile=urllib.request.urlopen(page).read()
data = infile.decode('ISO-8859-1') # Read the content as string decoded with ISO-8859-1
print(data) # Print the data to the screen
Next, you can use BeautifulSoup to scrape the HTML.
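If it helps to see what is going on when the 403 does come back, here is a minimal sketch that wraps the same User-Agent trick in small functions and reports the status code instead of crashing. By default urllib identifies itself as Python-urllib/3.x, which is what a filter like mod_security is likely to be matching on. The names build_browser_request, describe_http_error and fetch_page, and the 10-second timeout, are my own choices for illustration and are not from the code above.

```python
import urllib.error
import urllib.request


def build_browser_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def describe_http_error(http_err):
    """Format an HTTPError as a short one-line message."""
    return "HTTP %d: %s" % (http_err.code, http_err.reason)


def fetch_page(url, encoding="ISO-8859-1"):
    """Return the decoded page text, or None if the server answers with an HTTP error."""
    request = build_browser_request(url)
    try:
        # The timeout value is an assumption; tune it for your connection.
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read().decode(encoding)
    except urllib.error.HTTPError as http_err:
        print(describe_http_error(http_err))
        return None
```

Calling fetch_page(url) with the LSE URL from the question should then either return the page text or print the status line and return None, so the caller can decide what to do next.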
小智 1
It looks like you are being rate limited. Try sleeping for a while and then retrying. For example:
import urllib.error
import urllib.request
from time import sleep

LSE_URL = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"
WAIT_PERIOD = 15  # seconds to wait between attempts


def stock_data_reader():
    stock_data = get_stock_data()
    while True:
        if not stock_data:
            sleep(WAIT_PERIOD)  # sleep for a while until next retry
            stock_data = get_stock_data()
        else:
            break
    print(stock_data)  # do something with stock data


def get_stock_data():
    try:
        infile = urllib.request.urlopen(LSE_URL)  # Open the URL
    except urllib.error.HTTPError as http_err:
        print("Error: %s" % http_err)
        return None
    else:
        data = infile.read().decode('ISO-8859-1')  # Read the content as string decoded with ISO-8859-1
        return data


stock_data_reader()
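One thing to watch with a retry loop like this is that it never gives up if the server keeps refusing. As a rough sketch, assuming you want at most a fixed number of attempts and also want to send a browser-like User-Agent while retrying, something along these lines could work. The function name fetch_with_retry and its parameters (attempts, wait_seconds, opener, sleep) are hypothetical and only there for illustration; opener and sleep are passed in so the function can be exercised without touching the network.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=3, wait_seconds=15,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the decoded page, retrying on HTTP 403 up to `attempts` times."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    for attempt in range(1, attempts + 1):
        try:
            with opener(request) as response:
                return response.read().decode("ISO-8859-1")
        except urllib.error.HTTPError as http_err:
            # Re-raise anything that is not a 403, or a 403 on the last attempt.
            if http_err.code != 403 or attempt == attempts:
                raise
            sleep(wait_seconds)  # wait before the next attempt
```

Here wait_seconds plays the role of WAIT_PERIOD above. If every attempt comes back 403, the last HTTPError propagates to the caller rather than looping forever.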