I am trying to scrape a website for practice, but I keep getting HTTP Error 403 (does it think I'm a bot?).
Here is my code:
#import requests
import urllib.request
from bs4 import BeautifulSoup
#from urllib import urlopen
import re
# read() must be called (not just referenced), and the bytes decoded, before regex matching
webpage = urllib.request.urlopen('http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1').read().decode('utf-8')
findrows = re.compile('<tr class="- banding(?:On|Off)>(.*?)</tr>')
findlink = re.compile('<a href =">(.*)</a>')
row_array = re.findall(findrows, webpage)
links = re.findall(findlink, webpage)
print(len(row_array))
iterator = []
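A possible cause of a 403 with code like the above: urllib identifies itself with a default User-Agent of the form Python-urllib/3.x, and some sites reject that. The sketch below sends a browser-like User-Agent through urllib.request.Request instead. The helper names build_request and fetch_html, the shortened URL, and the header value are illustrative assumptions, and there is no guarantee that any particular site will accept the request.

```python
import urllib.request

# Assumption: the fragment (#sortField=...) is dropped here because
# fragments are never sent to the server anyway.
URL = 'http://www.cmegroup.com/trading/products/'

# Assumption: a generic browser-like value; any realistic User-Agent may work.
BROWSER_HEADERS = {'User-Agent': 'Mozilla/5.0'}


def build_request(url, headers=BROWSER_HEADERS):
    """Return a Request that carries custom headers instead of urllib's defaults."""
    return urllib.request.Request(url, headers=headers)


def fetch_html(url):
    """Fetch a page and return its body as text (performs a network call)."""
    with urllib.request.urlopen(build_request(url), timeout=10) as response:
        return response.read().decode('utf-8', errors='replace')


# Example call (needs network access, so it is left commented out):
# html = fetch_html(URL)
```

If the server still answers 403 with a browser-like User-Agent, it may be checking other headers or cookies, which this sketch does not cover.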
The error I get is:
File "C:\Python33\lib\urllib\request.py", line 160, in urlopen
return opener.open(url, data, timeout)
File "C:\Python33\lib\urllib\request.py", line 479, in open
response = meth(req, response)
File "C:\Python33\lib\urllib\request.py", line 591, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python33\lib\urllib\request.py", line 517, in error …

I get a strange error when trying to urlopen a certain page from Wikipedia. This is the page:
http://en.wikipedia.org/wiki/OpenCola_(drink)
This is the shell session:
>>> f = urllib2.urlopen('http://en.wikipedia.org/wiki/OpenCola_(drink)')
Traceback (most recent call last):
File "C:\Program Files\Wing IDE 4.0\src\debug\tserver\_sandbox.py", line 1, in <module>
# Used internally for debug sandbox under external interpreter
File "c:\Python26\Lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "c:\Python26\Lib\urllib2.py", line 397, in open
response = meth(req, response)
File "c:\Python26\Lib\urllib2.py", line 510, in http_response
'http', request, response, code, msg, hdrs)
File "c:\Python26\Lib\urllib2.py", line 435, in error
return self._call_chain(*args)
File "c:\Python26\Lib\urllib2.py", line 369, in _call_chain
result = …

I have a server set up for testing with a self-signed certificate, and I want to be able to test against it.
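How do you ignore SSL verification in the Python 3 version of urlopen?
All the information I have found about this relates to urllib2 or to Python 2 in general.
urllib in Python 3 has changed from urllib2:
Python 2, urllib2: urllib2.urlopen(url[, data[, timeout[, cafile[, capath[, cadefault[, context]]]]])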
https://docs.python.org/2/library/urllib2.html#urllib2.urlopen
Python 3: urllib.request.urlopen(url[, data][, timeout])
https://docs.python.org/3.0/library/urllib.request.html?highlight=urllib#urllib.request.urlopen
So I know this can be done in Python 2 in the following way. However, Python 3's urlopen is missing the context parameter.
import urllib2
import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
urllib2.urlopen("https://your-test-server.local", context=ctx)
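Yes, I know this is a bad idea. This is only meant for testing on a private server.
I could not find how this is supposed to be done in the Python 3 documentation or in any other question. Even the ones that explicitly mention Python 3 still give a solution for urllib2/Python 2.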
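For what it is worth, the Python 3.0 documentation linked above predates the change: urllib.request.urlopen has accepted a context argument since Python 3.4.3, so the same approach carries over on recent interpreters. The following is a minimal sketch under that assumption. The helper names make_unverified_context and fetch_without_verification are hypothetical, and the host is the placeholder from the Python 2 snippet.

```python
import ssl
import urllib.request


def make_unverified_context():
    """Build an SSLContext that skips certificate and hostname checks.

    Only suitable for a private test server with a self-signed certificate.
    """
    ctx = ssl.create_default_context()
    # check_hostname has to be switched off before verify_mode can be CERT_NONE
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_without_verification(url, timeout=10):
    """Open an HTTPS URL without verifying the server certificate (network call)."""
    ctx = make_unverified_context()
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as response:
        return response.read()


# Example call (placeholder host from the question, left commented out):
# body = fetch_without_verification("https://your-test-server.local")
```

A safer alternative for a test server, where feasible, is to keep verification on and pass the self-signed certificate as a trusted CA via ssl.create_default_context(cafile=...).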
import requests
import webbrowser
from bs4 import BeautifulSoup
url = 'https://www.gamefaqs.com'
#headers={'User-Agent': 'Mozilla/5.0'}
headers ={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
response = requests.get(url, headers)
response.status_code returns 403. I can browse the website with Firefox/Chrome, so this seems to be a coding error.
I cannot figure out what mistake I made.
Thank you.
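One thing worth checking in the snippet above: the second positional parameter of requests.get is params, not headers, so requests.get(url, headers) sends the dictionary as a query string and leaves the default User-Agent in place. The sketch below shows the difference without touching the network by preparing both requests. The names wrong and right are illustrative, and whether the site accepts the corrected request is an assumption that would need to be tested.

```python
import requests

url = 'https://www.gamefaqs.com'
headers = {'User-Agent': 'Mozilla/5.0'}

# Equivalent to requests.get(url, headers): the dict is treated as params
# and ends up in the query string, not in the request headers.
wrong = requests.Request('GET', url, params=headers).prepare()

# Equivalent to requests.get(url, headers=headers): the dict is sent as headers.
right = requests.Request('GET', url, headers=headers).prepare()

print(wrong.url)                        # the User-Agent appears in the URL
print(right.headers.get('User-Agent'))  # the User-Agent is a real header

# The actual fetch would then be (network call, left commented out):
# response = requests.get(url, headers=headers)
# print(response.status_code)
```

Passing headers by keyword is the only change needed relative to the original call.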