Nic*_*hao 5 openssl beautifulsoup python-2.7 python-requests
I am scraping this ASPX site: https://gra206.aca.ntu.edu.tw/Temp/W2.aspx?Type=2.
The site requires __VIEWSTATE and __EVENTVALIDATION to be passed along with a POST request. I am trying to send a GET request first to obtain those two values and then parse them out of the response.
However, every GET request I have tried fails with this error message:
requests.exceptions.SSLError: HTTPSConnectionPool(host='gra206.aca.ntu.edu.tw', port=443): Max retries exceeded with url: /Temp/W2.aspx?Type=2 (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))
I have tried:
However, none of them works.
I am currently using:
env:
python 2.7
bs4 4.6.0
requests 2.18.4
openssl 1.0.2n
Here is my code:
import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    s.auth = ('user', 'pass')
    s.headers.update({'x-test': 'true'})
    url = 'https://gra206.aca.ntu.edu.tw/Temp/W2.aspx?Type=2'
    # This GET is the call that raises the SSLError shown above.
    r = s.get(url, headers={'x-test2': 'true'})
    soup = BeautifulSoup(r.content, 'lxml')
    # Hidden ASP.NET fields that have to be sent back with the POST request.
    viewstate = soup.find('input', {'id': '__VIEWSTATE'})['value']
    validation = soup.find('input', {'id': '__EVENTVALIDATION'})['value']
    print viewstate, validation
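For reference, the GET-then-POST flow described in the question could be sketched roughly as below. This is only an illustration under a few assumptions: it uses the standard library's HTMLParser instead of BeautifulSoup so that it runs without extra packages, the sample HTML is made up, and the `keyword` form field is hypothetical (the real page's field names are not given in the question).

```python
# Sketch only: collect ASP.NET hidden fields from a page and build a POST body.
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7


class HiddenFieldParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> in the page."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attrs = dict(attrs)
        is_hidden = (attrs.get('type') or '').lower() == 'hidden'
        if is_hidden and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''


def build_postback(html, extra=None):
    """Return the hidden fields found in html, merged with the caller's own fields."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload.update(extra or {})
    return payload


# Made-up sample page; a real one would come from the session's GET response.
SAMPLE = '''
<form method="post" action="W2.aspx?Type=2">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTA=" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="AbC123" />
  <input type="text" name="keyword" value="" />
</form>
'''

# 'keyword' is a hypothetical form field used only for illustration.
payload = build_postback(SAMPLE, {'keyword': 'demo'})
```

The resulting `payload` dict is what would be passed as `data=` to the session's `post` call once the GET itself succeeds.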
小智 3
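I was also looking for a solution. Some sites have deprecated TLSv1.0, and Requests + OpenSSL (on Windows 7) cannot complete a handshake with such peer hosts. The Wireshark log shows that the client sent a TLSv1 Client Hello but the host did not answer it properly. That failure propagates up as the error message Requests shows. Even with the latest OpenSSL/pyOpenSSL/Requests, tried on Py3.6/2.7.12, I had no luck. Interestingly, when I replace the URL with another one such as "google.com", the log shows a TLSv1.2 Hello being sent and answered by the host. Please check the images tlsv1 and tlsv1.2. The client is clearly capable of TLSv1.2, so why does it use a v1.0 Hello in the former case?
[EDIT] My previous statement was wrong. Wireshark misinterpreted the unfinished TLSv1.2 Hello exchange as TLSv1. After digging further, I found that these hosts expect plain TLSv1, not a TLSv1 fallback from TLSv1.2. Compared with the log from Chrome, OpenSSL is missing some fields in the Hello extensions (possibly Supported Versions). I found a workaround: 1. Force TLSv1 negotiation. 2. Change the default cipher suite to the py3.4 style to re-enable 3DES.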
import ssl
import requests
from requests.adapters import HTTPAdapter

# Python 3.4 default cipher list, which still includes 3DES.
CIPHERS = (
    'ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+HIGH:'
    'DH+HIGH:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+HIGH:RSA+3DES:!aNULL:'
    '!eNULL:!MD5'
)


class DESAdapter(HTTPAdapter):
    """
    A TransportAdapter that re-enables 3DES support in Requests.
    """

    def create_ssl_context(self):
        ctx = ssl.create_default_context()
        # Force plain TLS 1.0 by disabling TLS 1.1 and TLS 1.2.
        ctx.options |= ssl.OP_NO_TLSv1_2
        ctx.options |= ssl.OP_NO_TLSv1_1
        ctx.set_ciphers(CIPHERS)
        return ctx

    def init_poolmanager(self, *args, **kwargs):
        # Hand the custom context to urllib3 for direct connections.
        kwargs['ssl_context'] = self.create_ssl_context()
        return super(DESAdapter, self).init_poolmanager(*args, **kwargs)

    def proxy_manager_for(self, *args, **kwargs):
        # Same context for connections that go through a proxy.
        kwargs['ssl_context'] = self.create_ssl_context()
        return super(DESAdapter, self).proxy_manager_for(*args, **kwargs)


tmoval = 10
proxies = {}
hdr = {
    'Accept-Language': 'zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Proxy-Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept': '*/*',
}

# `url` must already hold the HTTPS URL to fetch.
ses = requests.session()
ses.mount(url, DESAdapter())
response = ses.get(url, timeout=tmoval, headers=hdr, proxies=proxies)
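To see what a context configured like the one in the adapter actually allows on a given machine, a small standard-library diagnostic along these lines may help. It is a sketch, not part of the original patch: `make_tls1_context` and `describe` are names made up here, the extra `OP_NO_TLSv1_3` flag is an assumption for newer Python/OpenSSL builds that also speak TLS 1.3, and `get_ciphers()` is only available on Python 3.6+. Newer OpenSSL builds may still refuse TLS 1.0 or 3DES regardless of these flags.

```python
import ssl

# Cipher string copied from the answer above (the Python 3.4 default).
CIPHERS = (
    'ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+HIGH:'
    'DH+HIGH:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+HIGH:RSA+3DES:!aNULL:'
    '!eNULL:!MD5'
)

PROTOCOL_FLAGS = ('OP_NO_TLSv1', 'OP_NO_TLSv1_1', 'OP_NO_TLSv1_2', 'OP_NO_TLSv1_3')


def make_tls1_context():
    """Build a context restricted to TLS 1.0, mirroring the adapter above."""
    ctx = ssl.create_default_context()
    ctx.options |= ssl.OP_NO_TLSv1_1
    ctx.options |= ssl.OP_NO_TLSv1_2
    # Assumption: also switch off TLS 1.3 where this build knows about it.
    ctx.options |= getattr(ssl, 'OP_NO_TLSv1_3', 0)
    ctx.set_ciphers(CIPHERS)
    return ctx


def describe(ctx):
    """Return (disabled protocol flag names, enabled 3DES cipher names)."""
    disabled = []
    for name in PROTOCOL_FLAGS:
        flag = getattr(ssl, name, 0)
        if flag and ctx.options & flag:
            disabled.append(name)
    # get_ciphers() is missing on older Pythons, so fall back to an empty list.
    ciphers = ctx.get_ciphers() if hasattr(ctx, 'get_ciphers') else []
    triple_des = [c['name'] for c in ciphers
                  if 'DES-CBC3' in c['name'] or '3DES' in c['name']]
    return disabled, triple_des


disabled, triple_des = describe(make_tls1_context())
print('disabled protocols:', disabled)
print('3DES suites offered:', triple_des)
```

If the second list comes back empty, the local OpenSSL build probably does not offer 3DES at all, and changing the cipher string alone will not bring it back.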
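[EDIT2] The patch does not work when your HTTPS URL contains any uppercase letters. You need to convert them to lowercase. Something unknown in the requests/urllib3/openssl stack causes the patch logic to fall back to its default TLS 1.2 behavior.
[EDIT3] From http://docs.python-requests.org/en/master/user/advanced/:
The mount call registers a specific instance of a Transport Adapter to a prefix. Once mounted, any HTTP request made using that session whose URL starts with the given prefix will use the given Transport Adapter.
So, to make all HTTPS requests use the new adapter, including requests to URLs the server subsequently redirects to, this line must be changed to: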
ses.mount('https://', DESAdapter())
Somehow this also resolves the uppercase problem mentioned above.
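The effect of mounting on a full URL versus mounting on 'https://' can be illustrated with a simplified stand-in for the prefix lookup. This is not Requests' actual code: `pick_adapter` is a hypothetical helper, the example.com URLs are made up, and plain strings stand in for adapter instances. The model compares prefixes case-insensitively; whether a given Requests release does the same for the mounted prefix is an assumption here, and a difference on that point would be one possible explanation for the uppercase issue.

```python
def pick_adapter(adapters, url):
    """Return the adapter mounted on the longest prefix that matches url."""
    for prefix in sorted(adapters, key=len, reverse=True):
        if url.lower().startswith(prefix.lower()):
            return adapters[prefix]
    raise LookupError('no adapter mounted for %r' % url)


# Strings stand in for adapter instances in this model.
DEFAULT, DES = 'default HTTPAdapter', 'DESAdapter'

page = 'https://example.com/temp/w2.aspx?type=2'     # made-up URL
redirect_target = 'https://example.com/login.aspx'   # made-up redirect target

# Mounting on the full page URL covers that URL only.
mounted_on_url = {'https://': DEFAULT, 'http://': DEFAULT, page: DES}
# Mounting on the scheme covers every HTTPS URL, redirects included.
mounted_on_scheme = {'https://': DES, 'http://': DEFAULT}

first = pick_adapter(mounted_on_url, page)
second = pick_adapter(mounted_on_url, redirect_target)
third = pick_adapter(mounted_on_scheme, redirect_target)
```

In this model `first` is the DES adapter, `second` falls back to the default adapter because the redirect target does not start with the mounted page URL, and `third` is the DES adapter again once the mount point is just 'https://'.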