Python requests fails when trying to connect to an .onion site

Jon*_*nes 2 python-3.x python-requests

I am trying to fetch a web page hosted on the Tor network. I am using the following code:

import requests

def get_tor_session():
    session = requests.session()
    session.proxies = {'http':  'socks5://127.0.0.1:9150',
                       'https': 'socks5://127.0.0.1:9150'}
    return session

session = get_tor_session()

This works fine for regular websites. For example, print(session.get("http://httpbin.org/ip").text) prints {"origin": "80.67.172.162"}.

But when I try it on an .onion site, it fails with the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/socks.py", line 813, in connect
    negotiate(self, dest_addr, dest_port)
  File "/usr/local/lib/python3.6/site-packages/socks.py", line 477, in _negotiate_SOCKS5
    CONNECT, dest_addr)
  File "/usr/local/lib/python3.6/site-packages/socks.py", line 540, in _SOCKS5_request
    resolved = self._write_SOCKS5_address(dst, writer)
  File "/usr/local/lib/python3.6/site-packages/socks.py", line 592, in _write_SOCKS5_address
    addresses = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM, socket.IPPROTO_TCP, socket.AI_ADDRCONFIG)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

...

Traceback (most recent call last):
  File "spider.py", line 13, in <module>
    print(session.get("http://zqktlwi4fecvo6ri.onion/").text)
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 521, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: SOCKSHTTPConnectionPool(host='zqktlwi4fecvo6ri.onion', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.contrib.socks.SOCKSConnection object at 0x106fd62e8>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))

t.m*_*dam 6

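With the socks5 scheme, the hostname is resolved locally on the client, using the client's DNS server. An ordinary DNS server cannot resolve .onion domains, so the lookup fails before the request ever reaches the proxy. That is the socket.gaierror raised from socket.getaddrinfo in your traceback.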

From docs.python-requests.org:

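Using the scheme socks5 causes the DNS resolution to happen on the client, rather than on the proxy server. This is in line with curl, which uses the scheme to decide whether to do the DNS resolution on the client or proxy. If you want to resolve the domains on the proxy server, use socks5h as the scheme.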

So, to connect to an .onion site, you should let Tor resolve the domain. You can do that by using the socks5h scheme in the proxies dictionary.

import requests

session = requests.session()
session.proxies = {'http': 'socks5h://127.0.0.1:9150', 'https': 'socks5h://127.0.0.1:9150'}
response = session.get("https://3g2upl4pq6kufc4m.onion/")
print(response)
#<Response [200]>
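To make the socks5 versus socks5h choice explicit, here is a minimal sketch of a helper that builds the proxies dictionary. The name tor_proxies and its remote_dns parameter are hypothetical and not part of requests. The default port 9150 is the one used in the question (the Tor Browser default); a standalone tor daemon usually listens on 9050 instead, so adjust it for your setup.

```python
def tor_proxies(host="127.0.0.1", port=9150, remote_dns=True):
    """Build a requests-style proxies dict for a local Tor SOCKS port.

    Hypothetical helper for illustration only.
    remote_dns=True  -> socks5h: the proxy (Tor) resolves hostnames,
                        which .onion addresses require.
    remote_dns=False -> socks5: hostnames are resolved on the client,
                        which fails for .onion addresses.
    """
    scheme = "socks5h" if remote_dns else "socks5"
    proxy = f"{scheme}://{host}:{port}"
    # Use the same proxy for both plain HTTP and HTTPS requests.
    return {"http": proxy, "https": proxy}


# Usage (needs a running Tor SOCKS proxy and requests[socks]):
#
#     import requests
#     session = requests.session()
#     session.proxies = tor_proxies()
#     print(session.get("http://httpbin.org/ip").text)
```

Keeping the scheme in one place avoids accidentally mixing socks5 and socks5h between the 'http' and 'https' entries.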

Note that you may need to install an additional dependency for SOCKS support:

pip install requests[socks]
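If you are unsure whether that dependency is already present, a small check like the following may help. It is only a sketch: the function name socks_support_installed is hypothetical, and it relies on the fact that the SOCKS extra installs PySocks, which is imported as the socks module (the socks.py seen in the traceback above).

```python
import importlib.util


def socks_support_installed():
    """Return True if the PySocks module ("socks") can be imported.

    Hypothetical helper; it only looks the module up and does not import it.
    """
    return importlib.util.find_spec("socks") is not None


if not socks_support_installed():
    print("SOCKS support is missing; try: pip install requests[socks]")
```

In shells that treat square brackets specially, such as zsh, quote the argument: pip install 'requests[socks]'.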