How do I fix a "403 Forbidden" error from Python requests even when sending a User-Agent header?

Question


far*_*att 6 python http-status-code-403 python-requests

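I'm sending a request to a URL. I copied the curl command into Python, so all the headers are included, but my request doesn't work: I get status code 403, and the HTML output contains error code 1020.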
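Since the headers came from a copied curl command, one option is to rebuild the `headers` dict from that command instead of copying each line by hand. This is a minimal sketch, assuming the command uses POSIX shell quoting and passes each header as a single `-H 'Name: value'` argument. `headers_from_curl` is a hypothetical helper, not part of requests. It only reproduces the headers; other curl options such as `-b` (cookies) or `--compressed` are ignored.

```python
import shlex
from typing import Dict


def headers_from_curl(command: str) -> Dict[str, str]:
    """Collect the -H/--header values of a copied curl command into a dict.

    Hypothetical helper: assumes POSIX shell quoting and one quoted
    'Name: value' argument per header.
    """
    tokens = shlex.split(command)
    headers: Dict[str, str] = {}
    # Pair every token with the one after it, so "-H" is matched with its value.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-H", "--header"):
            # Split only at the first colon; header values may contain colons.
            name, _, val = value.partition(":")
            headers[name.strip()] = val.strip()
    return headers
```

The result can be passed directly as `requests.get(url, headers=headers_from_curl(cmd))`.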

The code:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    # 'Accept-Encoding': 'gzip, deflate, br',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
}

response = requests.get('https://v2.gcchmc.org/book-appointment/', headers=headers)

print(response.status_code)
print(response.cookies.get_dict())
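# An explicit encoding avoids UnicodeEncodeError on platforms whose default encoding is not UTF-8
with open("test.html", "w", encoding="utf-8") as f:
    f.write(response.text)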
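Before switching libraries, it can help to confirm that the 403 comes from Cloudflare and not from the application itself. This is a minimal sketch, assuming three things: responses served through Cloudflare carry a `Server: cloudflare` or `CF-RAY` header, and the 1020 block page contains the text "1020". The name `looks_like_cloudflare_block` is hypothetical, and the check is a heuristic, not an official detection method.

```python
from typing import Mapping


def looks_like_cloudflare_block(status_code: int,
                                headers: Mapping[str, str],
                                body: str) -> bool:
    """Heuristic: True if a 403 response appears to be a Cloudflare 1020 block."""
    if status_code != 403:
        return False
    # HTTP header names are case-insensitive, so compare them lowercased.
    lowered = {k.lower(): v for k, v in headers.items()}
    served_by_cloudflare = (
        "cloudflare" in lowered.get("server", "").lower()
        or "cf-ray" in lowered
    )
    # Assumption: the block page text mentions the error code.
    mentions_1020 = "1020" in body
    return served_by_cloudflare and mentions_1020
```

With requests, it would be called as `looks_like_cloudflare_block(response.status_code, response.headers, response.text)`.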


I also receive cookies, but not the response I need. I know I could do this with Selenium, but I want to understand the reason behind this.

Note:
I have installed all the libraries and checked their versions, but it still doesn't work and returns a 403 error.

Answer 1

Guy*_*Guy 6

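The site is protected by Cloudflare, which is designed to block unauthorized data scraping, among other things. Error code 1020 is Cloudflare's "Access Denied" error, returned when a request matches one of the site's firewall rules. From What is data scraping?:

The process of web scraping is fairly simple, though the implementation can be complex. Web scraping occurs in 3 steps:

1. First, the piece of code used to pull the information, which we call a scraper bot, sends an HTTP GET request to a specific website.
2. When the website responds, the scraper parses the HTML document for a specific pattern of data.
3. Once the data is extracted, it is converted into whatever specific format the scraper bot's author designed.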
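The three steps in the quote can be sketched with the standard library alone. In this sketch the "specific pattern of data" is assumed to be link targets (`<a href>`) and the output format is assumed to be JSON. Both are hypothetical choices for illustration, as are the names `LinkExtractor`, `fetch_html`, `extract_links` and `to_json`. Only `fetch_html` needs network access.

```python
import json
import urllib.request
from html.parser import HTMLParser
from typing import List


class LinkExtractor(HTMLParser):
    """Step 2 helper: collect every href value from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url: str) -> str:
    """Step 1: send an HTTP GET request and return the body as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_links(html: str) -> List[str]:
    """Step 2: parse the HTML document for the chosen pattern (links)."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def to_json(links: List[str]) -> str:
    """Step 3: convert the extracted data into the chosen output format."""
    return json.dumps({"links": links})
```

A full run would be `to_json(extract_links(fetch_html(url)))`. Automated traffic of this kind is what the site's protection is meant to filter.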

You can use urllib instead of requests; it seems to be able to get past Cloudflare here.

import urllib.request

# Build the request and attach browser-like headers (the method is add_header, not add_headers)
req = urllib.request.Request('https://v2.gcchmc.org/book-appointment/')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0')
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8')
req.add_header('Accept-Language', 'en-US,en;q=0.5')

# urlopen raises urllib.error.HTTPError if the server still answers with a 4xx/5xx status
r = urllib.request.urlopen(req).read().decode('utf-8')
with open("test.html", 'w', encoding="utf-8") as f:
    f.write(r)
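One difference from requests matters here. `requests.get` returns a 403 response normally and leaves the status in `response.status_code`. `urllib.request.urlopen`, by contrast, raises `urllib.error.HTTPError` for 4xx/5xx statuses, so the snippet above would stop with a traceback if Cloudflare still blocked it. This is a minimal sketch of a wrapper that returns the status and body in both cases. The name `fetch` is hypothetical. Passing `headers=` to `Request` is equivalent to calling `add_header` for each pair.

```python
import urllib.error
import urllib.request
from typing import Dict, Tuple


def fetch(url: str, headers: Dict[str, str], timeout: float = 30.0) -> Tuple[int, str]:
    """Return (status_code, body) without raising on 4xx/5xx responses."""
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object: it carries the status code
        # and the error page body (for example a Cloudflare block page).
        return err.code, err.read().decode("utf-8", errors="replace")
```

For example, `status, body = fetch('https://v2.gcchmc.org/book-appointment/', {'User-Agent': '...'})`. Whether this yields 200 or 403 depends on the site's Cloudflare rules, which the sketch cannot predict.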


Archived:	2 years, 11 months ago
Views:	15,591
Last activity:	2 years, 2 months ago