Python Requests 403 Forbidden: referer from network headers

Question

use*_*706 2 web-scraping http-status-code-403 python-requests

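This request used to work, but now I get a 403. I tried adding a user agent as in this answer, but it still fails: https://stackoverflow.com/a/38489588/2415706

The second answer further down says to find the referer header, but I can't figure out where these response headers are: https://stackoverflow.com/a/56946001/2415706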

import requests

# Headers copied from the browser; the server still answers 403 with these.
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
    "referer": "https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Programmer-Salary-by-State",
}
job_url = "https://ziprecruiter.com/Salaries/What-Is-the-Average-Programmer-Salary-by-State"
job_response = requests.get(job_url, headers=headers, timeout=10)
print(job_response)

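This is what I see under "Request Headers" in the first tab after refreshing the page, but there is too much of it. I think I only need one of these lines.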

:authority: www.ziprecruiter.com
:method: GET
:path: /Salaries/What-Is-the-Average-Programmer-Salary-by-State
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cache-control: max-age=0
cookie: __cfduid=dea4372c39465cfa2422e97f84dea45fb1620355067; zva=100000000%3Bvid%3AYJSn-w3tCu9yJwJx; ziprecruiter_browser=99.31.211.77_1620355067_495865399; SAFESAVE_TOKEN=1a7e5e90-60de-494d-9af5-6efdab7ade45; zglobalid=b96f3b99-1bed-4b7c-a36f-37f2d16c99f4.62fd155f2bee.6094a7fb; ziprecruiter_session=66052203cea2bf6afa7e45cae7d1b0fe; experian_campaign_visited=1
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="90", "Google Chrome";v="90"
sec-ch-ua-mobile: ?0
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: none
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36

Edit: Looking at the other tabs, they have a referer: "referer": "https://www.ziprecruiter.com/Salaries/What-Is-the-Average-Programmer-Salary-by-State", so I am trying that now, and it is still 403.
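
For anyone who wants to replay the browser's request headers from Python, here is a minimal sketch of turning a copied "Request Headers" block into a dict. The helper name `parse_header_block` and the shortened sample text are illustrative assumptions, not part of the original post. Lines starting with `:` (such as `:authority`) are HTTP/2 pseudo-headers and are skipped, since they are not regular headers.

```python
def parse_header_block(block: str) -> dict[str, str]:
    """Turn 'name: value' lines copied from DevTools into a headers dict."""
    headers = {}
    for line in block.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ':authority';
        # they describe the request line and are not regular headers.
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, so values like URLs stay intact.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


# Shortened sample of the copied block (hypothetical excerpt for illustration).
copied = """\
:authority: www.ziprecruiter.com
:method: GET
accept-language: en-US,en;q=0.9
sec-fetch-site: none
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
"""

browser_headers = parse_header_block(copied)
print(browser_headers)
```

The resulting dict can be passed as `headers=browser_headers` to `requests.get`. This alone may not lift the 403, since a block can depend on more than the headers sent.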

Answer 1

Ber*_*tel 6

Using the httpx package seems to work:

import httpx

url = 'https://ziprecruiter.com/Salaries/What-Is-the-Average-Programmer-Salary-by-State'

r = httpx.get(url)

print(r.text)
print(r.status_code)
print(r.http_version)

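repl.it: https://replit.com/@bertrandmartel/ZipRecruiter

I may be wrong, but I think the server doesn't like the TLS negotiation of the requests library. It is strange, because the httpx call above uses HTTP/1.1 for the request, whereas with curl it only works with HTTP/2 and TLS 1.3.

Using a curl binary built with HTTP/2 support and an OpenSSL that supports TLS 1.3, the following works: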

docker run --rm curlimages/curl:7.76.1 \
    --http2 --tlsv1.3 'https://ziprecruiter.com/Salaries/What-Is-the-Average-Programmer-Salary-by-State' \
    -H 'user-agent: Mozilla' \
    -s -o /dev/null -w "%{http_code}"

This returns a successful status code rather than 403.

The following commands fail:

Forcing HTTP/1.1 and enforcing TLS 1.3:

docker run --rm curlimages/curl:7.76.1 \
    --http1.1 --tlsv1.3 'https://ziprecruiter.com/Salaries/What-Is-the-Average-Programmer-Salary-by-State' \
    -H 'user-agent: Mozilla' \
    -s -o /dev/null -w "%{http_code}"

Output: 403

Forcing HTTP/2 and enforcing TLS 1.2 (the same command, with HTTP/2 and TLS 1.2 forced instead):

Output: 403
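
The curl runs above differ only in their HTTP and TLS flags, so the comparison can be scripted. The sketch below builds the three argument lists and runs them through Docker. The names `curl_args`, `COMBINATIONS` and `run_all` are made up for illustration, and `--tls-max 1.2` is an assumption about how the TLS 1.2 run was pinned, since `--tlsv1.2` on its own only sets a minimum version.

```python
import subprocess

URL = "https://ziprecruiter.com/Salaries/What-Is-the-Average-Programmer-Salary-by-State"


def curl_args(http_flag: str, tls_flags: list[str]) -> list[str]:
    """Build the docker/curl argument list for one HTTP/TLS combination."""
    return [
        "docker", "run", "--rm", "curlimages/curl:7.76.1",
        http_flag, *tls_flags, URL,
        "-H", "user-agent: Mozilla",
        "-s", "-o", "/dev/null", "-w", "%{http_code}",
    ]


# --tlsv1.x sets only the minimum TLS version, so the TLS 1.2 case also caps
# the maximum with --tls-max (an assumption, not taken from the answer).
COMBINATIONS = {
    "HTTP/2 + TLS 1.3": curl_args("--http2", ["--tlsv1.3"]),
    "HTTP/1.1 + TLS 1.3": curl_args("--http1.1", ["--tlsv1.3"]),
    "HTTP/2 + TLS 1.2": curl_args("--http2", ["--tlsv1.2", "--tls-max", "1.2"]),
}


def run_all() -> dict[str, str]:
    """Run every combination and return the printed status code per label."""
    results = {}
    for label, args in COMBINATIONS.items():
        done = subprocess.run(args, capture_output=True, text=True, check=False)
        results[label] = done.stdout.strip()
    return results
```

Calling `run_all()` on a machine with Docker and network access returns one status code string per combination. The results depend on the server's behaviour at the time of the run.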

My guess is that it detects something in the TLS negotiation, but that the check is different when both TLS 1.3 and HTTP/2 are present.
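
To see which TLS version and application protocol a client actually negotiates, Python's standard `ssl` module can be used directly. This is only a rough sketch with made-up helper names (`make_context`, `negotiated`). It shows what the handshake settles on and does not reveal what the server checks.

```python
import socket
import ssl


def make_context(min_tls: ssl.TLSVersion, max_tls: ssl.TLSVersion) -> ssl.SSLContext:
    """Create a client context limited to a TLS range, offering h2 and http/1.1."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = min_tls
    ctx.maximum_version = max_tls
    # ALPN is how the client and server agree on HTTP/2 ("h2") or HTTP/1.1.
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx


def negotiated(host: str, ctx: ssl.SSLContext, port: int = 443):
    """Do a TLS handshake with host and return (TLS version, ALPN protocol)."""
    with socket.create_connection((host, port), timeout=10) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.selected_alpn_protocol()


# Two contexts mirroring the curl experiments: TLS 1.3 only and TLS 1.2 only.
tls13_only = make_context(ssl.TLSVersion.TLSv1_3, ssl.TLSVersion.TLSv1_3)
tls12_only = make_context(ssl.TLSVersion.TLSv1_2, ssl.TLSVersion.TLSv1_2)
```

For example, `negotiated("www.ziprecruiter.com", tls13_only)` performs the handshake over the network and returns the agreed TLS version string and ALPN protocol.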

Unfortunately, you can't check HTTP/2 with requests/urllib, since they don't support it.
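
httpx, on the other hand, can offer HTTP/2 when its optional extra is installed, so the HTTP/1.1 and HTTP/2 cases can be compared from Python. A minimal sketch follows, with the helper name `fetch` chosen for illustration. It assumes `pip install "httpx[http2]"` has been run.

```python
def fetch(url: str, http2: bool):
    """Return (status code, HTTP version) for url, optionally offering HTTP/2."""
    # Imported here so the sketch still loads where httpx is not installed.
    import httpx

    # http2=True requires the optional extra: pip install "httpx[http2]"
    with httpx.Client(http2=http2, timeout=10) as client:
        response = client.get(url)
        return response.status_code, response.http_version
```

Calling `fetch(url, False)` and `fetch(url, True)` with the same URL shows whether the status code changes with the negotiated HTTP version.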

More likely, Cloudflare has not seen enough httpx traffic yet to identify and block it. (3 upvotes)
