Python Requests gets a 403 response but curl gets a 200: what is going on?

sup*_*dee 6 python curl http-status-code-403 python-requests

I get different responses from Python and curl, even though both use exactly the same parameters.

Python:

import requests

headers = {
    'Accept-Language': 'en-US,en',
    'Accept': 'text/html,application/xhtml+xml,application/xml',
    'Authority': 'www.google.com',
    'User-Agent': 'SomeAgent',
    'Upgrade-Insecure-Requests': '1',
}

response = requests.get('https://www.avvo.com', headers=headers)
# Returns a 403 response

Curl:

import shlex, subprocess
cmd = '''curl -H 'Accept-Language: en-US,en' -H 'Accept: text/html,application/xhtml+xml,application/xml' -H 'Authority: www.google.com' -H 'User-Agent: SomeAgent' -H 'Upgrade-Insecure-Requests: 1' https://www.avvo.com'''
args = shlex.split(cmd)
process = subprocess.Popen(args, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = process.communicate()
# Returns a 200 response

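Both requests are sent from the same IP. This looks like a Cloudflare issue: is there some way Cloudflare can distinguish a request made by the Python requests library from a plain curl command?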
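One way to test the "exactly the same parameters" assumption is to look at the headers requests would actually send, since a `Session` merges its own defaults into whatever is passed in. The sketch below does this without touching the network; the helper name `effective_headers` is made up for illustration. It only covers the HTTP header level, so any difference in the TLS handshake between the two clients would not show up here.

```python
import requests

headers = {
    'Accept-Language': 'en-US,en',
    'Accept': 'text/html,application/xhtml+xml,application/xml',
    'Authority': 'www.google.com',
    'User-Agent': 'SomeAgent',
    'Upgrade-Insecure-Requests': '1',
}


def effective_headers(url, headers):
    """Return the headers requests would send for a GET, without sending it.

    Hypothetical helper: prepare_request() merges the session's default
    headers with the ones supplied by the caller.
    """
    session = requests.Session()
    prepared = session.prepare_request(
        requests.Request('GET', url, headers=headers)
    )
    return dict(prepared.headers)


sent = effective_headers('https://www.avvo.com', headers)

# Headers that requests added on its own, beyond the ones passed in
extra = sorted(set(sent) - set(headers))
print(extra)
```

Any name printed here is a header that the curl command above does not set explicitly, so it is a candidate for what differs between the two requests. Running curl with `-v` shows the request headers on its side for comparison.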

I left the site in the code in case running it is useful. Here is the curl command on its own:

curl -H 'Accept-Language: en-US,en' -H 'Accept: text/html,application/xhtml+xml,application/xml' -H 'Authority: www.google.com' -H 'User-Agent: SomeAgent' -H 'Upgrade-Insecure-Requests: 1' https://www.avvo.com/administrative-law-lawyer/ny.html