Handling a pound sign (#) in Python requests

Nic*_*ick 1 python python-requests

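I am using requests to build a custom URL, and one of the parameters contains a pound sign. Can anyone explain how to pass the parameter without the pound sign being encoded?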

This returns the correct CSV file:

import io
import pandas as pd
import requests
results_url = 'https://baseballsavant.mlb.com/statcast_search/csv?all=true&hfPT=&hfAB=&hfBBT=&hfPR=&hfZ=&stadium=&hfBBL=&hfNewZones=&hfGT=R%7C&hfC=&hfSea=2019%7C&hfSit=&player_type=batter&hfOuts=&opponent=&pitcher_throws=&batter_stands=&hfSA=&game_date_gt=&game_date_lt=&hfInfield=&team=&position=&hfOutfield=&hfRO=&home_road=&hfFlag=&hfPull=&metric_1=&hfInn=&min_pitches=0&min_results=0&group_by=name&sort_col=pitches&player_event_sort=h_launch_speed&sort_order=desc&min_abs=0&type=#results'
results = requests.get(results_url, timeout=30).content
results_df = pd.read_csv(io.StringIO(results.decode('utf-8')))

This does not:

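import requests

URL = 'https://baseballsavant.mlb.com/statcast_search/csv?'

def _get_statcast(params):
    # requests percent-encodes every value in params, so '#' is sent as %23
    response = requests.get(URL, params=params, timeout=30)
    response.raise_for_status()
    return response.content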

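The problem seems to be that when '#results' is passed through requests, the '#' is not handled the way it is in the hard-coded URL, and the wrong CSV is downloaded. If anyone has ideas for other ways to solve this, I would appreciate it.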

EDIT2: I also asked this on the Python forum: https://python-forum.io/Thread-Handling-pound-sign-within-custom-URL?pid=75946#pid75946

Clo*_*ion 5

Basically, anything after a literal pound sign in a URL is not sent to the server. This applies to both browsers and requests.

The format of your URL suggests that the type=#results part is actually a query parameter.

requests automatically encodes query parameters, whereas a browser does not. Below are several queries and what the server receives in each case:
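
As a rough illustration of where that split happens, the standard-library urllib.parse module can show which part of a URL counts as the query and which part is the fragment. This is only a sketch using the httpbin URL from the examples in this answer; it parses the string locally and sends nothing.

```python
from urllib.parse import urlsplit

# Parse the URL locally; no request is made.
parts = urlsplit("https://httpbin.org/anything?type=#results")

print(parts.path)      # /anything
print(parts.query)     # type=
print(parts.fragment)  # results
```

Everything in parts.fragment is what stays on the client side.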


URL parameter in the browser

When you use a pound sign in a browser, anything after it is not sent to the server:

https://httpbin.org/anything/type=#results

Returns:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Accept-Language": "en-GB,en;q=0.9,en-US;q=0.8,de;q=0.7", 
    "Cache-Control": "max-age=0", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "*redacted*"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "*redacted*", 
  "url": "https://httpbin.org/anything/type="
}
  • The URL received by the server is https://httpbin.org/anything/type=.
  • The requested page is called type=, which does not look right.

Query parameter in the browser

The <key>=<value> format suggests that what you are passing is probably a query parameter. Even so, anything after the pound sign is not sent to the server:

https://httpbin.org/anything?type=#results

Returns:

{
  "args": {
    "type": ""
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Accept-Language": "en-GB,en;q=0.9,en-US;q=0.8,de;q=0.7", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "*redacted*"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "*redacted*", 
  "url": "https://httpbin.org/anything?type="
}
  • The URL received by the server is https://httpbin.org/anything?type=.
  • The requested page is called anything.
  • A type parameter is received with no value.

Encoded query parameter in the browser

https://httpbin.org/anything?type=%23results

Returns:

{
  "args": {
    "type": "#results"
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Accept-Language": "en-GB,en;q=0.9,en-US;q=0.8,de;q=0.7", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "*redacted*"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "*redacted*", 
  "url": "https://httpbin.org/anything?type=%23results"
}
  • The URL received by the server is https://httpbin.org/anything?type=%23results.
  • The requested page is called anything.
  • A type parameter is received with the value #results.
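
To make the %23 round trip concrete, here is a small sketch using only the standard library: quote() produces the percent-encoded form, and parse_qs() decodes a query string roughly the way a server-side framework would. Treating parse_qs as a stand-in for the server is an assumption made for illustration.

```python
from urllib.parse import parse_qs, quote

# Percent-encode the value; safe="" means no character is left unescaped.
encoded = quote("#results", safe="")
print(encoded)  # %23results

# Decode the query string, as a server-side parser typically would.
received = parse_qs("type=" + encoded)
print(received)  # {'type': ['#results']}
```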

Python requests with a URL parameter

requests does not send anything after the pound sign to the server either:

import requests

r = requests.get('https://httpbin.org/anything/type=#results')
print(r.url)
print(r.json())

Returns:

https://httpbin.org/anything/type=#results
{
    "args": {},
    "data": "",
    "files": {},
    "form": {},
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.21.0"
    },
    "json": null,
    "method": "GET",
    "origin": "*redacted*",
    "url": "https://httpbin.org/anything/type="
}
  • The URL received by the server is https://httpbin.org/anything/type=.
  • The requested page is called type=.
  • No query parameters are received.

Python requests with a query parameter

requests automatically encodes query parameters:

import requests

r = requests.get('https://httpbin.org/anything', params={'type': '#results'})
print(r.url)
print(r.json())

Returns:

https://httpbin.org/anything?type=%23results
{
    "args": {
        "type": "#results"
    },
    "data": "",
    "files": {},
    "form": {},
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.21.0"
    },
    "json": null,
    "method": "GET",
    "origin": "*redacted*",
    "url": "https://httpbin.org/anything?type=%23results"
}
  • The URL received by the server is https://httpbin.org/anything?type=%23results.
  • The requested page is called anything.
  • A type parameter is received with the value #results.
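
Because requests encodes whatever is placed in params, one way to reproduce what a browser sends for a hard-coded URL is to drop the fragment and pass the decoded key/value pairs instead. The sketch below shows this with the standard library; the helper name split_url_for_requests and the shortened example URL are hypothetical and only for illustration.

```python
from urllib.parse import parse_qsl, urlsplit


def split_url_for_requests(full_url):
    """Return (base_url, params) with the fragment dropped (hypothetical helper)."""
    parts = urlsplit(full_url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # keep_blank_values=True preserves empty parameters such as "type=".
    params = parse_qsl(parts.query, keep_blank_values=True)
    return base, params


base, params = split_url_for_requests(
    "https://httpbin.org/anything?group_by=name&type=#results"
)
print(base)    # https://httpbin.org/anything
print(params)  # [('group_by', 'name'), ('type', '')]
```

The pair could then be passed on as requests.get(base, params=params, timeout=30). Note that type ends up empty, which matches what the server sees when the URL is opened in a browser.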

Python requests with a double-encoded query parameter

If you encode the query parameter manually and then pass it to requests, it will encode the already-encoded value a second time:

import requests

r = requests.get('https://httpbin.org/anything', params={'type': '%23results'})
print(r.url)
print(r.json())

Returns:

https://httpbin.org/anything?type=%2523results
{
    "args": {
        "type": "%23results"
    },
    "data": "",
    "files": {},
    "form": {},
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.21.0"
    },
    "json": null,
    "method": "GET",
    "origin": "*redacted*",
    "url": "https://httpbin.org/anything?type=%2523results"
}
  • The URL received by the server is https://httpbin.org/anything?type=%2523results.
  • The requested page is called anything.
  • A type parameter is received with the value %23results.
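
The double encoding can be reproduced without a network call. The sketch below uses urllib.parse.urlencode as a stand-in for the encoding step that requests performs (an assumption about its internals, made for illustration), and shows that decoding the value first with unquote() avoids the extra %25.

```python
from urllib.parse import unquote, urlencode

already_encoded = "%23results"

# Encoding an already-encoded value escapes the '%' itself as %25.
double = urlencode({"type": already_encoded})
print(double)  # type=%2523results

# Decoding first yields the raw value, which is then encoded exactly once.
single = urlencode({"type": unquote(already_encoded)})
print(single)  # type=%23results
```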