qbq*_*qbq 3 python cookies httpresponse http-headers python-requests
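I am using Python Requests to extract the full headers of a response.
I want to count exactly how many cookies (i.e. name/value pairs) are in the response. I have two questions:
1) If the server responds with multiple Set-Cookie headers, how does Requests represent this? Does it merge the Set-Cookie values into one, or keep them as they are?
Here is my script that prints the headers (all of them):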
import requests

requests.packages.urllib3.disable_warnings()  # suppress certificate warnings, since verify=False is used below
response = requests.get("https://example.com", verify=False, timeout=3)
print(response.headers)  # all response headers
set_cookie = response.headers.get("Set-Cookie")  # merged Set-Cookie value, or None if absent
But when I looked at some Set-Cookie response headers, I found that some name/value pairs are separated by commas, like this:
dnn_IsMobile=False; path=/; secure; HttpOnly, Analytics_VisitorId=aa; expires=Mon 19-Aug-2019 14:20:02 GMT; path=/; secure; HttpOnly, Analytics=SessionId=vv&ContentItemId=-1; expires=Sat 20-Jul-2019 15:20:02 GMT; path=/; secure
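2) Does this mean the server sent multiple Set-Cookie headers and Requests combined them?
If Requests adds commas between the cookies' name/value pairs, does it always separate them with a comma followed by a space, i.e. cookie1=value, cookie2=value, rather than a bare comma like cookie1=value,cookie2=value?
Understanding this difference is important to me so that I can count the cookies I receive correctly.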
You can use the higher-level .cookies attribute to get them, instead of using .headers.
For example:
>>> url="https://github.com"
>>> r = requests.get(url)
>>> r.cookies
<RequestsCookieJar[Cookie(version=0, name='_octo', value='GH1.1.1081626831.1563694143', port=None, port_specified=False, domain='.github.com', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=1626852543, discard=False, comment=None, comment_url=None, rest={}, rfc2109=False), Cookie(version=0, name='logged_in', value='no', port=None, port_specified=False, domain='.github.com', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=True, expires=2194846143, discard=False, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False), Cookie(version=0, name='_gh_sess', value='N0NVdFd3dTMzcm9GSkh1U21ZQkVaYWUvWnBnRmVic0VFWm9kSVZKVVhMV0hVdUw4cDh5cGpmTmIrQ0xJYU9tNHE0ZHQxVkZlUU9JRGJHUkJtc21yVGM0Mk9hQjBUYnhDVXJYSFVWSjNzT2ZpNjdEVzF0emZydkJmQmgvZmVRRFhEaE1CRTlnd0ZPY0RRY0Z4L1ByaFFpbWhVTGtPZTZmUHhONzBxclIrWWZSdFlZK09NN1QzS1dlL3cwWmVSdG5wTHFROTh1Zmh6Y3JkMjFDQmtxb2FHQT09LS1DUEd6UHFtWS9ubTdpOEdwYndzU3l3PT0%3D--2f3ae9c74cba34f2e8de6dfe55c3616e8a35ab20', port=None, port_specified=False, domain='github.com', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=True, expires=None, discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False), Cookie(version=0, name='has_recent_activity', value='1', port=None, port_specified=False, domain='github.com', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False, expires=1563697743, discard=False, comment=None, comment_url=None, rest={}, rfc2109=False)]>
>>> len(r.cookies)
4
>>> r.cookies.keys()
['_octo', 'logged_in', '_gh_sess', 'has_recent_activity']
>>> for key in r.cookies.iterkeys(): print("{}: {}".format(key, r.cookies[key]))
...
_octo: GH1.1.1081626831.1563694143
logged_in: no
_gh_sess: N0NVdFd3dTMzcm9GSkh1U21ZQkVaYWUvWnBnRmVic0VFWm9kSVZKVVhMV0hVdUw4cDh5cGpmTmIrQ0xJYU9tNHE0ZHQxVkZlUU9JRGJHUkJtc21yVGM0Mk9hQjBUYnhDVXJYSFVWSjNzT2ZpNjdEVzF0emZydkJmQmgvZmVRRFhEaE1CRTlnd0ZPY0RRY0Z4L1ByaFFpbWhVTGtPZTZmUHhONzBxclIrWWZSdFlZK09NN1QzS1dlL3cwWmVSdG5wTHFROTh1Zmh6Y3JkMjFDQmtxb2FHQT09LS1DUEd6UHFtWS9ubTdpOEdwYndzU3l3PT0%3D--2f3ae9c74cba34f2e8de6dfe55c3616e8a35ab20
has_recent_activity: 1
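P.S. Sometimes reading the source code is easier; I found this out by reading cookies.py :)
Edit: regarding the separator (whether it is ", " or ",") in r.headers.get("Set-Cookie"):
Requests uses urllib3 under the hood, and you will find that r.raw is an object of urllib3.response.HTTPResponse. In urllib3, headers are represented by HTTPHeaderDict, defined in _collections.py, which joins multiple values with ", ":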
def __getitem__(self, key):
    val = self._container[key.lower()]
    return ", ".join(val[1:])
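So the separator between cookies is always ", ", and in principle you can use it to count them. Be careful, though: an expires date such as "Sun, 21 Jul 2019 08:29:03 -0000" also contains ", ", so a plain split on ", " can overcount. len(r.cookies) is the more reliable count.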
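To see the joining behaviour in isolation, here is a minimal sketch of a multi-valued header mapping. `MiniHeaderDict` is a hypothetical stand-in written for illustration, not urllib3's actual class; it only mimics the `__getitem__` shown above and assumes values are stored in a list after the original header name.

```python
class MiniHeaderDict:
    """Simplified, hypothetical stand-in for a multi-valued header mapping."""

    def __init__(self):
        # Maps lowercased name -> [original_name, value1, value2, ...]
        self._container = {}

    def add(self, key, value):
        # Keep every value for a repeated header instead of overwriting.
        entry = self._container.setdefault(key.lower(), [key])
        entry.append(value)

    def __getitem__(self, key):
        # Same joining rule as the snippet above: values glued with ", ".
        val = self._container[key.lower()]
        return ", ".join(val[1:])

    def getlist(self, key):
        # The individual values, without joining.
        return list(self._container.get(key.lower(), [key])[1:])


headers = MiniHeaderDict()
headers.add("Set-Cookie", "cookie1=value; path=/")
headers.add("Set-Cookie", "cookie2=value; path=/; secure")

print(headers["set-cookie"])               # one merged string
print(len(headers.getlist("Set-Cookie")))  # number of separate headers
```

If your urllib3 version exposes `getlist` on `r.raw.headers` (I believe HTTPHeaderDict does, but check your installed version), `r.raw.headers.getlist("Set-Cookie")` should give the unmerged values in the same way.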
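Does Requests combine multiple Set-Cookie headers into one entry in headers? I'm afraid the answer is yes, as checking its value shows (some irrelevant headers removed for readability):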
>>> r.headers
{
'Date': 'Sun, 21 Jul 2019 07:29:03 GMT',
'Content-Type': 'text/html; charset=utf-8',
'Transfer-Encoding': 'chunked',
'Server': 'GitHub.com',
'Status': '200 OK',
'Set-Cookie': 'has_recent_activity=1; path=/; expires=Sun, 21 Jul 2019 08:29:03 -0000, _octo=GH1.1.1081626831.1563694143; domain=.github.com; path=/; expires=Wed, 21 Jul 2021 07:29:03 -0000, logged_in=no; domain=.github.com; path=/; expires=Thu, 21 Jul 2039 07:29:03 -0000; secure; HttpOnly, _gh_sess=N0NVdFd3dTMzcm9GSkh1U21ZQkVaYWUvWnBnRmVic0VFWm9kSVZKVVhMV0hVdUw4cDh5cGpmTmIrQ0xJYU9tNHE0ZHQxVkZlUU9JRGJHUkJtc21yVGM0Mk9hQjBUYnhDVXJYSFVWSjNzT2ZpNjdEVzF0emZydkJmQmgvZmVRRFhEaE1CRTlnd0ZPY0RRY0Z4L1ByaFFpbWhVTGtPZTZmUHhONzBxclIrWWZSdFlZK09NN1QzS1dlL3cwWmVSdG5wTHFROTh1Zmh6Y3JkMjFDQmtxb2FHQT09LS1DUEd6UHFtWS9ubTdpOEdwYndzU3l3PT0%3D--2f3ae9c74cba34f2e8de6dfe55c3616e8a35ab20; path=/; secure; HttpOnly',
'Content-Encoding': 'gzip',
'X-GitHub-Request-Id': 'A947:3711:E0377A:13B4CEA:5D34143E'
}
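Note that the merged Set-Cookie value above contains ", " inside every expires date as well as between cookies. If you only have the merged string, the following is a rough sketch of one way to split it: a heuristic that treats a comma as a cookie boundary only when it is followed by something that looks like `name=`. The function name and the regular expression are my own assumptions, not part of Requests, and the heuristic can fail if a cookie value itself contains a comma followed by `name=`.

```python
import re

# Shortened copy of the merged Set-Cookie value shown above.
merged = (
    "has_recent_activity=1; path=/; expires=Sun, 21 Jul 2019 08:29:03 -0000, "
    "_octo=GH1.1.1081626831.1563694143; domain=.github.com; path=/; "
    "expires=Wed, 21 Jul 2021 07:29:03 -0000, "
    "logged_in=no; domain=.github.com; path=/; "
    "expires=Thu, 21 Jul 2039 07:29:03 -0000; secure; HttpOnly"
)

# Naive approach: every ", " counts, including the ones inside dates.
naive_parts = merged.split(", ")

# Heuristic boundary: a comma followed by a token and "=" (e.g. ", _octo=").
# The comma in "Sun, 21 Jul" is skipped because "21" is followed by a space.
COOKIE_BOUNDARY = re.compile(r",\s*(?=[^;,=\s]+=)")


def split_merged_set_cookie(value):
    """Split a merged Set-Cookie string into one string per cookie (heuristic)."""
    return [part.strip() for part in COOKIE_BOUNDARY.split(value) if part.strip()]


cookies = split_merged_set_cookie(merged)
print(len(naive_parts))  # 6: overcounts because of the dates
print(len(cookies))      # 3: one entry per cookie
```

When the response object is available, len(r.cookies) remains the simpler choice; a sketch like this is only useful when all you have is the header string.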