Python Requests and persistent sessions

Chr*_*est 95 python python-requests

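I'm using the requests module (version 0.10.0 with Python 2.5). I have figured out how to submit data to a login form on a website and retrieve the session key, but I can't see an obvious way to use this session key in subsequent requests. Can someone fill in the ellipsis in the code below or suggest another approach?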

>>> import requests
>>> login_data =  {'formPosted':'1', 'login_email':'me@example.com', 'password':'pw'}
>>> r = requests.post('https://localhost/login.py', login_data)
>>> 
>>> r.text
u'You are being redirected <a href="profilePage?_ck=1349394964">here</a>'
>>> r.cookies
{'session_id_myapp': '127-0-0-1-825ff22a-6ed1-453b-aebc-5d3cf2987065'}
>>> 
>>> r2 = requests.get('https://localhost/profile_data.json', ...)

Anu*_*pta 177

You can easily create a persistent session using:

s = requests.Session()

After that, continue with your requests as usual:

s.post('https://localhost/login.py', login_data)
#logged in! cookies saved for future requests.
r2 = s.get('https://localhost/profile_data.json', ...)
#cookies sent automatically!
#do whatever, s will keep your cookies intact :)
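To watch the cookie round trip without a real login page, here is a minimal self-contained sketch. It assumes a toy local server: the `LoginHandler` class and its `session_id_myapp=abc123` cookie are made up for the demo, and only the paths mirror the question. The `Session` stores the cookie set by the POST response and sends it with the following GET, with no `cookies=` argument.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class LoginHandler(BaseHTTPRequestHandler):
    """Toy server: POST sets a session cookie, GET echoes the Cookie header it received."""

    def do_POST(self):
        # Read (and ignore) the form body, then hand out a fake session id.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Set-Cookie", "session_id_myapp=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Echo back whatever Cookie header the client sent (empty if none).
        body = (self.headers.get("Cookie") or "").encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    server = HTTPServer(("127.0.0.1", 0), LoginHandler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        with requests.Session() as s:
            s.trust_env = False  # ignore proxy environment variables for this local demo
            s.post(base + "/login.py", data={"login_email": "me@example.com", "password": "pw"})
            # No cookies= argument: the Session re-sends what the login response set.
            return s.get(base + "/profile_data.json").content.decode()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())  # session_id_myapp=abc123
```

The server binds to 127.0.0.1 rather than "localhost", which sidesteps the cookie-domain issue described in the comments below.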

More about sessions: http://docs.python-requests.org/en/latest/user/advanced/#session-objects

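  • You can pickle.dump the session cookies to a file, e.g. pickle.dump(session.cookies._cookies, file), and load them back like this: cookies = pickle.load(file); cj = requests.cookies.RequestsCookieJar(); cj._cookies = cookies; session.cookies = cj (7 votes)
  • Is there any way to save the Session itself between script runs? (3 votes)
  • @SergeyNudnov Many thanks for your comment; I wasted a lot of time trying to figure out why the session was not handling cookies correctly. Changing the domain from localhost to localhost.local solved the problem. Thanks again. (3 votes)
  • For requests sent to "localhost", problems can occur if the login and other cookies returned by the web server carry an incorrect domain attribute. For "localhost", the web server should return cookies with the domain attribute set to "localhost.local"; otherwise the cookies will not be applied to the session. In that case, use "127.0.0.1" instead of "localhost". (2 votes)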
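The first comment above reaches into the private `_cookies` attribute. A sketch that sticks to public behaviour instead, assuming a requests version whose `RequestsCookieJar` is picklable (it defines `__getstate__`/`__setstate__` for this). The helper names `save_cookies`/`load_cookies` and the cookie value are hypothetical.

```python
import os
import pickle
import tempfile

import requests


def save_cookies(session, path):
    """Write the session's cookie jar to `path` (RequestsCookieJar is picklable)."""
    with open(path, "wb") as f:
        pickle.dump(session.cookies, f)


def load_cookies(session, path):
    """Merge cookies saved by save_cookies() into `session`."""
    with open(path, "rb") as f:
        session.cookies.update(pickle.load(f))


def roundtrip_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cookies.pkl")

        first = requests.Session()
        # Hypothetical cookie standing in for the one the login response set.
        first.cookies.set("session_id_myapp", "127-0-0-1-825ff22a",
                          domain="localhost.local", path="/")
        save_cookies(first, path)

        second = requests.Session()  # e.g. the next run of the script
        load_cookies(second, path)
        return second.cookies.get("session_id_myapp", domain="localhost.local")


if __name__ == "__main__":
    print(roundtrip_demo())  # 127-0-0-1-825ff22a
```

Unpickling can execute arbitrary code, so only load cookie files your own script wrote.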

Mor*_*sen 15

See my answer in this similar question:

python: urllib2 how to send cookie with urlopen request

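import urllib2
import urllib
from cookielib import CookieJar

# placeholder credentials
username = 'me@example.com'
password = 'pw'

# the CookieJar keeps the cookies the server sets; every request made
# through this opener sends them back automatically
cj = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# input-type values from the html form
formdata = { "username" : username, "password": password, "form-id" : "1234" }
data_encoded = urllib.urlencode(formdata)
response = opener.open("https://page.com/login.php", data_encoded)
content = response.read()
# reuse the same opener for subsequent requests: opener.open(...)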
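The answer above targets Python 2 (`urllib2`, `cookielib`). On Python 3 the same modules live in `urllib.request` and `http.cookiejar`, and the POST body must be bytes. A minimal sketch of the port; the helper names are hypothetical and the form fields are the answer's placeholders.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_cookie_opener():
    """Return an opener that stores cookies from responses and re-sends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_form(username, password):
    # Same placeholder field names as the Python 2 snippet above.
    formdata = {"username": username, "password": password, "form-id": "1234"}
    return urllib.parse.urlencode(formdata).encode("ascii")  # POST body must be bytes


def login(opener, url, username, password):
    """POST the login form; cookies from the response end up in the opener's jar."""
    with opener.open(url, encode_form(username, password)) as response:
        return response.read()
```

Usage would be `opener, jar = build_cookie_opener()`, then `login(opener, "https://page.com/login.php", username, password)`, then further `opener.open(...)` calls, which carry the stored cookies.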

Edit:

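I see my answer has received some downvotes, but no comment explaining why. I'm guessing it's because I refer to the urllib libraries instead of requests. I did that because the OP asked for help with requests or for someone to suggest another approach.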

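  • As the OP, I can say that your answer provides a useful alternative, if only to demonstrate that `requests` offers a simple, high-level solution to a problem that would otherwise take three libraries to implement. (7 votes)
  • I'm not one of your downvoters, but as a guess, many readers probably take the OP's last sentence to mean "Can someone fill in the ellipsis in the code below or suggest another approach [with the requests library that would involve more major surgery to my code than simply filling in the ellipsis with something else]." But that's only my guess. (2 votes)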

Dom*_*Cat 15

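The other answers help to understand how to maintain such a session. Additionally, I want to provide a class which keeps the session alive across different runs of a script (using a cache file). This means a proper "login" is only performed when required (on timeout, or when no session exists in the cache). It also supports proxy settings for subsequent calls to "get" or "post".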

It is tested with Python 3.

Use it as a basis for your own code. The following snippet is released under the GPL v3.

import pickle
import datetime
import os
from urllib.parse import urlparse
import requests    

class MyLoginSession:
    """
    a class which handles and saves login sessions. It also keeps track of proxy settings.
    It also maintains a cache file for restoring session data from earlier
    script executions.
    """
    def __init__(self,
                 loginUrl,
                 loginData,
                 loginTestUrl,
                 loginTestString,
                 sessionFileAppendix = '_session.dat',
                 maxSessionTimeSeconds = 30 * 60,
                 proxies = None,
                 userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1',
                 debug = True,
                 forceLogin = False,
                 **kwargs):
        """
        save some information needed to login the session

        you'll have to provide 'loginTestString' which will be looked for in the
        responses html to make sure, you've properly been logged in

        'proxies' is of format { 'https' : 'https://user:pass@server:port', 'http' : ...
        'loginData' will be sent as post data (dictionary of id : value).
        'maxSessionTimeSeconds' will be used to determine when to re-login.
        """
        urlData = urlparse(loginUrl)

        self.proxies = proxies
        self.loginData = loginData
        self.loginUrl = loginUrl
        self.loginTestUrl = loginTestUrl
        self.maxSessionTime = maxSessionTimeSeconds
        self.sessionFile = urlData.netloc + sessionFileAppendix
        self.userAgent = userAgent
        self.loginTestString = loginTestString
        self.debug = debug

        self.login(forceLogin, **kwargs)

    def modification_date(self, filename):
        """
        return last file modification date as datetime object
        """
        t = os.path.getmtime(filename)
        return datetime.datetime.fromtimestamp(t)

    def login(self, forceLogin = False, **kwargs):
        """
        login to a session. Try to read last saved session from cache file. If this fails
        do proper login. If the last cache access was too old, also perform a proper login.
        Always updates session cache file.
        """
        wasReadFromCache = False
        if self.debug:
            print('loading or generating session...')
        if os.path.exists(self.sessionFile) and not forceLogin:
            modTime = self.modification_date(self.sessionFile)

            # only load if the cache file is younger than maxSessionTimeSeconds
            # (total_seconds(): .seconds alone would ignore whole days)
            lastModification = (datetime.datetime.now() - modTime).total_seconds()
            if lastModification < self.maxSessionTime:
                with open(self.sessionFile, "rb") as f:
                    self.session = pickle.load(f)
                    wasReadFromCache = True
                    if self.debug:
                        print("loaded session from cache (last access %ds ago) "
                              % lastModification)
        if not wasReadFromCache:
            self.session = requests.Session()
            self.session.headers.update({'user-agent' : self.userAgent})
            res = self.session.post(self.loginUrl, data = self.loginData, 
                                    proxies = self.proxies, **kwargs)

            if self.debug:
                print('created new session with login' )
            self.saveSessionToCache()

        # test login
        res = self.session.get(self.loginTestUrl, proxies = self.proxies)
        if self.loginTestString.lower() not in res.text.lower():
            raise Exception("could not log into provided site '%s'"
                            " (did not find successful login string)"
                            % self.loginUrl)

    def saveSessionToCache(self):
        """
        save session to a cache file
        """
        # always save (to update timeout)
        with open(self.sessionFile, "wb") as f:
            pickle.dump(self.session, f)
            if self.debug:
                print('updated session cache-file %s' % self.sessionFile)

    def retrieveContent(self, url, method = "get", postData = None, **kwargs):
        """
        return the content of the url with respect to the session.

        If 'method' is not 'get', the url will be called with 'postData'
        as a post request.
        """
        if method == 'get':
            res = self.session.get(url , proxies = self.proxies, **kwargs)
        else:
            res = self.session.post(url , data = postData, proxies = self.proxies, **kwargs)

        # the session has been updated on the server, so also update in cache
        self.saveSessionToCache()            

        return res
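The cache expiry in `login()` hinges on the age of the session file. Note that `timedelta.seconds` is only the seconds component of a timedelta (whole days are dropped), so a file that is one day and ten seconds old would look ten seconds old; the reliable measure is the total age. A stand-alone sketch of the check, with a hypothetical `is_cache_fresh` helper and cache file name:

```python
import os
import tempfile
import time
from datetime import timedelta


def is_cache_fresh(path, max_age_seconds):
    """True if `path` exists and was modified less than `max_age_seconds` ago."""
    if not os.path.exists(path):
        return False
    age = time.time() - os.path.getmtime(path)  # total age in seconds, days included
    return age < max_age_seconds


def freshness_demo(max_age_seconds=30 * 60):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.com_session.dat")  # hypothetical cache file
        open(path, "wb").close()
        fresh_now = is_cache_fresh(path, max_age_seconds)
        two_days_ago = time.time() - 2 * 24 * 3600
        os.utime(path, (two_days_ago, two_days_ago))  # back-date the file
        return fresh_now, is_cache_fresh(path, max_age_seconds)


if __name__ == "__main__":
    # The pitfall: .seconds is only the seconds *component* of a timedelta.
    age = timedelta(days=1, seconds=10)
    print(age.seconds)          # 10
    print(age.total_seconds())  # 86410.0
    print(freshness_demo())     # (True, False)
```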

A snippet using the above class could look like this:

if __name__ == "__main__":
    # proxies = {'https' : 'https://user:pass@server:port',
    #           'http' : 'http://user:pass@server:port'}

    loginData = {'user' : 'usr',
                 'password' :  'pwd'}

    loginUrl = 'https://...'
    loginTestUrl = 'https://...'
    successStr = 'Hello Tom'
    s = MyLoginSession(loginUrl, loginData, loginTestUrl, successStr, 
                       #proxies = proxies
                       )

    res = s.retrieveContent('https://....')
    print(res.text)

    # if, for instance, login via JSON values required try this:
    s = MyLoginSession(loginUrl, None, loginTestUrl, successStr, 
                       #proxies = proxies,
                       json = loginData)
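The `json = loginData` variant works because extra keyword arguments are forwarded to `session.post`. What changes on the wire can be checked offline with `requests.Request(...).prepare()`, which builds the request without sending it; the URL below is a placeholder.

```python
import json

import requests

login_data = {"user": "usr", "password": "pwd"}
url = "https://example.com/login"  # placeholder; nothing is sent

# data= sends an HTML-form-style body, json= sends a JSON document.
form_req = requests.Request("POST", url, data=login_data).prepare()
json_req = requests.Request("POST", url, json=login_data).prepare()

print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # user=usr&password=pwd
print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body))         # {'user': 'usr', 'password': 'pwd'}
```

Which one a login endpoint expects depends on the site; browser form logins usually want `data=`.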

  • This is a great answer; it is also strangely hard to find this kind of solution by searching. (4 votes)

Jim*_*kov 7

After trying all of the answers above, I found that using a "RequestsCookieJar" instead of a regular CookieJar for subsequent requests fixed my problem.

import requests
import json

# The Login URL
authUrl = 'https://whatever.com/login'

# The subsequent URL
testUrl = 'https://whatever.com/someEndpoint'

# Logout URL
testlogoutUrl = 'https://whatever.com/logout'

# Whatever you are posting
login_data =  {'formPosted':'1', 
               'login_email':'me@example.com', 
               'password':'pw'
               }

# The Authentication token or any other data that we will receive from the Authentication Request. 
token = ''

# Post the login Request
loginRequest = requests.post(authUrl, login_data)
print("{}".format(loginRequest.text))

# Save the request content to your variable. In this case I needed a field called token. 
token = str(json.loads(loginRequest.content)['token'])  # or ['access_token']
print("{}".format(token))

# Verify Successful login
print("{}".format(loginRequest.status_code))

# Create your Requests Cookie Jar for your subsequent requests and add the cookie
jar = requests.cookies.RequestsCookieJar()
jar.set('LWSSO_COOKIE_KEY', token)

# Execute your next request(s) with the Request Cookie Jar set
r = requests.get(testUrl, cookies=jar)
print("R.TEXT: {}".format(r.text))
print("R.STCD: {}".format(r.status_code))

# Execute your logout request(s) with the Request Cookie Jar set
r = requests.delete(testlogoutUrl, cookies=jar)
print("R.TEXT: {}".format(r.text))  # should show "Request Not Authorized"
print("R.STCD: {}".format(r.status_code))  # should show 401
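A variation on the answer above, sketched under the assumption that the token cookie should only go to the login host: attach it to a `Session` once instead of passing `cookies=jar` on every call, and give it a `domain`. `jar.set(name, value)` without a domain matches every host, so the token would also be sent to unrelated sites. The helper names are hypothetical; `whatever.com` and `LWSSO_COOKIE_KEY` come from the answer.

```python
import requests


def make_authed_session(token, domain):
    """Session that sends the LWSSO_COOKIE_KEY cookie only to `domain`."""
    session = requests.Session()
    # Scoping the cookie to a domain keeps the token from leaking to other hosts.
    session.cookies.set("LWSSO_COOKIE_KEY", token, domain=domain, path="/")
    return session


def cookie_header_for(session, url):
    """Cookie header the session would send to `url` (prepared offline, nothing is sent)."""
    prepared = session.prepare_request(requests.Request("GET", url))
    return prepared.headers.get("Cookie")


if __name__ == "__main__":
    s = make_authed_session("tok123", "whatever.com")
    print(cookie_header_for(s, "https://whatever.com/someEndpoint"))  # LWSSO_COOKIE_KEY=tok123
    print(cookie_header_for(s, "https://other.example/"))             # None
```

In real use, `s.get(testUrl)` and `s.delete(testlogoutUrl)` would then carry the cookie automatically.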


dm0*_*514 6

The documentation says that get takes an optional cookies parameter, allowing you to specify cookies to use:

From the docs:

>>> url = 'http://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')

>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'

http://docs.python-requests.org/zh_CN/latest/user/quickstart/#cookies
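The `cookies` parameter accepts either a plain dict or a cookie jar such as the `r.cookies` returned by the login POST in the question, which would fill the question's ellipsis as `cookies=r.cookies`. A minimal offline sketch (the `cookie_header` helper is hypothetical, and the jar is built by hand in place of a real `r.cookies`) shows the `Cookie` header that would be sent:

```python
import requests


def cookie_header(url, cookies):
    """Return the Cookie header requests would send to `url` (nothing is sent)."""
    prepared = requests.Request("GET", url, cookies=cookies).prepare()
    return prepared.headers.get("Cookie")


if __name__ == "__main__":
    # A plain dict, as in the documentation example:
    print(cookie_header("http://httpbin.org/cookies", dict(cookies_are="working")))
    # cookies_are=working

    # A jar works too, standing in for r.cookies from the login response:
    jar = requests.cookies.RequestsCookieJar()
    jar.set("session_id_myapp", "127-0-0-1-825ff22a-6ed1-453b-aebc-5d3cf2987065")
    print(cookie_header("https://localhost/profile_data.json", jar))
    # session_id_myapp=127-0-0-1-825ff22a-6ed1-453b-aebc-5d3cf2987065
```

Passing cookies by hand works for a single follow-up request; a `Session` is simpler once the server starts rotating or adding cookies.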