How do I get data from all pages of the GitHub API with Python?

ski*_*l95 6 python api pagination github python-requests

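I'm trying to export a list of repos, but the request always returns only the first page. I can raise the number of items per page by appending "?per_page=100" to the URL, but that still isn't enough to get the whole list. I need to know how to fetch the data from pages 1, 2, ..., N. I'm using the Requests module, like this: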

import requests

i = 1
while i <= 2:
    r = requests.get('https://api.github.com/orgs/xxxxxxx/repos?page{0}&per_page=100'.format(i), auth=('My_user', 'My_passwd'))
    repo = r.json()
    j = 0
    while j < len(repo):
        print(repo[j]['full_name'])
        j = j + 1
    i = i + 1

I use that while condition because I know there are 2 pages. I tried to step through the pages that way, but it doesn't work.

got*_*tit 11

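import requests

# git_token is assumed to hold the full Authorization header value, e.g. "token <your token>"
url = "https://api.github.com/XXXX?simple=yes&per_page=100&page=1"
res = requests.get(url, headers={"Authorization": git_token})
repos = res.json()
# Requests parses the Link header into res.links; follow rel="next" until it disappears
while 'next' in res.links:
    res = requests.get(res.links['next']['url'], headers={"Authorization": git_token})
    repos.extend(res.json())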
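The same loop can be wrapped in a small reusable helper. This is only a sketch: the name fetch_all_pages and its session parameter are made up for illustration, and it assumes every page returns a JSON list, as a repository listing does. It relies on the links attribute that Requests builds from the Link header.

```python
def fetch_all_pages(url, session, headers=None):
    """Collect the JSON items from every page by following rel="next" links.

    `session` is anything with a Requests-style get() method, for example
    requests.Session(). The function name and parameters are illustrative.
    """
    items = []
    while url:
        res = session.get(url, headers=headers)
        res.raise_for_status()  # stop on HTTP errors instead of parsing an error body
        items.extend(res.json())  # assumes each page is a JSON list
        # res.links is empty or lacks 'next' on the final page, which ends the loop
        url = res.links.get('next', {}).get('url')
    return items
```

With Requests installed, it could be called as fetch_all_pages(url, requests.Session(), headers={"Authorization": git_token}). Passing the session in also makes it easy to try the helper against a stub without touching the network.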


cdv*_*788 5

From the GitHub docs:

Response:

Status: 200 OK
Link: <https://api.github.com/resource?page=2>; rel="next",
      <https://api.github.com/resource?page=5>; rel="last"
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999

The Link header gives you the URLs of the next and the last page of the listing. Just check the response headers.
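Because the rel="last" URL carries the final page number, one way to find out how many pages exist is to read its page query parameter. The sketch below assumes a dictionary shaped like the one Requests exposes as response.links; the function name last_page_number is hypothetical.

```python
from urllib.parse import urlparse, parse_qs

def last_page_number(links):
    """Return the page number found in the rel="last" URL, or None if there is none.

    `links` is assumed to look like response.links from Requests, e.g.
    {'last': {'url': 'https://api.github.com/resource?page=5', 'rel': 'last'}}.
    """
    last = links.get('last')
    if last is None:
        return None  # no rel="last" entry, e.g. everything fit on one page
    query = parse_qs(urlparse(last['url']).query)
    return int(query['page'][0])
```

For the sample response above this gives 5. It is mainly useful for progress reporting; following the rel="next" links remains the safer way to walk the pages.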

With Python Requests, you can access the response headers with:

response.headers

It is a case-insensitive dictionary containing the response headers. If a Link header is present, the result is paginated and the header holds the related URLs; a rel="next" entry means there are more pages. It is recommended to traverse the pages using those links instead of building the URLs yourself.

You can try something like this:

import requests

url = 'https://api.github.com/orgs/xxxxxxx/repos?per_page=100'
response = requests.get(url)
# Header lookup is case-insensitive; get() returns None when there is no Link header
link = response.headers.get('link')
if link is not None:
    print(link)

If link is not None it will be a string containing the relevant links for your resource.
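If you want to pull the individual URLs out of that string yourself, something like the following could work. It is a rough sketch using only the standard library, and it assumes the header looks like the GitHub sample above, with rel="..." directly after each URL; the function name parse_link_header is made up for this example. Requests already does the same job for you through response.links.

```python
import re

def parse_link_header(value):
    """Map each rel name to its URL, e.g. {'next': ..., 'last': ...}.

    Assumes entries of the form <URL>; rel="name", as in GitHub's responses.
    """
    links = {}
    for url, rel in re.findall(r'<([^>]+)>;\s*rel="([^"]+)"', value):
        links[rel] = url
    return links

header = ('<https://api.github.com/resource?page=2>; rel="next", '
          '<https://api.github.com/resource?page=5>; rel="last"')
print(parse_link_header(header).get('next'))
# prints https://api.github.com/resource?page=2
```

When the result has no 'next' key, you have reached the last page.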