Is there a Google Scholar API that we can use in our research application?

Asm*_*iaz 33 journal google-scholar researchkit

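I am working on a research publication and collaboration project that includes a literature search feature. Google Scholar seemed like a good fit because it is an open source tool, but when I looked into it I could not find any information about it having an API.

Is there an API for Google Scholar?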

Dmi*_*Zub 25

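There is no official Google Scholar API.

There are third-party solutions, such as the free scholarly Python package, which supports profile, author, cite, and organic results (search_pubs appears to be the method for getting organic results, although the method name confused me).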
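For the organic-results case, here is a minimal sketch. It assumes that scholarly.search_pubs yields dictionaries whose bibliographic fields sit under a "bib" key and whose citation count is "num_citations", as in recent scholarly releases; check this against your installed version. The helpers summarize_pub() and first_organic_result() are hypothetical names for illustration, not part of the library.

```python
def summarize_pub(pub):
    """Flatten one publication dict into a few commonly needed fields.

    Assumed layout (not guaranteed across scholarly versions):
    bibliographic data under pub["bib"], citations in pub["num_citations"].
    """
    bib = pub.get("bib", {})
    authors = bib.get("author", [])
    if isinstance(authors, str):
        # Assumption: the author field may arrive as one "A and B" string.
        authors = [name.strip() for name in authors.split(" and ")]
    return {
        "title": bib.get("title", ""),
        "year": bib.get("pub_year", ""),
        "authors": authors,
        "citations": pub.get("num_citations", 0),
    }


def first_organic_result(query):
    """Fetch and summarize the first organic result (needs network access)."""
    from scholarly import scholarly  # third-party: pip install scholarly

    return summarize_pub(next(scholarly.search_pubs(query)))
```

Calling first_organic_result("perception of physical stability") would issue a real request to Google Scholar, so the blocking caveat applies to it as well.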

Note that Google may block your IP if scholarly is used continuously without rate-limiting the requests (as @RadioControlled pointed out). Use it wisely.
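One way to slow consumption down is to wrap the lazy result iterator in a generator that sleeps between items. This is only a sketch: throttled() is a hypothetical helper, the 2 to 5 second range is an arbitrary assumption rather than a limit published by Google, and no delay guarantees that you will not be blocked.

```python
import random
import time


def throttled(results, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Yield items from `results`, pausing a random interval between items.

    Because the underlying iterator is consumed lazily, spacing out the
    items also spaces out whatever requests it makes to fetch more.
    The `sleep` argument is injectable so the helper can be tested.
    """
    for index, item in enumerate(results):
        if index:  # no pause before the very first item
            sleep(random.uniform(min_delay, max_delay))
        yield item


# Example (needs network access and the scholarly package):
# for author in throttled(scholarly.search_keyword("biology")):
#     print(author["name"])
```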

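There is also the scrape-google-scholar-py module, which lets you extract data from nearly all Google Scholar pages.

Alternatively, SerpApi offers a Google Scholar API. It is a paid API with a free plan that supports organic, cite, profile, and author results. Blocks are bypassed on SerpApi's backend, so your IP will not be blocked, and SerpApi handles the legal side of scraping.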


Example code that parses profile results with scholarly, using the search_keyword method:

import json
from scholarly import scholarly

# will paginate to the next page by default
authors = scholarly.search_keyword("biology")

for author in authors:
    print(json.dumps(author, indent=2))

# part of the output:

'''
{
  "container_type": "Author",
  "filled": [],
  "source": "SEARCH_AUTHOR_SNIPPETS",
  "scholar_id": "LXVfPc8AAAAJ",
  "url_picture": "https://scholar.google.com/citations?view_op=medium_photo&user=LXVfPc8AAAAJ",
  "name": "Eric Lander",
  "affiliation": "Broad Institute",
  "email_domain": "",
  "interests": [
    "Biology",
    "Genomics",
    "Genetics",
    "Bioinformatics",
    "Mathematics"
  ],
  "citedby": 552013
}
... other author results
'''
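Since the iterator above keeps paginating until results run out, it can help to cap how many items are pulled. A small sketch using itertools.islice from the standard library; take() is a hypothetical helper name.

```python
from itertools import islice


def take(results, limit):
    """Return at most `limit` items from a lazy result iterator as a list.

    islice stops pulling from `results` once the limit is reached, so no
    further pages are requested after that point.
    """
    return list(islice(results, limit))


# Example (needs network access and the scholarly package):
# top_authors = take(scholarly.search_keyword("biology"), 10)
```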

Usage example for scrape-google-scholar-py:

from google_scholar_py import CustomGoogleScholarProfiles
import json

parser = CustomGoogleScholarProfiles()
data = parser.scrape_google_scholar_profiles(
    query='blizzard',
    pagination=False,
    save_to_csv=False,
    save_to_json=False
)
print(json.dumps(data, indent=2))

Output:

[
  {
    "name": "Adam Lobel",
    "link": "https://scholar.google.com/citations?hl=en&user=_xwYD2sAAAAJ",
    "affiliations": "Blizzard Entertainment",
    "interests": [
      "Gaming",
      "Emotion regulation"
    ],
    "email": "Verified email at AdamLobel.com",
    "cited_by_count": 3593
  }, # other results...
]
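The call above passes save_to_csv=False, so the data stays in memory. If you want control over the columns, a list of profile dictionaries shaped like this output can be written out with the standard csv module. This is a sketch under that assumption; profiles_to_csv() is a hypothetical helper and the column list simply mirrors the keys shown above.

```python
import csv
import io


def profiles_to_csv(profiles):
    """Serialize profile dicts (shaped like the output above) to CSV text."""
    fields = ["name", "link", "affiliations", "interests", "email", "cited_by_count"]
    buffer = io.StringIO()
    # extrasaction="ignore" drops unexpected keys; missing keys become "".
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for profile in profiles:
        row = dict(profile)
        # "interests" is a list; join it so it fits into a single CSV cell.
        row["interests"] = "; ".join(profile.get("interests") or [])
        writer.writerow(row)
    return buffer.getvalue()
```

The returned string can be written to a file opened with newline="" to avoid doubled line endings on Windows.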

Example code that parses profile results using the Google Scholar Profile Results API from SerpApi:

import json
from serpapi import GoogleScholarSearch

# search parameters
params = {
    "api_key": "Your SerpApi API key",
    "engine": "google_scholar_profiles",
    "hl": "en",                            # language
    "mauthors": "biology"                  # search query
}

search = GoogleScholarSearch(params)
results = search.get_dict()

# only first page results
for result in results["profiles"]:
    print(json.dumps(result, indent=2))

# part of the output:
'''
{
  "name": "Masatoshi Nei",
  "link": "https://scholar.google.com/citations?hl=en&user=VxOmZDgAAAAJ",
  "serpapi_link": "https://serpapi.com/search.json?author_id=VxOmZDgAAAAJ&engine=google_scholar_author&hl=en",
  "author_id": "VxOmZDgAAAAJ",
  "affiliations": "Laura Carnell Professor of Biology, Temple University",
  "email": "Verified email at temple.edu",
  "cited_by": 384074,
  "interests": [
    {
      "title": "Evolution",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolution",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:evolution"
    },
    {
      "title": "Evolutionary biology",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolutionary_biology",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:evolutionary_biology"
    },
    {
      "title": "Molecular evolution",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Amolecular_evolution",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:molecular_evolution"
    },
    {
      "title": "Population genetics",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Apopulation_genetics",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:population_genetics"
    },
    {
      "title": "Phylogenetics",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aphylogenetics",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:phylogenetics"
    }
  ],
  "thumbnail": "https://scholar.googleusercontent.com/citations?view_op=small_photo&user=VxOmZDgAAAAJ&citpid=3"
}
... other results
'''
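The loop above reads only the first page. A hedged sketch of paging follows. It assumes the response carries a pagination object with a next_page_token, and that the token is passed back through an after_author parameter; both names are assumptions to verify against SerpApi's documentation. iter_profile_pages() and fetch_page are hypothetical names, and the fetch function is injected so the paging logic does not depend on a particular client.

```python
def iter_profile_pages(fetch_page, params, max_pages=3):
    """Yield profiles across several result pages.

    `fetch_page` is any callable that takes a params dict and returns the
    response dict, for example:
        lambda p: GoogleScholarSearch(p).get_dict()
    `max_pages` caps the number of requests made.
    """
    params = dict(params)  # copy so the caller's dict is not mutated
    for _ in range(max_pages):
        results = fetch_page(params)
        yield from results.get("profiles", [])
        # Assumed response field holding the token for the next page.
        token = results.get("pagination", {}).get("next_page_token")
        if not token:
            break
        params["after_author"] = token  # assumed paging parameter
```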

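I have a dedicated blog post at SerpApi on scraping historic Google Scholar results using Python, which shows how to scrape historic organic and cite results from 2017-2021 to CSV and SQLite.

If you are not a Python fan, there is also a blog post about scraping Google Scholar in R.

Disclaimer: I work for SerpApi.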

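  • Note that if you use scholarly too much, your whole organisation can easily get blocked from Google search for a while (or everybody will have to enter captchas to search). Not recommended. (2 upvotes)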

Chr*_*ger 10

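A quick search shows that others are trying to implement such APIs, but Google does not provide one. It is unclear whether this is legal; see, for example, How to get permission from Google to use Google Scholar Data, if needed?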