Asm*_*iaz 33 journal google-scholar researchkit
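I'm working on a research publication and collaboration project that includes a literature search feature. Google Scholar looks like it would work, since it is an open-source tool, but when I researched it I couldn't find any information about it having an API.
Is there an API for Google Scholar?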
Dmi*_*Zub 25
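There are third-party solutions, such as the free scholarly Python package, which supports profile, author, cite, and organic results (search_pubs appears to be the method for organic results, although the method name confused me).
Note that if scholarly is used continuously without rate-limiting your requests, Google may block your IP (as mentioned by @RadioControlled). Use it wisely.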
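One way to space requests out is to pause between items while iterating over a result generator. The following is only a rough sketch: `throttled` is a hypothetical helper (not part of scholarly), and the delay values are arbitrary assumptions, not limits published by Google.

```python
import random
import time


def throttled(items, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Yield items one by one, pausing a random interval between them.

    `sleep` is injectable so the helper can be exercised without waiting.
    The default delays are guesses, not documented Google Scholar limits.
    """
    for index, item in enumerate(items):
        if index:  # no pause before the very first item
            sleep(random.uniform(min_delay, max_delay))
        yield item
```

It could wrap any lazy result iterator, for example `for author in throttled(scholarly.search_keyword("biology")): ...`. A random delay is used rather than a fixed one so the requests look less regular, but this does not guarantee that you will not be blocked.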
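Additionally, there is the scrape-google-scholar-py module, which lets you extract data from almost all Google Scholar pages.
Alternatively, SerpApi has a Google Scholar API. It is a paid API with a free plan that supports organic, cite, profile, and author results. Blocks are bypassed on SerpApi's backend, so your IP won't get blocked, and SerpApi handles the legal side of scraping.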
Example code that parses profile results with scholarly, using the search_keyword method:
import json
from scholarly import scholarly
# will paginate to the next page by default
authors = scholarly.search_keyword("biology")
for author in authors:
    print(json.dumps(author, indent=2))
# part of the output:
'''
{
  "container_type": "Author",
  "filled": [],
  "source": "SEARCH_AUTHOR_SNIPPETS",
  "scholar_id": "LXVfPc8AAAAJ",
  "url_picture": "https://scholar.google.com/citations?view_op=medium_photo&user=LXVfPc8AAAAJ",
  "name": "Eric Lander",
  "affiliation": "Broad Institute",
  "email_domain": "",
  "interests": [
    "Biology",
    "Genomics",
    "Genetics",
    "Bioinformatics",
    "Mathematics"
  ],
  "citedby": 552013
}
... other author results
'''
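For organic (publication) results, the search_pubs method mentioned earlier can be used in a similar way. This is a minimal sketch, assuming each result is a dict with a nested "bib" dict and a "num_citations" key, which is how recent scholarly versions appear to structure publications. `summarize_publication` and `first_publications` are hypothetical helper names, not part of the package.

```python
from itertools import islice


def summarize_publication(pub):
    """Pick a few fields out of a scholarly-style publication dict.

    The key names ("bib", "pub_year", "num_citations") are assumptions
    about the result layout; missing keys simply come back as None.
    """
    bib = pub.get("bib", {})
    return {
        "title": bib.get("title"),
        "authors": bib.get("author"),
        "year": bib.get("pub_year"),
        "citations": pub.get("num_citations"),
    }


def first_publications(query, limit=5):
    """Return summaries of the first `limit` organic results for `query`."""
    # Imported lazily so summarize_publication works without scholarly installed.
    from scholarly import scholarly

    results = scholarly.search_pubs(query)  # lazy generator of publications
    return [summarize_publication(pub) for pub in islice(results, limit)]
```

Calling `first_publications("biology")` sends real requests to Google Scholar, so the rate-limiting caveat above applies here too.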
Example code that parses profile results with scrape-google-scholar-py:
from google_scholar_py import CustomGoogleScholarProfiles
import json
parser = CustomGoogleScholarProfiles()
data = parser.scrape_google_scholar_profiles(
    query='blizzard',
    pagination=False,
    save_to_csv=False,
    save_to_json=False
)
print(json.dumps(data, indent=2))
Output:
[
  {
    "name": "Adam Lobel",
    "link": "https://scholar.google.com/citations?hl=en&user=_xwYD2sAAAAJ",
    "affiliations": "Blizzard Entertainment",
    "interests": [
      "Gaming",
      "Emotion regulation"
    ],
    "email": "Verified email at AdamLobel.com",
    "cited_by_count": 3593
  }, # other results...
]
Example code that parses profile results with the Google Scholar Profile Results API from SerpApi:
import json
from serpapi import GoogleScholarSearch
# search parameters
params = {
    "api_key": "Your SerpApi API key",
    "engine": "google_scholar_profiles",
    "hl": "en",            # language
    "mauthors": "biology"  # search query
}
search = GoogleScholarSearch(params)
results = search.get_dict()
# only first page results
for result in results["profiles"]:
    print(json.dumps(result, indent=2))
# part of the output:
'''
{
  "name": "Masatoshi Nei",
  "link": "https://scholar.google.com/citations?hl=en&user=VxOmZDgAAAAJ",
  "serpapi_link": "https://serpapi.com/search.json?author_id=VxOmZDgAAAAJ&engine=google_scholar_author&hl=en",
  "author_id": "VxOmZDgAAAAJ",
  "affiliations": "Laura Carnell Professor of Biology, Temple University",
  "email": "Verified email at temple.edu",
  "cited_by": 384074,
  "interests": [
    {
      "title": "Evolution",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolution",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:evolution"
    },
    {
      "title": "Evolutionary biology",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolutionary_biology",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:evolutionary_biology"
    },
    {
      "title": "Molecular evolution",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Amolecular_evolution",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:molecular_evolution"
    },
    {
      "title": "Population genetics",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Apopulation_genetics",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:population_genetics"
    },
    {
      "title": "Phylogenetics",
      "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aphylogenetics",
      "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:phylogenetics"
    }
  ],
  "thumbnail": "https://scholar.googleusercontent.com/citations?view_op=small_photo&user=VxOmZDgAAAAJ&citpid=3"
}
... other results
'''
I have a dedicated blog post on SerpApi about scraping historical Google Scholar results with Python, which shows how to scrape organic and cite results from 2017-2021 and save them to CSV and SQLite.
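To give a rough idea of the saving step, here is a minimal sketch that writes profile dicts shaped like the SerpApi output above to CSV and SQLite using only the standard library. It is not the code from the blog post. The `profiles` table name, the column selection, and the helper names are assumptions made for illustration.

```python
import csv
import sqlite3

# Columns copied straight from each profile dict; "interests" is flattened separately.
FIELDS = ["name", "link", "affiliations", "cited_by"]


def flatten_profile(profile):
    """Turn one profile dict into a flat row of scalar values."""
    row = {field: profile.get(field) for field in FIELDS}
    row["interests"] = "; ".join(
        interest["title"] for interest in profile.get("interests", [])
    )
    return row


def save_csv(profiles, path):
    """Write the profiles to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS + ["interests"])
        writer.writeheader()
        writer.writerows(flatten_profile(p) for p in profiles)


def save_sqlite(profiles, path):
    """Append the profiles to a `profiles` table in a SQLite database file."""
    rows = [flatten_profile(p) for p in profiles]
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS profiles "
            "(name TEXT, link TEXT, affiliations TEXT, cited_by INTEGER, interests TEXT)"
        )
        conn.executemany(
            "INSERT INTO profiles VALUES "
            "(:name, :link, :affiliations, :cited_by, :interests)",
            rows,
        )
        conn.commit()
    finally:
        conn.close()
```

With the SerpApi example above, `save_csv(results["profiles"], "profiles.csv")` and `save_sqlite(results["profiles"], "profiles.db")` would store the first page of results.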
If you're not a Python fan, there is also a blog post about scraping Google Scholar in R.
Disclaimer: I work for SerpApi.