Is there a way to get Wikipedia page data in JSON?

ume*_*mer 3 api wikipedia mediawiki-api

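I want to get the complete data of https://en.wikipedia.org/wiki/Cat. I tried different approaches with the wiki API, but could not get the data as JSON. I can only get the first description of the cat. Is there a way to get the full page content in JSON format?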

0st*_*ne0 7

Using the API Sandbox, you can build a request such as:

http://en.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&titles=Cat&formatversion=2&rvprop=content&rvslots=*

Use format=json to get the response as JSON, and rvprop=content together with rvslots=* to get the full content.

Note: the content is still in MediaWiki format (wikitext).

Result (trimmed):

{
    "batchcomplete": true,
    "query": {
        "pages": [
            {
                "pageid": 6678,
                "ns": 0,
                "title": "Cat",
                "revisions": [
                    {
                        "slots": {
                            "main": {
                                "contentmodel": "wikitext",
                                "contentformat": "text/x-wiki",
                                "content": "{{Good article}}\n{{pp-semi-indef|small=yes}}{{pp-move-indef|small=yes}}\n{{short description|Domesticated feline}}\n{{about|the species that is commonly kept as a pet|the cat family|Felidae|other uses|Cat (disambiguation)|and|Cats (disambiguation)}}\n{{technical reasons|Cat #1|the album|Cat 1 (album)}}\n{{Use dmy dates|date=February 2019}}{{Use American English|date=January 2020}}<!-- Per MOS:ENGVAR and MOS:DATEVAR, articles should conform to one overall spelling of English and date format, typically the ones with which it was created when the topic has no strong national ties. This article was created with American English, using international date format (DD Month YYYY), and should continue to be written that way. If there is a compelling reason to change it, propose a change on the talk page. -->\n{{Speciesbox\n|name= Domestic cat\n|status= DOM\n<!-- There has been extensive discussion about the choice of image in this infobox. Before replacing this image with something else, consider if it actually improves on the ENCYCLOPEDIC CRITERIA which led to this choice.... +150000 chars..
                            }
                        }
                    }
                ]
            }
        ]
    }
}
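
For reference, here is a minimal Python sketch of the same request, using only the standard library. The function names and the User-Agent string are placeholders made up for this illustration, not part of the MediaWiki API. It assumes the formatversion=2 response shape shown above, where `pages` is a list and the wikitext sits under `slots.main.content`.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_revisions_url(title):
    """Build the query URL for the full wikitext of a page's latest revision."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "titles": title,
        "rvprop": "content",
        "rvslots": "*",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_wikitext(response):
    """Return the main-slot wikitext of the first page in a parsed response."""
    page = response["query"]["pages"][0]
    return page["revisions"][0]["slots"]["main"]["content"]


def fetch_wikitext(title):
    """Send the request (needs network access) and return the wikitext."""
    request = urllib.request.Request(
        build_revisions_url(title),
        # Placeholder User-Agent; replace with one identifying your own tool.
        headers={"User-Agent": "wiki-json-example/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_wikitext(json.load(reply))
```

Calling `fetch_wikitext("Cat")` should then return the raw wikitext string, which still needs a wikitext parser if you want structured or plain-text content.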

Optionally, set prop=extracts instead of prop=revisions to get a "cleaned" response (HTML rather than MediaWiki markup):

../api.php?action=query&format=json&prop=extracts&titles=Cat&formatversion=2

Result (trimmed):

"query": {
        "pages": [
            {
                "pageid": 6678,
                "ns": 0,
                "title": "Cat",
                "extract": "<p class=\"mw-empty-elt\">\n\n</p>\n\n\n\n<p class=\"mw-empty-elt\">\n\n</p>\n<p>The <b>cat</b> (<i>Felis catus</i>) is a small carnivorous mammal. It is the only domesticated species in the family Felidae and often referred to as the <b>domestic cat</b> to distinguish it from wild members of the family. The cat is either a <b>house cat</b>, a <b>farm cat</b> or a <b>feral cat</b>; latter ranges freely and avoids human contact.\nDomestic cats are valued by humans for companionship and for their ability to hunt rodents.  +483000 chars
            }
        ]
    }
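
A small Python sketch for the prop=extracts variant, again assuming formatversion=2 so that `pages` is a list. The helper names are hypothetical, and the actual HTTP call is left out; pass in whatever response body your HTTP client returns.

```python
import json
import urllib.parse


def build_extracts_url(title, api_url="https://en.wikipedia.org/w/api.php"):
    """Build the query URL that asks for the page extract instead of wikitext."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "extracts",
        "titles": title,
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def get_extract(response_text):
    """Parse the raw JSON body and return the first page's extract, or None."""
    pages = json.loads(response_text)["query"]["pages"]
    if not pages:
        return None
    # "extract" is absent when the page is missing or no extract was returned.
    return pages[0].get("extract")
```

As the trimmed result above shows, the extract is an HTML string, so it may still need tag stripping before being used as plain text.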