How can I use the WikipediR package in R to get data from a Wikipedia page?

Question

Ron*_*hah 1 api mediawiki r

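I need to pull a specific section of data from several Wikipedia pages. How can I do that with the WikipediR package? Or is there a better option? To be precise, I only need the section marked below from each of the pages.

Wikipedia page on Sachin Tendulkar

How can I get that? Any help would be appreciated.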

Answer 1

ASH*_*ASH 5

Could you be more specific about what you want? Here is a simple way to import data from the web, and from Wikipedia in particular.

library(rvest)    
scotusURL <- "https://en.wikipedia.org/wiki/List_of_Justices_of_the_Supreme_Court_of_the_United_States"

## ********************
## Option 1: Grab the tables from the page and use the html_table function to extract the tables you're interested in.

## read_html() replaces html(), which was deprecated in later rvest releases
temp <- scotusURL %>%
  read_html() %>%
  html_nodes("table")

html_table(temp[1]) ## Just the "legend" table
html_table(temp[2]) ## THE MAIN TABLE
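
If only one part of each page is needed, a CSS selector can narrow the match before calling html_table. The sketch below is a minimal illustration, not a confirmed answer to the question: it assumes the marked section is the infobox (Wikipedia gives those tables the class "infobox"), and it parses a made-up inline HTML string with placeholder values so that it runs without a network connection. For a real page, pass the URL to read_html() instead.

```r
library(rvest)

# Made-up stand-in for a downloaded page; for a real page use read_html(<url>).
sample_html <- '<html><body>
<table class="infobox">
  <tr><th>Field</th><th>Value</th></tr>
  <tr><td>field_a</td><td>value_a</td></tr>
</table>
<table class="wikitable">
  <tr><th>Col</th></tr>
  <tr><td>x</td></tr>
</table>
</body></html>'

page <- read_html(sample_html)

# Assumption: the part of interest is the table carrying the "infobox" class.
# html_node() returns only the first match, so html_table() yields one table.
infobox <- page %>%
  html_node("table.infobox") %>%
  html_table()

print(infobox)
```

Selecting by class rather than by position (temp[1], temp[2]) tends to survive layout differences between pages, since the infobox is not always the first table.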


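Now, if you want to import data from several pages that share essentially the same structure, with perhaps only a number or some other value changing in the URL, try this approach.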

library(RCurl)
library(XML)

## Page numbers to request
pageNum <- 1:10
## Base URL; the page number is appended to the end
url <- "http://www.totaljobs.com/JobSearch/Results.aspx?Keywords=Leadership&LTxt=&Radius=10&RateType=0&JobType1=CompanyType=&PageNum="
urls <- paste0(url, pageNum)

## Download each page, then parse the returned HTML text
allPages <- lapply(urls, function(x) getURLContent(x)[[1]])
xmlDocs <- lapply(allPages, function(x) XML::htmlParse(x, asText = TRUE))
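
The code above stops once the pages are parsed. One possible next step, sketched here under assumptions, is to run the same XPath query against every parsed document and flatten the results. The two inline HTML strings, the h2 elements, and the "title" class are made up for illustration and stand in for the downloaded pages so the example runs offline; the XPath for a real site would depend on that site's markup.

```r
library(XML)

# Made-up stand-ins for the downloaded pages (allPages in the code above).
allPages <- list(
  "<html><body><h2 class='title'>item 1</h2><h2 class='title'>item 2</h2></body></html>",
  "<html><body><h2 class='title'>item 3</h2></body></html>"
)

# Parse each string as HTML text rather than as a file name or URL.
xmlDocs <- lapply(allPages, function(x) htmlParse(x, asText = TRUE))

# Run the same XPath query on every document and flatten into one vector.
# The XPath expression is hypothetical; adapt it to the real page structure.
titles <- unlist(lapply(xmlDocs, function(doc) {
  xpathSApply(doc, "//h2[@class='title']", xmlValue)
}))

print(titles)
```

Because every page is handled by the same function inside lapply, changing what is extracted only means changing the XPath expression in one place.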

