rvest error: type 'externalptr'

hos*_*ley 5 r rvest

I am trying to use rvest to extract the dates of birth of PGA golfers. Let's try Stuart Appleby. Here is his profile on the ESPN website: http://espn.go.com/golf/player/_/id/11/stuart-appleby. Note his DOB next to his headshot.

library("rvest")
url <- "http://espn.go.com/golf/player/_/id/11/stuart-appleby"
li_node <- url %>% html %>% html_nodes("li")

His DOB is contained in the 22nd item of li_node. Ideally I would not hard-code [[22]] into my program, but even when I do, I run into errors.

li_node[[22]]

displays the information I want, but calls such as:

word(li_node[[22]], ...)
substr(li_node[[22]], ...)
pluck(li_node, 22)

all return errors:

> word(li_node[[22]], 1)
Error in rep(string, length = n) : 
  attempt to replicate an object of type 'externalptr'
> substr(li_node[[22]], 1, 2)
Error in as.vector(x, "character") : 
  cannot coerce type 'externalptr' to vector of type 'character'
> pluck(li_node, 22)
Error in FUN(X[[1L]], ...) : 
  object of type 'externalptr' is not subsettable

Is there a simple way to get the DOB using rvest?

cor*_*ory 6

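library("rvest")
library("stringr")
url <- "http://espn.go.com/golf/player/_/id/11/stuart-appleby"
url %>% 
  html %>%                                          # download and parse the page
  html_nodes(xpath='//li[contains(.,"Age")]') %>%   # pick the <li> by its content, not its position
  html_text() %>%                                   # convert the node to a character string
  str_extract("[A-Z][a-z]{2,} [0-9]{1,2}, [0-9]{4}")  # pull out a "Month D, YYYY" date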

Yields:

[1] "May 1, 1971"
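The errors in the question most likely come from passing a node object straight to string functions: word() and substr() expect character vectors, and html_text() is the step that turns a node into one. Here is a minimal offline sketch of the same pipeline. The HTML snippet is a made-up stand-in for the ESPN page (the real markup may differ), and read_html() is used on the assumption of a recent rvest, where it takes the place of html(). The last step, turning the string into a Date through month.name, is an extra that the original answer does not include.

```r
library(rvest)    # read_html(), html_nodes(), html_text()
library(stringr)  # str_extract(), str_match()

# Hypothetical stand-in for the profile page; the real markup may differ.
page <- read_html('<ul>
  <li>Country: Australia</li>
  <li>Birth Date: May 1, 1971 (Age: 43)</li>
</ul>')

# Select the <li> by its content instead of hard-coding a position,
# then convert the node to plain text before any string handling.
li_text <- page %>%
  html_nodes(xpath = '//li[contains(., "Age")]') %>%
  html_text()

# Same pattern as above: capitalised month name, day, four-digit year.
dob_string <- str_extract(li_text, "[A-Z][a-z]{2,} [0-9]{1,2}, [0-9]{4}")

# Optional: build a Date without relying on the session locale.
# month.name is base R's vector of English month names.
parts <- str_match(dob_string, "([A-Za-z]+) ([0-9]{1,2}), ([0-9]{4})")
dob <- as.Date(sprintf("%s-%02d-%02d",
                       parts[, 4],                    # year
                       match(parts[, 2], month.name), # month number
                       as.integer(parts[, 3])))       # day

print(dob_string)
print(dob)
```

Matching the month against month.name avoids as.Date(format = "%B ..."), whose month parsing depends on the locale of the R session.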