htmlParse fails to load external entity

Tum*_*own 7 xml r

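I'm trying to load some publicly available NHS data using R and the XML package, but I keep getting the following error message:

Error: failed to load external entity "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"

Despite looking at several related questions, I can't work out what might be causing this.

Here is my very simple code: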

library("XML")
url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"
doc <- htmlParse(url)

Edit: session info

R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.1

use*_*933 10

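The XML package has some issues here. The problem is intermittent and does not depend on the URL. I worked around it by using the GET function from the httr package to fetch the HTML, then passing the result to htmlParse, as shown below: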

library("XML")
library(httr)
url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"
doc <- htmlParse(rawToChar(GET(url)$content))
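Since the failure is described as intermittent, a slightly more defensive variant of the same workaround might look like the sketch below. It is not part of the original answer: the helper names `parse_html_text` and `fetch_and_parse` are made up for illustration, the retry count of 3 is an arbitrary assumption, and the sample HTML is invented so that the parsing step can be checked without network access.

```r
library(XML)
library(httr)

# Hypothetical helper: parse HTML that is already held in memory as text.
# asText = TRUE tells htmlParse the argument is content, not a file or URL.
parse_html_text <- function(html_text) {
  htmlParse(html_text, asText = TRUE)
}

# Hypothetical helper: download with httr (retrying on failure), then parse.
# times = 3 is an assumed default, not a recommendation from the thread.
fetch_and_parse <- function(url, times = 3) {
  resp <- RETRY("GET", url, times = times)
  stop_for_status(resp)  # turn HTTP errors (4xx/5xx) into R errors
  parse_html_text(content(resp, as = "text", encoding = "UTF-8"))
}

# Offline check of the parsing step, using invented HTML.
sample_html <- "<html><body><a href='http://example.org/a'>A</a></body></html>"
doc <- parse_html_text(sample_html)
links <- xpathSApply(doc, "//a", xmlGetAttr, "href")
print(links)
```

With a live connection, `doc <- fetch_and_parse(url)` would stand in for the `htmlParse(rawToChar(GET(url)$content))` line; the download and the parse stay separate, so a failure points to the step that caused it.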


hrb*_*str 5

You can also use the rvest and xml2 packages:

library(rvest) # github version
library(xml2)  # github version

url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"
doc <- read_html(url)

doc %>% 
  html_nodes("a[href^='http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/']") %>% 
  html_attr("href")

## [1] "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-overnight/"
## [2] "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-day-only/" 
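To see what the `a[href^='...']` selector is doing without hitting the network, here is a small offline sketch. The HTML and the example.org URLs are invented for illustration and are not real NHS page content; only the selector pattern is taken from the code above.

```r
library(rvest)
library(xml2)

# Invented HTML standing in for a downloaded page.
sample_html <- paste0(
  "<html><body>",
  "<a href='http://example.org/stats/overnight/'>Overnight</a>",
  "<a href='http://example.org/stats/day-only/'>Day only</a>",
  "<a href='http://example.org/about/'>About</a>",
  "</body></html>"
)

# read_html also accepts a literal HTML string, not just a URL.
doc <- read_html(sample_html)

# [href^='prefix'] keeps only anchors whose href starts with that prefix,
# so the "about" link is filtered out.
stats_links <- doc %>%
  html_nodes("a[href^='http://example.org/stats/']") %>%
  html_attr("href")

print(stats_links)
```

The same prefix match is what narrows the NHS page down to the two bed-data links in the output above.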