rvest html() does not recognize URL

Question


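I'm writing a web scraper in R that searches Zillow for the median home value in each county of Washington state (WA). I'm using the rvest package, and here is the code in question: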

library(rvest)

URL <- "https://en.wikipedia.org/wiki/List_of_counties_in_Washington"
wiki <- html(URL)

#Getting the list of counties in WA
counties <- wiki %>%
  html_nodes(".wikitable td:nth-child(1) a") %>%
  html_text()

#Putting together a list to pull my search terms from
searchTerms <- list()

for(i in 1:length(counties)) {
  searchTerms[[i]] <- paste0(counties[i], ", WA", sep="")
}
searchTerms <- gsub(",", "", searchTerms)
searchTerms <- gsub(" ", "-", searchTerms)

homeValues <- list()

#Getting the HTML for each county using the search terms in the URL,
#eventually it will pull the homeValues data from that HTML.
for(j in 1:length(searchTerms)){
  zillowURL <- paste0("www.zillow.com/", searchTerms[j], "/home-values/", sep="")
  zillowHTML <- html(zillowURL)

}


I'm obviously not finished yet, but when I run this code I get the error message

"Error: file www.zillow.com/Adams-County-WA/home-values/ does not exist"

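Adams County is alphabetically the first county in Washington, so the scraper fails on the very first request. My guess is that this has something to do with how Zillow's website works? When I open that exact URL in my browser, it loads fine.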

Answer 1

RHe*_*tel 5

Try changing one line in your code:

zillowURL <- paste0("http://www.zillow.com/", searchTerms[j], "/home-values/", sep="")


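With that change the error message should go away. html() needs the complete URL, including the leading "http://", which a web browser fills in for you automatically. Without the scheme, the string is treated as a path to a local file, which is why the error says the *file* does not exist.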
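A minimal sketch of the same fix applied to the whole vector of search terms at once, assuming the URL pattern from the question. `ensure_scheme()` is a hypothetical helper (not part of rvest) that prepends "http://" only when a string has no scheme yet; the county names below are a hard-coded sample standing in for the scraped `counties` vector, so the example runs without network access.

```r
# Hypothetical helper (not part of rvest): prepend a scheme when none is present
ensure_scheme <- function(url, scheme = "http://") {
  # A scheme is letters/digits/+.- followed by "://", e.g. "http://" or "https://"
  has_scheme <- grepl("^[A-Za-z][A-Za-z0-9+.-]*://", url)
  ifelse(has_scheme, url, paste0(scheme, url))
}

# Sample county names standing in for the scraped `counties` vector
counties <- c("Adams County", "Asotin County", "Benton County")

# Vectorized equivalent of the loop plus the two gsub() calls:
# "Adams County" -> "Adams County WA" -> "Adams-County-WA"
searchTerms <- gsub(" ", "-", paste(counties, "WA"))

# Build every Zillow URL in one call, with the scheme included
zillowURLs <- ensure_scheme(paste0("www.zillow.com/", searchTerms, "/home-values/"))
print(zillowURLs)
```

Each element of `zillowURLs` can then be passed to the fetching function inside the loop. In later rvest versions `html()` is deprecated in favour of `read_html()`, which likewise needs the full URL with its scheme.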

Archived: 10 years, 6 months ago
Views: 550
Last activity: 10 years, 4 months ago