Phi*_*tin 5 html r web-scraping rvest
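I am trying to extract data on invasive plant species locations from the CABI Invasive Species Compendium using the rvest package.
Having looked at a few tutorials, I see that I should be able to scrape data from tables quite easily. However, I keep running into difficulties.
Let's say I want location data for the species Brassica tournefortii. I should be able to use this code, which follows the technique outlined here, to get details of the locations where the species has been recorded.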
library(rvest)
isc<-read_html("http://www.cabi.org/isc/datasheet/50069")
isc %>%
html_node("#toDistributionTable td:nth-child(1)") %>%
html_text()
However, when I run this code I get the error
Error: No matches
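I am new to web scraping. Am I doing something wrong?
First off, I wish I could upvote you more. Finally, a scraping question that isn't $SPORTSBALL or $MONEY related! :-)
That site is evil. It uses embedded namespaces that need to be dealt with, which also means using the xml2 package: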
library(rvest)
library(xml2)
isc <- read_html("http://www.cabi.org/isc/datasheet/50069")
ns <- xml_ns(isc)
xml_text(xml_find_all(isc, xpath="//div[@id='toDistributionTable']/table/tbody/tr/td[1]", ns))
## [1] "ASIA" "Azerbaijan"
## [3] "Bhutan" "China"
## [5] "-Tibet" "India"
## [7] "-Delhi" "-Indian Punjab"
## [9] "-Rajasthan" "-Uttar Pradesh"
## [11] "Iran" "Iraq"
## [13] "Israel" "Jordan"
## [15] "Kuwait" "Lebanon"
## [17] "Oman" "Pakistan"
## [19] "Qatar" "Saudi Arabia"
## [21] "Syria" "Turkey"
## [23] "Turkmenistan" "United Arab Emirates"
## [25] "Uzbekistan" "Yemen"
## [27] "AFRICA" "Algeria"
## [29] "Egypt" "Libya"
## [31] "Morocco" "South Africa"
## [33] "Tunisia" "NORTH AMERICA"
## [35] "Mexico" "USA"
## [37] "-Arizona" "-California"
## [39] "-Nevada" "-New Mexico"
## [41] "-Texas" "-Utah"
## [43] "SOUTH AMERICA" "Chile"
## [45] "EUROPE" "Belgium"
## [47] "Cyprus" "Denmark"
## [49] "France" "Greece"
## [51] "Ireland" "Italy"
## [53] "Spain" "Sweden"
## [55] "UK" "-England and Wales"
## [57] "-Scotland" "OCEANIA"
## [59] "Australia" "-Australian Northern Territory"
## [61] "-New South Wales" "-Queensland"
## [63] "-South Australia" "-Tasmania"
## [65] "-Victoria" "-Western Australia"
## [67] "New Zealand"
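For anyone who wants to try the XPath approach without hitting the site, here is a minimal offline sketch. The HTML string is a hypothetical stand-in for the distribution table (the real page's markup may well differ), and reading a leading "-" as a marker for a sub-national region is only an assumption based on the output above. The XPath uses `//tr` rather than `/table/tbody/tr` so that it matches whether or not the source contains a `tbody` element.

```r
library(xml2)

# Hypothetical stand-in for the page; the real markup may differ.
html <- '<div id="toDistributionTable"><table>
  <tr><th>Country</th><th>Distribution</th></tr>
  <tr><td>ASIA</td><td></td></tr>
  <tr><td>Azerbaijan</td><td>Present</td></tr>
  <tr><td>-Tibet</td><td>Present</td></tr>
</table></div>'

doc <- read_html(html)

# First cell of every row inside the div. The header row uses <th>,
# so it contributes no match.
cells <- xml_find_all(doc, "//div[@id='toDistributionTable']//tr/td[1]")

# xml_find_all() returns an empty nodeset when nothing matches, so the
# length can be checked before extracting text.
n_cells <- length(cells)

locations <- xml_text(cells)

# Assumption: entries prefixed with "-" are sub-national regions.
is_subregion <- startsWith(locations, "-")
```

Checking `length(cells)` first makes it easier to tell a selector that matches nothing from a page that failed to load.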