cod*_*rer 0 r web-scraping scrape rvest
我正在尝试从“https://www.metabolomicsworkbench.org/data/mb_struct_ajax.php”中抓取表格。
我在网上找到的代码(rvest)不起作用
library(rvest)
url <- "https://www.metabolomicsworkbench.org/data/mb_structure_ajax.php"
A <- url %>%
read_html() %>%
html_nodes(xpath='//*[@id="containerx"]/div[1]/table') %>%
html_table()
Run Code Online (Sandbox Code Playgroud)
A 是“0 的列表”
我应该如何修复此代码或者有更好的方法吗?
提前致谢。
页面源码由JS生成。这是你要做的:

https://www.metabolomicsworkbench.org/data/mb_structure_tableonly.php并获取了它的内容。
以下是参数:

rvest。这是抓取所有页面的代码:library(rvest)
url <- "https://www.metabolomicsworkbench.org/data/mb_structure_tableonly.php"
pg <- html_session(url)
data <-
purrr::map_dfr(
1:4288, # you might wanna change it to a small number to try first or scrape multiple times and combine data frames later, in case something happens in the middle
function(i) {
pg <- rvest:::request_POST(pg,
url,
body = list(
page = i
))
read_html(pg) %>%
html_node("table") %>%
html_table()
}
)
Run Code Online (Sandbox Code Playgroud)