How do I scrape a table with rvest and XPath?

Question

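Using the following documentation, I have been trying to scrape a series of tables from marketwatch.com.

Here is the one represented by the code below.

The link and the XPath are already included in the code: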

url <- "http://www.marketwatch.com/investing/stock/IRS/profile"
valuation <- url %>%
  html() %>%
  html_nodes(xpath='//*[@id="maincontent"]/div[2]/div[1]') %>%
  html_table()
valuation <- valuation[[1]]

I get the following error:

Warning message:
'html' is deprecated.
Use 'read_html' instead.
See help("Deprecated")

Thanks in advance.

Answer 1

Sym*_*xAU 9

That website doesn't use an HTML table, so html_table() can't find anything. It uses div classes column and data lastcolumn instead.

So you can do something like

url <- "http://www.marketwatch.com/investing/stock/IRS/profile"
valuation_col <- url %>%
    read_html() %>%
    html_nodes(xpath='//*[@class="column"]')

valuation_data <- url %>%
    read_html() %>%
    html_nodes(xpath='//*[@class="data lastcolumn"]')

or even

url %>%
  read_html() %>%
  html_nodes(xpath='//*[@class="section"]')

to get you most of the way.
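
To go the rest of the way, the label nodes and the value nodes can be paired by position into a data frame. The following is only a minimal sketch: the inline HTML is a hypothetical stand-in for the div-based layout described above (the real page's markup and values may differ), and the names page, labels, values and the column names metric and value are made up for illustration. It parses a literal HTML string rather than the live URL, so it runs without network access.

```r
library(rvest)

# Hypothetical markup imitating a div-based "table"; not copied from the real page.
page <- read_html('
  <div class="section">
    <div class="column">P/E Current</div>
    <div class="data lastcolumn">12.34</div>
  </div>
  <div class="section">
    <div class="column">Price to Sales Ratio</div>
    <div class="data lastcolumn">1.50</div>
  </div>
')

# There is no <table> element, which is why html_table() has nothing to parse.
n_tables <- length(html_nodes(page, "table"))

# Extract the label cells and the value cells as trimmed text.
labels <- page %>%
  html_nodes(xpath = '//*[@class="column"]') %>%
  html_text(trim = TRUE)

values <- page %>%
  html_nodes(xpath = '//*[@class="data lastcolumn"]') %>%
  html_text(trim = TRUE)

# Pair them by position; this assumes every label has exactly one value.
valuation <- data.frame(metric = labels, value = values,
                        stringsAsFactors = FALSE)
print(valuation)
```

Note that @class="column" in XPath compares the whole class attribute, so an element with class="column first" would not match. If the real page mixes extra classes in, a CSS selector such as html_nodes(".column") may be more forgiving.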

Also, please read their terms of use, in particular section 3.4.
