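I'm trying to scrape an HTML table from a website using rvest. The only problem is that the table I'm trying to scrape has no opening <tr> tags except on the first row. It looks like this: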
<tr>
<td>6/21/2015 9:38 PM</td>
<td>5311 Lake Park</td>
<td>UCPD</td>
<td>African American</td>
<td>Male</td>
<td>Subject was causing a disturbance in the area.</td>
<td>Name checked; no further action</td>
<td>No</td>
</tr>
<td>6/21/2015 10:37 PM</td>
<td>5200 S Blackstone</td>
<td>UCPD</td>
<td>African American</td>
<td>Male</td>
<td>Subject was observed fighting in the McDonald's parking lot</td>
<td>Warned; released</td>
<td>No</td>
</tr>
And so on. As a result, with the following code I only get the first row in my data frame:
library(rvest)
mydata <- html_session("https://incidentreports.uchicago.edu/incidentReportArchive.php?startDate=06/01/2015&endDate=06/21/2015") %>%
  html_node("table") %>%
  html_table(header = TRUE, fill = TRUE)
How can I change this so that html_table understands that the rows are rows, even though they have no opening <tr> tag? Or is there a better way to approach this?
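One direction that might work, offered only as a rough sketch: treat the page source as text, insert the missing opening <tr> in front of each orphaned row with a regular expression, and then parse the repaired markup. The inline `sample_html` string and its two column names are made up for illustration and stand in for the real page. The pattern assumes every row ends with </tr> and that the next row starts directly with <td>.

```r
library(rvest)

# Hypothetical stand-in for the page source: only the header row and the
# first data row have an opening <tr>, as in the snippet above.
sample_html <- paste0(
  "<table>",
  "<tr><th>Date</th><th>Location</th></tr>",
  "<tr><td>6/21/2015 9:38 PM</td><td>5311 Lake Park</td></tr>",
  "<td>6/21/2015 10:37 PM</td><td>5200 S Blackstone</td></tr>",
  "</table>"
)

# Wherever a closing </tr> is followed directly by <td>, insert the missing <tr>.
fixed_html <- gsub("</tr>\\s*<td>", "</tr><tr><td>", sample_html)

# Parse the repaired markup; read_html() accepts a literal HTML string.
mydata <- read_html(fixed_html) %>%
  html_node("table") %>%
  html_table(header = TRUE)
```

For the real page, the source text would have to be fetched first (for example with readLines() on the URL and paste(collapse = "\n")), which I have not tested against that site.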