R - How do I extract items from an XML nodeset?

IRN*_*art 3 html xml r nodes web-scraping

I have a list of 438 pitcher names that looks like this (in an XML nodeset):

    > pitcherlinks[[1]]
    <td class="left " data-append-csv="abadfe01" data-stat="player" csk="Abad,Fernando0.01">
      <a href="/players/a/abadfe01.shtml">Fernando Abad</a>*
    </td>

    > pitcherlinks[[2]]
    <td class="left " data-append-csv="adlemti01" data-stat="player" csk="Adleman,Tim0.01">
      <a href="/players/a/adlemti01.shtml">Tim Adleman</a>
    </td>

How do I extract the name Fernando Abad along with the associated link /players/a/abadfe01.shtml?


And*_*ers 6

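Since you have a list, use an apply function to iterate over it. Each call uses read_html to parse the HTML fragment from the list, then the CSS selector a to find the anchor (link). The name comes from html_text, and the link is in the href attribute.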

    library(rvest)

    # Rebuild the two sample cells from the question as HTML strings
    pitcherlinks <- list()
    pitcherlinks[[1]] <-
      '<td class="left " data-append-csv="abadfe01" data-stat="player" csk="Abad,Fernando0.01">
        <a href="/players/a/abadfe01.shtml">Fernando Abad</a>*
      </td>'

    pitcherlinks[[2]] <-
      '<td class="left " data-append-csv="adlemti01" data-stat="player" csk="Adleman,Tim0.01">
        <a href="/players/a/adlemti01.shtml">Tim Adleman</a>
      </td>'

    # Parse each fragment, select the <a> node, and pull out its text and href
    names <- sapply(pitcherlinks, function(x) {x %>% read_html() %>% html_nodes("a") %>% html_text()})
    links <- sapply(pitcherlinks, function(x) {x %>% read_html() %>% html_nodes("a") %>% html_attr("href")})

    names
    # [1] "Fernando Abad" "Tim Adleman"
    links
    # [1] "/players/a/abadfe01.shtml"  "/players/a/adlemti01.shtml"
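If pitcherlinks is still an actual xml_nodeset (as in the question) rather than a list of strings, re-parsing each element with read_html should not be necessary, because the rvest selectors accept a nodeset directly. Here is a minimal sketch under that assumption. The small `page` table is a hypothetical stand-in for the real page, and the `td[data-stat='player']` selector is only a guess at how the nodeset was built.

```r
library(rvest)

# Hypothetical stand-in for the scraped page: the two cells from the question
page <- read_html('<table><tr>
  <td class="left " data-stat="player"><a href="/players/a/abadfe01.shtml">Fernando Abad</a>*</td>
  <td class="left " data-stat="player"><a href="/players/a/adlemti01.shtml">Tim Adleman</a></td>
</tr></table>')

# Assumed way of obtaining the nodeset; the real selector may differ
pitcherlinks <- html_nodes(page, "td[data-stat='player']")

# html_nodes() works on the whole nodeset at once, so no apply loop is needed
anchors <- html_nodes(pitcherlinks, "a")

# Keep each name next to its link
pitchers <- data.frame(
  name = html_text(anchors),
  link = html_attr(anchors, "href"),
  stringsAsFactors = FALSE
)
pitchers
```

Collecting both columns in one data frame keeps every name aligned with its link, which is convenient if the links are followed later.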