IRN*_*art 3 html xml r nodes web-scraping
I have a list of 438 pitcher names that looks like this (in an XML nodeset):

    > pitcherlinks[[1]]
    <td class="left " data-append-csv="abadfe01" data-stat="player" csk="Abad,Fernando0.01">
     <a href="/players/a/abadfe01.shtml">Fernando Abad</a>*
    </td>

    > pitcherlinks[[2]]
    <td class="left " data-append-csv="adlemti01" data-stat="player" csk="Adleman,Tim0.01">
     <a href="/players/a/adlemti01.shtml">Tim Adleman</a>
    </td>

How can I extract the name Fernando Abad and the associated link /players/a/abadfe01.shtml?
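Since you have a list, use an apply function to iterate over it. Each function call parses one HTML fragment from the list with read_html, then uses the CSS selector "a" to find the anchor (link). The name comes from html_text, and the link is in the href attribute.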

    library(rvest)

    # Rebuild the two example cells as HTML strings
    pitcherlinks <- list()
    pitcherlinks[[1]] <-
    '<td class="left " data-append-csv="abadfe01" data-stat="player" csk="Abad,Fernando0.01">
     <a href="/players/a/abadfe01.shtml">Fernando Abad</a>*
     </td>'

    pitcherlinks[[2]] <-
    '<td class="left " data-append-csv="adlemti01" data-stat="player" csk="Adleman,Tim0.01">
     <a href="/players/a/adlemti01.shtml">Tim Adleman</a>
     </td>'

    # Parse each fragment, select the anchor, then pull its text and its href attribute
    names <- sapply(pitcherlinks, function(x) {x %>% read_html() %>% html_nodes("a") %>% html_text()})
    links <- sapply(pitcherlinks, function(x) {x %>% read_html() %>% html_nodes("a") %>% html_attr("href")})

    names
    # [1] "Fernando Abad" "Tim Adleman"
    links
    # [1] "/players/a/abadfe01.shtml" "/players/a/adlemti01.shtml"

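The answer above rebuilds each element as an HTML string before parsing it. If pitcherlinks is already an xml_nodeset, as the question describes, re-parsing may not be needed, because html_nodes can be applied to the whole nodeset at once. The following is a minimal sketch under that assumption. The small table in `html` is a hypothetical stand-in for the scraped page, and the selector `td[data-stat='player']` is a guess at how the nodeset was originally obtained.

```r
library(rvest)

# Hypothetical stand-in for the scraped page (assumption: the real page
# contains many such cells; only the two from the question are shown here).
html <- '<table>
<tr><td class="left " data-append-csv="abadfe01" data-stat="player" csk="Abad,Fernando0.01"><a href="/players/a/abadfe01.shtml">Fernando Abad</a>*</td></tr>
<tr><td class="left " data-append-csv="adlemti01" data-stat="player" csk="Adleman,Tim0.01"><a href="/players/a/adlemti01.shtml">Tim Adleman</a></td></tr>
</table>'

page <- read_html(html)

# Assumed way of building the nodeset of player cells
pitcherlinks <- html_nodes(page, "td[data-stat='player']")

# Select the anchors inside all cells in one call, without sapply
anchors <- html_nodes(pitcherlinks, "a")

# Collect names and links side by side
pitchers <- data.frame(
  name = html_text(anchors),
  link = html_attr(anchors, "href"),
  stringsAsFactors = FALSE
)

print(pitchers)
```

Keeping the names and links in one data frame keeps each name paired with its link, provided every cell contains exactly one anchor.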