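I'd like to know whether there is a way to call both html_name() and html_text() (from the rvest package) and store the two different results from within the same pipe (magrittr::%>%).
Here is an example: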
library("magrittr")
library("httr")
library("rvest")
uniprot_ac <- "P31374"
GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
  content(as = "raw", type = "text/xml") %>%
  read_html %>%
  html_nodes(xpath = '//recommendedname/* |
    //name[@type="primary"] | //comment[@type="function"]/text |
    //comment[@type="interaction"]/text')
At this point, I would like to get both the tag names from html_name():
[1] "fullname" "ecnumber" "name" "text"
AND the tag contents, without having to create a separate object by rewriting the whole pipe just to change the last line to html_text():
[1] "Serine/threonine-protein kinase PSK1"
[2] "2.7.11.1"
[3] "PSK1"
[4] "Serine/threonine-protein kinase involved ... ...
The desired output could look something like this; whether it is a vector or a data.frame doesn't matter:
[1] fullname: "Serine/threonine-protein kinase PSK1"
[2] ecnumber: "2.7.11.1"
[3] Name: "PSK1"
[4] Text: "Serine/threonine-protein kinase involved ... ...
Maybe a bit of a hack, but you can use an anonymous function wrapped in parentheses within the pipe:
library("magrittr")
library("httr")
library("xml2")
library("rvest")
uniprot_ac <- "P31374"
GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
  content(as = "raw", type = "text/xml") %>%
  read_html %>%
  html_nodes(xpath = '//recommendedname/* |
    //name[@type="primary"] | //comment[@type="function"]/text |
    //comment[@type="interaction"]/text') %>%
  (function(x) list(name = html_name(x), text = html_text(x)))
#$name
#[1] "fullname" "ecnumber" "name" "text"
#
#$text
#[1] "Serine/threonine-protein kinase PSK1"
#[2] "2.7.11.1"
#[3] "PSK1"
#[4] "Serine/threonine-protein kinase involved in the control of sugar metabolism and translation. Phosphorylates UGP1, which is required for normal glycogen and beta-(1,6)-glucan synthesis. This phosphorylation shifts glucose partitioning toward cell wall glucan synthesis at the expense of glycogen synthesis."
Alternatively, you could probably do something more elegant with the purrr package, but I see no reason to load an entire package just for this.
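Since the question says a vector or a data.frame would both be fine, the same anonymous-function trick can return a data.frame instead of a list. This is only a minimal sketch: the `snippet` string is a made-up stand-in for the UniProt response so that it runs offline, and the shortened XPath and the `result` name are assumptions for illustration.

```r
library("magrittr")
library("rvest")

# Hypothetical, hand-written stand-in for the UniProt XML (not the real response)
snippet <- '<entry>
  <recommendedName>
    <fullName>Serine/threonine-protein kinase PSK1</fullName>
    <ecNumber>2.7.11.1</ecNumber>
  </recommendedName>
  <name type="primary">PSK1</name>
</entry>'

# read_html() lowercases tag names, hence the lowercase XPath
result <- read_html(snippet) %>%
  html_nodes(xpath = '//recommendedname/* | //name[@type="primary"]') %>%
  (function(x) data.frame(name = html_name(x),
                          text = html_text(x),
                          stringsAsFactors = FALSE))

print(result)
```

For a named character vector, the function body could instead be setNames(html_text(x), html_name(x)).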
EDIT
As @MrFlick pointed out in the comments, the dot (.) placeholder can do the same thing, provided the expression is properly wrapped in curly braces.
GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
  content(as = "raw", type = "text/xml") %>%
  read_html %>%
  html_nodes(xpath = '//recommendedname/* |
    //name[@type="primary"] | //comment[@type="function"]/text |
    //comment[@type="interaction"]/text') %>%
  {list(name = html_name(.), text = html_text(.))}
This is arguably the more idiomatic magrittr way of doing it, and it is actually documented in help("%>%").
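To see why the curly braces matter, here is a small sketch on a plain numeric vector (the vector `x` and the names `no_braces` and `with_braces` are made up for illustration). When the dot appears only inside nested calls, magrittr still inserts the left-hand side as the first argument; the braces suppress that.

```r
library("magrittr")

x <- c(1, 2, 3)

# Without braces: the dot is only used in nested calls, so x is ALSO
# passed as the first argument, i.e. list(x, total = sum(x), n = length(x))
no_braces <- x %>% list(total = sum(.), n = length(.))

# With braces: x is used only where the dot appears
with_braces <- x %>% {list(total = sum(.), n = length(.))}

length(no_braces)    # 3 elements: x itself, total, n
length(with_braces)  # 2 elements: total, n
```

In the rvest pipe above, omitting the braces would therefore prepend the node set itself as an extra unnamed list element.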