Calling two different functions on the same object in a pipe (%>%)

lik*_*zza 1 r magrittr

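I would like to know whether there is a way to call both html_name() and html_text() (from the rvest package) and store the two different results from within the same pipe (magrittr::%>%).

Here is an example: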

library("magrittr")
library("httr")
library("rvest")

uniprot_ac <- "P31374"

GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
    content(as = "raw", content = "text/xml") %>%
    read_html %>%
    html_nodes(xpath = '//recommendedname/* |
               //name[@type="primary"] | //comment[@type="function"]/text |
               //comment[@type="interaction"]/text')

At this point, I would like to get the tag names from html_name()

[1] "fullname" "ecnumber" "name"     "text"    

AND the tag contents, without having to create a separate object by rewriting the whole pipe just to change the last line to html_text()

[1] "Serine/threonine-protein kinase PSK1"                                                                                                                                                                                                                                                                             
[2] "2.7.11.1"                                                                                                                                                                                                                                                                                                         
[3] "PSK1"                                                                                                                                                                                                                                                                                                             
[4] "Serine/threonine-protein kinase involved ... ... 

The desired output could look something like this; it does not matter whether it is a vector or a data.frame.

  [1] fullname: "Serine/threonine-protein kinase PSK1"                                                                                                                                                                                                                                                                             
  [2] ecnumber: "2.7.11.1"                                                                                                                                                                                                                                                                                                         
  [3] Name: "PSK1"                                                                                                                                                                                                                                                                                                             
  [4] Text: "Serine/threonine-protein kinase involved ... ... 

And*_*rau 5

Perhaps a bit of a hack, but you can use an anonymous function wrapped in parentheses inside the pipe:

library("magrittr")
library("httr")
library("xml2")
library("rvest")

uniprot_ac <- "P31374"

GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
  content(as = "raw", content = "text/xml") %>%
  read_html %>%
  html_nodes(xpath = '//recommendedname/* |
             //name[@type="primary"] | //comment[@type="function"]/text |
             //comment[@type="interaction"]/text') %>% 
  (function(x) list(name = html_name(x), text = html_text(x)))
#$name
#[1] "fullname" "ecnumber" "name"     "text"    
#
#$text
#[1] "Serine/threonine-protein kinase PSK1"                                                                                                                                                                                                                                                                             
#[2] "2.7.11.1"                                                                                                                                                                                                                                                                                                         
#[3] "PSK1"                                                                                                                                                                                                                                                                                                             
#[4] "Serine/threonine-protein kinase involved in the control of sugar metabolism and translation. Phosphorylates UGP1, which is required for normal glycogen and beta-(1,6)-glucan synthesis. This phosphorylation shifts glucose partitioning toward cell wall glucan synthesis at the expense of glycogen synthesis."

Alternatively, you could probably do something more elegant with the purrr package, but I see no reason to load an entire package just for this.
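For what it's worth, here is a minimal sketch of what a purrr version might look like. This is only a guess at the idea: the inline XML string and the names `doc` and `res` are made up for illustration, and `read_xml()` is used on a literal string so that no network request is needed.

```r
library("magrittr")
library("xml2")
library("rvest")
library("purrr")

# Hypothetical inline XML standing in for the UniProt response
doc <- read_xml(
  '<entry><fullname>Serine/threonine-protein kinase PSK1</fullname><name type="primary">PSK1</name></entry>'
)

res <- doc %>%
  html_nodes(xpath = '//fullname | //name[@type="primary"]') %>%
  # Apply every extractor in the named list to the same node set;
  # map() keeps the list names, so the result has $name and $text
  (function(x) map(list(name = html_name, text = html_text), function(f) f(x)))
```

Base R's lapply() would do the same job with the same arguments, which is part of why loading purrr seems unnecessary here.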

EDIT: As @MrFlick pointed out in the comments, the dot (.) placeholder can do the same thing if it is correctly placed within curly braces.

GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
  content(as = "raw", content = "text/xml") %>%
  read_html %>%
  html_nodes(xpath = '//recommendedname/* |
             //name[@type="primary"] | //comment[@type="function"]/text |
             //comment[@type="interaction"]/text') %>% 
  {list(name = html_name(.), text = html_text(.))}

This is arguably the more magrittr-idiomatic way of doing it, and it is actually documented in help("%>%").
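Since the question says either a vector or a data.frame would be fine, the same brace-and-dot pattern can build those directly. A small sketch, assuming a made-up inline XML document in place of the UniProt response (`doc`, `out_vec`, and `out_df` are illustrative names only):

```r
library("magrittr")
library("xml2")
library("rvest")

# Hypothetical inline XML standing in for the UniProt response
doc <- read_xml(
  '<entry><recommendedname><fullname>Serine/threonine-protein kinase PSK1</fullname><ecnumber>2.7.11.1</ecnumber></recommendedname><name type="primary">PSK1</name></entry>'
)

# Named character vector: tag contents as values, tag names as names
out_vec <- doc %>%
  html_nodes(xpath = '//recommendedname/* | //name[@type="primary"]') %>%
  {setNames(html_text(.), html_name(.))}

# data.frame with one row per matched node
out_df <- doc %>%
  html_nodes(xpath = '//recommendedname/* | //name[@type="primary"]') %>%
  {data.frame(name = html_name(.), text = html_text(.), stringsAsFactors = FALSE)}
```

Because the right-hand side is wrapped in braces, magrittr does not insert the piped object as a first argument, so the dot can be used as many times as needed inside the block.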

  • Am I missing something, or why is nobody recommending `{list(name = html_name(.), text = html_text(.))}`? You don't actually need a function. (3 upvotes)