Problems using dplyr on tibbles with vector elements [list columns]

Question

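I am running into problems doing text processing with dplyr and stringr functions (specifically str_split()). I think I am misunderstanding something very basic about how to use dplyr correctly when the elements are vectors/lists.

Here is a small example tibble, df ...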

library(tidyverse)

df <- tribble(
  ~item, ~phrase,
  "one",   "romeo and juliet",
  "two",   "laurel and hardy",
  "three", "apples and oranges and pears and peaches"
)


Now I create a new column, splitPhrase, by running str_split() on one of the columns with "and" as the separator.

df <- df %>% mutate(splitPhrase = str_split(phrase,"and"))

This seems to work; in RStudio I see this ...

In the console, I can see that my new column splitPhrase is actually made up of lists ... but it looks right in the RStudio display, doesn't it?

df
#> # A tibble: 3 x 3
#>   item  phrase                                   splitPhrase
#>   <chr> <chr>                                    <list>
#> 1 one   romeo and juliet                         <chr [2]>
#> 2 two   laurel and hardy                         <chr [2]>
#> 3 three apples and oranges and pears and peaches <chr [4]>

What I ultimately want to do is extract the last item of each splitPhrase. In other words, I want to get to this ...

The problem is that I don't know how to get at the last element in each splitPhrase. If it were just a vector, I could do something like this ...

last( c("a","b","c") )
#> [1] "c"

But this doesn't work inside a tibble, and neither do the other things I could think of:

df <- df %>% mutate(lastThing = last(splitPhrase))
# Error in mutate_impl(.data, dots) :
#   Column `lastThing` must be length 3 (the number of rows) or one, not 4

df <- df %>% group_by(splitPhrase) %>% mutate(lastThing = last(splitPhrase))
# Error in grouped_df_impl(data, unname(vars), drop) :
#   Column `splitPhrase` can't be used as a grouping variable because it's a list

So I think I don't "get" how to work with vectors inside the elements of a table/tibble column. It seems to be related to the fact that, in my example, the column is actually a list of vectors.

Is there a specific function that would help me here, or a better way to get at this?

Created on 2018-09-27 by the reprex package (v0.2.1)

Answer 1

akr*_*run 2

The 'splitPhrase' column is a list, so we loop over the list to get the last element of each entry

library(tidyverse)
df %>% 
   mutate(splitPhrase = str_split(phrase,"\\s*and\\s*"),
          Last = map_chr(splitPhrase, last)) %>%
   select(item, Last)

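To make the difference concrete, here is a minimal sketch of why last() alone fails on the list column while map_chr() works. It assumes the same three phrases as in the question; the names `phrases`, `pieces`, `whole_last` and `per_row_last` are only illustrative and do not appear in the original post.

```r
library(tidyverse)

phrases <- c("romeo and juliet",
             "laurel and hardy",
             "apples and oranges and pears and peaches")

# str_split() returns a list with one character vector per input string.
# The pattern "\\s*and\\s*" also consumes the spaces around "and",
# so the pieces come back without leading or trailing whitespace.
pieces <- str_split(phrases, "\\s*and\\s*")

# last() applied to the list itself returns the final list element,
# i.e. the whole length-4 vector for the third phrase. This matches the
# "not 4" in the mutate() error from the question.
whole_last <- last(pieces)
length(whole_last)

# map_chr() calls last() once per list element and collects the results
# into an ordinary character vector with one value per row.
per_row_last <- map_chr(pieces, last)
per_row_last
```

Since map_chr() always returns one string per list element, its result has the right length for a mutate() column.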

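However, it can be done in many ways. Another option is to use separate_rows to expand the column into one row per piece, then take the last element grouped by 'item'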

df %>% 
  separate_rows(phrase,sep = " and ") %>% 
  group_by(item) %>% 
  summarise(Last = last(phrase))

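As a rough check that the two approaches agree, the sketch below runs both on the question's data and compares the results. It assumes the df from the question; the names `by_map` and `by_rows` are made up for illustration. As far as I know, summarise() on grouped data returns rows ordered by the grouping key rather than in the original row order, so both results are sorted by item before comparing.

```r
library(tidyverse)

df <- tribble(
  ~item,   ~phrase,
  "one",   "romeo and juliet",
  "two",   "laurel and hardy",
  "three", "apples and oranges and pears and peaches"
)

# Approach 1: split into a list, then take the last element per row
by_map <- df %>%
  mutate(Last = map_chr(str_split(phrase, "\\s*and\\s*"), last)) %>%
  select(item, Last)

# Approach 2: one row per piece, then the last piece within each item
by_rows <- df %>%
  separate_rows(phrase, sep = " and ") %>%
  group_by(item) %>%
  summarise(Last = last(phrase))

# Sort both by item so the comparison does not depend on row order
same_values <- identical(arrange(by_map, item)$Last,
                         arrange(by_rows, item)$Last)
same_values
```

The map_chr() version keeps the original row order and the other columns available, which may matter if the result has to line up with the rest of df.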
