Lead or lag function to get several values, not just the nth

wsc*_*ell 10 r lag lead dplyr

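I have a tibble containing a list of words, one word per row. I want to create a new variable with a function that searches for a keyword and, if the keyword is found, builds a string made up of the keyword and the 3 words on either side of it.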

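The code below is close, but instead of grabbing all three words before and after my keyword, it grabs only the single word that sits 3 positions before and 3 positions after it.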

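library(dplyr)

df <- tibble(words = c("it", "was", "the", "best", "of", "times",
                       "it", "was", "the", "worst", "of", "times"))
# lag(words, 3) and lead(words, 3) return only the word exactly 3 positions away,
# not the whole window of 3 words on each side
df <- df %>% mutate(chunks = ifelse(words == "times",
                                    paste(lag(words, 3),
                                          words,
                                          lead(words, 3), sep = " "),
                                    NA))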

The most intuitive solution would be for the lag/lead functions to accept several offsets at once, e.g. lead(words, 1:3), but that doesn't work.

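Obviously I could do this by hand fairly quickly (paste(lag(words, 3), lag(words, 2), lag(words, 1), words, lead(words, 1), ..., lead(words, 3))), but eventually I will want to grab around 50 words on either side of the keyword, which is too many to code by hand.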

A tidyverse solution would be ideal, but any solution would help. Any help is appreciated.
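The lead(words, 1:3) idea can be approximated by calling lag()/lead() once per offset and pasting the shifted copies row by row. The sketch below assumes dplyr is installed; window_text() is a hypothetical helper, not part of dplyr. Because it removes the NA padding that lag()/lead() add at the edges, it would also drop any genuine NA entries in the words column.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): for every position, paste together the
# words from n positions before to n positions after into one string.
window_text <- function(words, n) {
  # one shifted copy of `words` per offset k in -n..n (k < 0: earlier words)
  shifted <- lapply(-n:n, function(k) {
    if (k < 0) lag(words, -k) else if (k > 0) lead(words, k) else words
  })
  # length(words) x (2n + 1) character matrix; row i holds the window around word i
  m <- do.call(cbind, shifted)
  # drop the NAs that lag()/lead() pad at the start/end, then paste each row
  apply(m, 1, function(row) paste(row[!is.na(row)], collapse = " "))
}

df <- tibble(words = c("it", "was", "the", "best", "of", "times",
                       "it", "was", "the", "worst", "of", "times"))

result <- df %>%
  mutate(chunks = ifelse(words == "times", window_text(words, 3), NA))
print(result)
```

Changing 3 to 50 only changes the number of shifted copies, so nothing has to be written out by hand. The trade-off is that a full (2n + 1)-column matrix is built for every row, even rows that do not match the keyword.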

arg*_*t91 7

One option is sapply:

library(dplyr)

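df %>%
  mutate(
    chunks = ifelse(words == "times",
                    # for every row x, paste the words from x - 3 to x + 3,
                    # clamped to the first and last row
                    sapply(seq_along(words),
                           function(x) paste(words[max(1, x - 3):min(x + 3, length(words))],
                                             collapse = " ")),
                    NA)
  )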

Output:

# A tibble: 12 x 2
   words chunks                      
   <chr> <chr>                       
 1 it    NA                          
 2 was   NA                          
 3 the   NA                          
 4 best  NA                          
 5 of    NA                          
 6 times the best of times it was the
 7 it    NA                          
 8 was   NA                          
 9 the   NA                          
10 worst NA                          
11 of    NA                          
12 times the worst of times   

Although this doesn't use lead or lag explicitly, it usually achieves the same goal.
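The same indexing idea can be wrapped in a reusable function that takes the keyword and window size as arguments. It builds strings only for the rows that match, whereas the sapply version pastes a window for every row and then discards most of them. This is a sketch: keyword_window() is a hypothetical helper name and is not part of the original answer.

```r
library(dplyr)

# Hypothetical helper: NA for non-matching rows; for rows equal to `keyword`,
# the words from i - n to i + n (clamped to the vector bounds) pasted together.
keyword_window <- function(words, keyword, n) {
  out <- rep(NA_character_, length(words))
  hits <- which(words == keyword)  # which() also skips NA comparisons
  out[hits] <- vapply(hits, function(i) {
    idx <- max(1, i - n):min(length(words), i + n)
    paste(words[idx], collapse = " ")
  }, character(1))
  out
}

df <- tibble(words = c("it", "was", "the", "best", "of", "times",
                       "it", "was", "the", "worst", "of", "times"))

result <- df %>% mutate(chunks = keyword_window(words, "times", 3))
print(result)
```

For a window of about 50 words, call keyword_window(words, "times", 50).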