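I have a tibble with one word per row. I want to write a function that searches for a keyword and, wherever the keyword is found, creates a new variable holding a string made of the keyword plus the 3 words before it and the 3 words after it.
The code below is close, but instead of grabbing all three words before and after my keyword, it grabs only the single word 3 positions before and the single word 3 positions after.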
library(dplyr)

df <- tibble(words = c("it", "was", "the", "best", "of", "times",
                       "it", "was", "the", "worst", "of", "times"))

# Only picks up the single word 3 positions before and 3 positions after
df <- df %>%
  mutate(chunks = ifelse(words == "times",
                         paste(lag(words, 3), words, lead(words, 3), sep = " "),
                         NA))
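The most intuitive solution would be for lag and lead to accept a range of offsets, something like lead(words, 1:3), but that does not work.
Obviously I could do this by hand fairly quickly (paste(lead(words,3), lead(words,2), lead(words,1),...lag(words,3))), but in practice I will eventually want to grab the keyword plus or minus about 50 words, which is too much to hand-code.
A solution within the tidyverse would be ideal, but any solution would help. Any help would be appreciated.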
One option is sapply:
library(dplyr)

df %>%
  mutate(
    chunks = ifelse(
      words == "times",
      # For each row x, paste rows x - 3 to x + 3, clamped to the table bounds
      sapply(seq_len(nrow(.)), function(x) {
        paste(words[pmax(1, x - 3):pmin(x + 3, nrow(.))], collapse = " ")
      }),
      NA
    )
  )
Output:
# A tibble: 12 x 2
   words chunks
   <chr> <chr>
 1 it    NA
 2 was   NA
 3 the   NA
 4 best  NA
 5 of    NA
 6 times the best of times it was the
 7 it    NA
 8 was   NA
 9 the   NA
10 worst NA
11 of    NA
12 times the worst of times
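Although this does not use lead or lag explicitly, it serves the same purpose, and changing the 3 to a larger number widens the window without any extra hand-coding.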
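If staying closer to lag and lead is preferred, one possible sketch is to call them once per offset and paste the results together. The helper name window_paste and its n argument are made up for illustration and are not part of dplyr. It assumes n is at least 1 and that the words themselves contain no leading or trailing spaces.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): for every position in x, paste the
# n words before it, the word itself, and the n words after it.
window_paste <- function(x, n) {
  stopifnot(n >= 1)
  # x shifted by n, n - 1, ..., 1 positions, padded with "" at the start
  before <- lapply(n:1, function(k) dplyr::lag(x, k, default = ""))
  # x shifted by 1, 2, ..., n positions, padded with "" at the end
  after <- lapply(1:n, function(k) dplyr::lead(x, k, default = ""))
  pieces <- c(before, list(x), after)
  # The "" padding only ever sits at the two ends of a chunk, so trimming
  # leading and trailing whitespace removes it.
  trimws(do.call(paste, pieces))
}

df <- tibble(words = c("it", "was", "the", "best", "of", "times",
                       "it", "was", "the", "worst", "of", "times"))

result <- df %>%
  mutate(chunks = if_else(words == "times",
                          window_paste(words, 3),
                          NA_character_))
```

With this data, result$chunks matches the sapply output above: rows 6 and 12 hold the chunks and every other row is NA. Calling window_paste(words, 50) would widen the window to 50 words on each side, at the cost of building 101 shifted copies of the column, so the index-based sapply version may be the lighter choice for long texts.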