I have a dataset like the following:
library(data.table)
rownum<-c(1,2,3,4,5,6,7,8,9,10)
name<-c("jeff","jeff","mary","jeff","jeff","jeff","mary","mary","mary","mary")
text<-c("a","b","c","d","e","f","g","h","i","j")
a<-data.table(rownum,name,text)
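I would like to add a new text column that cumulatively concatenates text from the previous rows, in rownum order, within each name. The vector for the new column would be: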
rolltext<-c("a","ab","c","abd","abde","abdef","cg","cgh","cghi","cghij")
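I am at a loss here. For numbers I would just use the cumsum function, but for text I think I need a for loop or one of the apply functions?
You can use Reduce with the accumulate option: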
a[, rolltext := Reduce(paste0, text, accumulate = TRUE), by = name]
    rownum name text rolltext
 1:      1 jeff    a        a
 2:      2 jeff    b       ab
 3:      3 mary    c        c
 4:      4 jeff    d      abd
 5:      5 jeff    e     abde
 6:      6 jeff    f    abdef
 7:      7 mary    g       cg
 8:      8 mary    h      cgh
 9:      9 mary    i     cghi
10:     10 mary    j    cghij
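To see why this works, it may help to run Reduce on a single group outside of data.table. The sketch below uses only base R; the vector name jeff_text is made up for illustration and simply holds jeff's first three text values.

```r
# With accumulate = TRUE, Reduce returns every intermediate result
# of folding paste0 over the vector, not just the final value.
jeff_text <- c("a", "b", "d")  # illustrative: jeff's first three values
steps <- Reduce(paste0, jeff_text, accumulate = TRUE)
print(steps)
# [1] "a"   "ab"  "abd"
```

Inside a[, ..., by = name] the same call runs once per name, so each group gets its own running concatenation.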
Or, as @DavidArenburg suggested, build each row with sapply:
a[, rolltext := sapply(1:.N, function(x) paste(text[1:x], collapse = '')), by = name]
This is a running sum, whereas a rolling sum (as in the OP's title) is something different, at least in R lingo.
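To make that distinction concrete, here is a minimal base R sketch. The vector x and the window width k are assumptions chosen for illustration; they are not part of the original question.

```r
x <- c("a", "b", "d", "e")  # illustrative input
k <- 2                      # hypothetical window width

# Running (cumulative): every element up to and including position i.
running <- Reduce(paste0, x, accumulate = TRUE)

# Rolling: only the last k elements ending at position i.
rolling <- sapply(seq_along(x), function(i) {
  paste(x[max(1, i - k + 1):i], collapse = "")
})

print(running)  # [1] "a"    "ab"   "abd"  "abde"
print(rolling)  # [1] "a"  "ab" "bd" "de"
```

The question asks for the first of the two, which is what the answers here compute.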
Here's an idea using substring().
a[, rolltext := substring(paste(text, collapse = ""), 1, 1:.N), by = name]
which gives
    rownum name text rolltext
 1:      1 jeff    a        a
 2:      2 jeff    b       ab
 3:      3 mary    c        c
 4:      4 jeff    d      abd
 5:      5 jeff    e     abde
 6:      6 jeff    f    abdef
 7:      7 mary    g       cg
 8:      8 mary    h      cgh
 9:      9 mary    i     cghi
10:     10 mary    j    cghij
We could perhaps speed this up with the stringi package:
library(stringi)
a[, rolltext := stri_sub(stri_c(text, collapse = ""), length = 1:.N), by = name]
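One caveat: both substring variants take the first 1, 2, ..., .N characters of the collapsed string, so they rely on every element of text being exactly one character long, as in the example data. If the values can be longer, one possible adjustment (a sketch, not something from the answers above) is to cut at the cumulative character counts instead. The table dt and its values below are made up for illustration.

```r
library(data.table)

# Illustrative data with multi-character text values.
dt <- data.table(
  name = c("jeff", "jeff", "mary", "jeff"),
  text = c("ab", "c", "de", "f")
)

# cumsum(nchar(text)) gives the end position of each element
# within the collapsed string for its group.
dt[, rolltext := substring(paste(text, collapse = ""), 1, cumsum(nchar(text))),
   by = name]

print(dt$rolltext)
# [1] "ab"   "abc"  "de"   "abcf"
```

The Reduce and sapply versions do not need this adjustment, since they paste whole elements rather than slicing by character position.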