I have a dataset like the following:
library(data.table)
rownum <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
name <- c("jeff", "jeff", "mary", "jeff", "jeff", "jeff", "mary", "mary", "mary", "mary")
text <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
a <- data.table(rownum, name, text)
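I want to add a new column of rolling text that, within each name and in rownum order, concatenates the text of all rows so far. The vector for the new column would be: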
rolltext <- c("a", "ab", "c", "abd", "abde", "abdef", "cg", "cgh", "cghi", "cghij")
I'm at a loss here. For numbers I would just use the cumsum function, but for text I think I need a for loop or one of the apply functions?
You can use Reduce with the accumulate option:
a[, rolltext := Reduce(paste0, text, accumulate = TRUE), by = name]
    rownum name text rolltext
 1:      1 jeff    a        a
 2:      2 jeff    b       ab
 3:      3 mary    c        c
 4:      4 jeff    d      abd
 5:      5 jeff    e     abde
 6:      6 jeff    f    abdef
 7:      7 mary    g       cg
 8:      8 mary    h      cgh
 9:      9 mary    i     cghi
10:     10 mary    j    cghij
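To see why this works, it may help to look at Reduce on a plain vector, outside of data.table. The snippet below is a minimal sketch using jeff's values from the question; the variable names jeff_text and running are just illustrative. With accumulate = TRUE, Reduce keeps every intermediate result instead of only the last one, which is the text analogue of cumsum.

```r
# jeff's text values, in rownum order (taken from the question's data)
jeff_text <- c("a", "b", "d", "e", "f")

# Without accumulate, Reduce returns only the final folded value
final <- Reduce(paste0, jeff_text)

# With accumulate = TRUE, every intermediate result is kept
running <- Reduce(paste0, jeff_text, accumulate = TRUE)

print(final)    # "abdef"
print(running)  # "a" "ab" "abd" "abde" "abdef"
```

Grouping with by = name simply runs this once per name.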
Or, as @DavidArenburg suggested, build each row with sapply:
a[, rolltext := sapply(1:.N, function(x) paste(text[1:x], collapse = '')), by = name]
This is a running (cumulative) sum; a rolling sum (as in the OP's title) is something different, at least in R lingo.
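To illustrate the distinction, here is a small base R sketch on a plain vector. The window width k = 2 is an arbitrary choice for the example, not something from the question. The running version keeps everything seen so far, while the rolling version keeps only the last k elements.

```r
text <- c("a", "b", "d", "e")

# Running (cumulative) paste: everything up to and including position i
running <- Reduce(paste0, text, accumulate = TRUE)

# Rolling paste: only the last k elements ending at position i
# (k = 2 is an assumed window width for illustration)
k <- 2
rolling <- sapply(seq_along(text), function(i) {
  paste(text[max(1, i - k + 1):i], collapse = "")
})

print(running)  # "a" "ab" "abd" "abde"
print(rolling)  # "a" "ab" "bd"  "de"
```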
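Here's an idea using substring(). Note that it relies on every element of text being exactly one character long, since the end positions are 1:.N.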
a[, rolltext := substring(paste(text, collapse = ""), 1, 1:.N), by = name]
This gives
    rownum name text rolltext
 1:      1 jeff    a        a
 2:      2 jeff    b       ab
 3:      3 mary    c        c
 4:      4 jeff    d      abd
 5:      5 jeff    e     abde
 6:      6 jeff    f    abdef
 7:      7 mary    g       cg
 8:      8 mary    h      cgh
 9:      9 mary    i     cghi
10:     10 mary    j    cghij
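If the elements of text can be longer than one character, cutting at positions 1:.N no longer lines up with element boundaries. One possible adjustment, shown here as a sketch on a plain vector with made-up values, is to cut at cumsum(nchar(text)) instead.

```r
# Hypothetical multi-character values, not from the question
text <- c("ab", "c", "def")

joined <- paste(text, collapse = "")  # "abcdef"

# Cutting at 1:length(text) splits inside elements
naive <- substring(joined, 1, seq_along(text))

# Cutting at the cumulative character counts respects element boundaries
ends <- cumsum(nchar(text))           # 2 3 6
fixed <- substring(joined, 1, ends)

print(naive)  # "a" "ab" "abc"
print(fixed)  # "ab" "abc" "abcdef"
```

Inside data.table the same idea would be substring(paste(text, collapse = ""), 1, cumsum(nchar(text))) with by = name.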
We could perhaps speed this up with the stringi package:
library(stringi)
a[, rolltext := stri_sub(stri_c(text, collapse = ""), length = 1:.N), by = name]
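As a sanity check, the approaches in this thread can be run side by side on the example data to confirm they agree. This is only a sketch: the column names r_reduce, r_sapply, r_substr and r_stringi are made up for the comparison, and it says nothing about which one is faster on real data.

```r
library(data.table)
library(stringi)

# Same name/text data as in the question
dt <- data.table(
  name = c("jeff", "jeff", "mary", "jeff", "jeff", "jeff", "mary", "mary", "mary", "mary"),
  text = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
)

# One column per approach (column names are illustrative only)
dt[, r_reduce  := Reduce(paste0, text, accumulate = TRUE), by = name]
dt[, r_sapply  := sapply(seq_len(.N), function(x) paste(text[1:x], collapse = "")), by = name]
dt[, r_substr  := substring(paste(text, collapse = ""), 1, 1:.N), by = name]
dt[, r_stringi := stri_sub(stri_c(text, collapse = ""), length = 1:.N), by = name]

# TRUE when all four columns hold the same values
all_equal <- dt[, identical(r_reduce, r_sapply) &&
                  identical(r_reduce, r_substr) &&
                  identical(r_reduce, r_stringi)]
print(all_equal)
```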