data.table: a "group counter" for a specific combination of columns

Mar*_*ine 3 r data.table

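I want to add a counter column to a data frame, where rows that count as "the same" share one counter value. I am using the data.table package for this. In my case, rows have to be compared on the combination of columns "z" AND ("x" OR "y").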

I tried:

DF[ , Index := .GRP, by = c("x","y","z") ]

But this groups on the combination of "z" AND "x" AND "y".
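As a quick illustration of why that attempt falls short, here is a minimal sketch that runs it on the sample data from this question. Every (x, y, z) triple in that data is distinct, so `.GRP` should hand out a separate group number to each row.

```r
library(data.table)

# Sample data from the question
DF <- data.table(x = c("a","a","a","b","c","d","e","f","f"),
                 y = c(1,3,2,8,8,4,4,6,0),
                 z = c("M","M","M","F","F","M","M","F","F"))

# Grouping on all three columns: one group per distinct (x, y, z) triple
DF[, Index := .GRP, by = c("x", "y", "z")]

# Each row ends up in its own group, so Index runs from 1 to 9
print(DF)
```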

How can I combine "z" AND ("x" OR "y")?

Here is some sample data:

DF = data.frame(x=c("a","a","a","b","c","d","e","f","f"), y=c(1,3,2,8,8,4,4,6,0), z=c("M","M","M","F","F","M","M","F","F"))
DF <- data.table(DF)

I would like to get this output:

> DF
   x y z Index
1: a 1 M   1
2: a 3 M   1
3: a 2 M   1
4: b 8 F   2
5: c 8 F   2
6: d 4 M   3
7: e 4 M   3
8: f 6 F   4
9: f 0 F   4

djh*_*rio 6

A new group starts when the value of z changes, or when the values of both x and y change, compared with the previous row.

Try this example.

require(data.table)

DF <- data.table(x = c("a","a","a","b","c","d","e","f","f"),
                 y = c(1,3,2,8,8,4,4,6,0),
                 z=c("M","M","M","F","F","M","M","F","F"))

# Returns TRUE where a value differs from the value in the previous row.
# The first row has no predecessor, so it always counts as a change.
is.not.eq.with.lag <- function(x) c(TRUE, tail(x, -1) != head(x, -1))

DF[, x1 := is.not.eq.with.lag(x)]
DF[, y1 := is.not.eq.with.lag(y)]
DF[, z1 := is.not.eq.with.lag(z)]
DF

DF[, Index := cumsum(z1 | (x1 & y1))]
DF
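The same rule can also be written without the intermediate x1, y1 and z1 columns. The following is only a sketch under the assumption that rows belonging together are already adjacent, as in the sample data; `changed` is a hypothetical helper name introduced here for illustration, not part of data.table.

```r
library(data.table)

DF <- data.table(x = c("a","a","a","b","c","d","e","f","f"),
                 y = c(1,3,2,8,8,4,4,6,0),
                 z = c("M","M","M","F","F","M","M","F","F"))

# Hypothetical helper: TRUE where a value differs from the previous row.
# The first row is always flagged so that the counter starts at 1.
changed <- function(v) c(TRUE, v[-1] != v[-length(v)])

# A new group starts when z changes, or when x and y both change.
# cumsum() turns the "group starts here" flags into a running group number.
DF[, Index := cumsum(changed(z) | (changed(x) & changed(y)))]

print(DF)
```

Because the comparison is always against the previous row, the result depends on row order. If matching rows can be scattered through the table, this approach would not merge them into one group.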