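I am trying to add a new column to a data.table, where the value in each row depends on how that row's value relates to the other values in the same column. More precisely, if a row holds a value X, I want to know how many other values in the same column (and group) fall within X-30.
For example, given this: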
DT<-data.table(
X = c(1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1),
Y = c(100, 101, 133, 134, 150, 156, 190, 200, 201, 230, 233, 234),
Z = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))
I would like to get a new column with the values:
N <- c(0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 2)
I tried the following, but it does not give a result I can use:
DT[,list(Y,num=cumsum(Y[-.I]>DT[.I,Y]-30),Z),by=.(X)]
Any ideas how to do this?
This can probably be achieved with a rolling join (?), but for now here is an alternative using foverlaps:
DT[, `:=`(indx = .I, Y2 = Y - 30L, N = 0L)] # Add row index and a -30 interval
setkey(DT, X, Y2, Y) # Key by X and the interval [Y2, Y] (required by foverlaps)
res <- foverlaps(DT, DT)[Y2 > i.Y2, .N, keyby = indx] # Self overlap join; per row, count the overlapping rows whose interval starts earlier
setorder(DT, indx) # go back to the original order
DT[res$indx, N := res$N][, c("indx", "Y2") := NULL] # update results and remove cols
DT
#     X   Y  Z N
#  1: 1 100  1 0
#  2: 2 101  2 0
#  3: 2 133  3 0
#  4: 1 134  4 0
#  5: 1 150  5 1
#  6: 2 156  6 1
#  7: 1 190  7 0
#  8: 2 200  8 0
#  9: 2 201  9 1
# 10: 1 230 10 0
# 11: 1 233 11 1
# 12: 1 234 12 2
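To sanity-check a result like the one above, a brute-force version in base R may help. This is only a sketch: the helper name `count_within` and its `width` argument are made up for illustration, and the boundary handling (counting values in `[Y - 30, Y)`) is an assumption, since the example data contains no boundary cases.

```r
# Same example data as in the question, as plain vectors
X <- c(1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1)
Y <- c(100, 101, 133, 134, 150, 156, 190, 200, 201, 230, 233, 234)

# Hypothetical helper: for each row i, count the rows in the same group
# whose value lies in [value[i] - width, value[i]).
# The strict upper bound also keeps a row from counting itself.
count_within <- function(group, value, width = 30) {
  vapply(seq_along(value), function(i) {
    sum(group == group[i] &
          value >= value[i] - width &
          value < value[i])
  }, integer(1))
}

N_check <- count_within(X, Y)
N_check
```

This compares every row against every other row, so it is quadratic in the number of rows; it is meant for checking small inputs, not as a replacement for the overlap join.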
Alternatively, use the which=TRUE option of foverlaps to make the overlap merge smaller:
# as above
DT[, `:=`(indx = .I, Y2 = Y - 30L, N = 0L)]
setkey(DT, X, Y2, Y)
# using which=TRUE:
res <- foverlaps(DT, DT, which=TRUE)[xid > yid, .N, by=xid]
DT[res$xid, N := res$N]
setorder(DT, indx)
DT[, c("Y2","indx") := NULL]
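A different technique that might also work here is a non-equi self-join. This is not what the code above does. The sketch below assumes a recent version of data.table (one that supports non-equi joins in `on=` and `nomatch = NULL`); the helper columns `indx` and `Ylow` and the intermediate tables `hits` and `cnt` are names chosen for illustration, and the window is assumed to be `[Y - 30, Y)`.

```r
library(data.table)

DT <- data.table(
  X = c(1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1),
  Y = c(100, 101, 133, 134, 150, 156, 190, 200, 201, 230, 233, 234),
  Z = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

# Helper columns: original row number and lower bound of the window
DT[, `:=`(indx = .I, Ylow = Y - 30)]

# Non-equi self-join: for each row of the inner DT (i), find rows of the
# outer DT (x) in the same group X with i.Ylow <= x.Y < i.Y.
# nomatch = NULL drops rows without any match.
hits <- DT[DT, on = .(X, Y >= Ylow, Y < Y), nomatch = NULL, .(indx = i.indx)]

# Count matches per original row; rows without matches keep N = 0
cnt <- hits[, .N, by = indx]
DT[, N := 0L]
DT[cnt$indx, N := cnt$N]

# Drop the helper columns
DT[, c("indx", "Ylow") := NULL]
DT
```

Because DT is never re-keyed here, the rows stay in their original order and no final `setorder` is needed.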