Posts by Fin*_*ino

How do I reference an entire row when creating a new column in data.table?

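I have a data.table with more than 200 variables, all of them binary. I want to create a new column that counts the differences between each row and a reference vector: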

library(data.table)

# Example
dt = data.table(
  "V1" = c(1,1,0,1,0,0,0,1,0,1,0,1,1,0,1,0),
  "V2" = c(0,1,0,1,0,1,0,0,0,0,1,1,0,0,1,0),
  "V3" = c(0,0,0,1,1,1,1,0,1,0,1,0,1,0,1,0),
  "V4" = c(1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0),
  "V5" = c(1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0)
)

reference = c(1,1,0,1,0)

I can do it with a small for loop, for example:

distance = numeric(nrow(dt))  # preallocate one slot per row
for(i in seq_len(nrow(dt))){
  distance[i] = sum(reference != dt[i,])  # count mismatches in row i
}
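For comparison, the same per-row count can be computed without a loop by working on a matrix. This is a minimal sketch, assuming every column of dt is one of the binary variables and the columns are in the same order as reference. Transposing turns each row of dt into a column, so != recycles reference down each column and lines up element by element; colSums then counts the mismatches per original row. The name distance_vec is only illustrative.

```r
library(data.table)

dt <- data.table(
  V1 = c(1,1,0,1,0,0,0,1,0,1,0,1,1,0,1,0),
  V2 = c(0,1,0,1,0,1,0,0,0,0,1,1,0,0,1,0),
  V3 = c(0,0,0,1,1,1,1,0,1,0,1,0,1,0,1,0),
  V4 = c(1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0),
  V5 = c(1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0)
)
reference <- c(1,1,0,1,0)

# t() turns the 16 x 5 table into a 5 x 16 matrix, one column per original row;
# `!=` recycles `reference` (length 5) down each of those columns.
mismatch <- t(as.matrix(dt)) != reference

# Summing the logical matrix column-wise counts the mismatches of each original row
distance_vec <- colSums(mismatch)
```

The transpose matters: without it, recycling would run down the 16-row columns and the comparison would no longer line up with the variables.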

But the loop is rather slow and surely not the best approach. I tried:

dt[,"distance":= sum(reference != c(V1,V2,V3,V4,V5))]
dt[,"distance":= sum(reference != .SD)]

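Neither of them works, because both return the same value for every row. Also, a solution where I don't have to type out all the variable names would be much better, since the real data.table has more than 200 columns.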
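In both attempts, sum() runs once over the whole table and produces a single number, which := then recycles to every row. One data.table-style alternative, sketched here and not taken from the original thread, keeps the comparison per column and adds across columns: Map() pairs each column of .SD with the matching element of reference, and Reduce(`+`, ...) adds the resulting logical vectors row by row. It assumes the columns listed in cols are in the same order as reference; the name cols is illustrative, and with 200+ columns it would be built from names(dt) rather than typed by hand.

```r
library(data.table)

dt <- data.table(
  V1 = c(1,1,0,1,0,0,0,1,0,1,0,1,1,0,1,0),
  V2 = c(0,1,0,1,0,1,0,0,0,0,1,1,0,0,1,0),
  V3 = c(0,0,0,1,1,1,1,0,1,0,1,0,1,0,1,0),
  V4 = c(1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0),
  V5 = c(1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0)
)
reference <- c(1,1,0,1,0)

# Columns to compare, taken before `distance` is added.
# With many columns, something like grep("^V", names(dt), value = TRUE) could select them.
cols <- names(dt)

# Map(): column j of .SD compared with reference[j] -> one logical vector per column.
# Reduce(`+`): element-wise sum of those vectors, i.e. the mismatch count per row.
dt[, distance := Reduce(`+`, Map(`!=`, .SD, reference)), .SDcols = cols]
```

This works column by column on whole vectors, so it avoids both the per-row loop and typing out the variable names.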

Tags: r, data.table


Fastest way to join two data.tables while aggregating one of them

Problem

I have two data.tables.

Movements:

library(data.table)

consEx = data.table(
  begin = as.POSIXct(c("2019-04-01 00:00:10","2019-04-07 10:00:00","2019-04-10 23:00:00","2019-04-12 20:00:00","2019-04-15 10:00:00",
                       "2019-04-20 10:00:00","2019-04-22 13:30:00","2019-04-10 15:30:00","2019-04-12 21:30:00","2019-04-15 20:00:00")),

  end = as.POSIXct(c("2019-04-01 20:00:00","2019-04-07 15:00:00","2019-04-11 10:00:00","2019-04-12 23:30:00","2019-04-15 15:00:00",
                     "2019-04-21 12:00:00","2019-04-22 17:30:00","2019-04-10 20:00:00","2019-04-13 05:00:00","2019-04-15 12:30:00")),

  carId = c(1,1,1,2,2,3,3,4,4,5),
  tripId = c(1:10)
)

And alerts:

alertsEx = data.table(
  timestamp = as.POSIXct(c("2019-04-01 10:00:00","2019-04-01 10:30:00","2019-04-01 15:00:00","2019-04-15 13:00:00","2019-04-22 14:00:00",
                "2019-04-22 15:10:00","2019-04-22 15:40:00","2019-04-10 16:00:00","2019-04-10 17:00:00","2019-04-13 04:00:00")),
  type = c("T1","T2","T1",'T3',"T1","T1","T3","T2","T2","T1"),
  carId = c(1,1,1,2,3,3,3,4,4,4),
  additionalInfo1 = rnorm(10,mean=10,sd=4)
)

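The movements table records a period, from begin to end, during which a car was moving. The alerts table records when an alert occurred on a car and contains type, …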
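The question text is cut off before it says exactly what should be aggregated, so what follows is only a sketch under an assumption: that the goal is to attach to each trip in consEx the number of alerts for the same carId whose timestamp falls between begin and end, plus the mean of additionalInfo1. It uses a data.table non-equi join with by = .EACHI, which aggregates once per trip instead of first materializing every matching alert-trip pair. The result column names nAlerts and meanInfo1 are hypothetical, and set.seed() is added only to make the random column reproducible.

```r
library(data.table)

consEx <- data.table(
  begin = as.POSIXct(c("2019-04-01 00:00:10", "2019-04-07 10:00:00", "2019-04-10 23:00:00",
                       "2019-04-12 20:00:00", "2019-04-15 10:00:00", "2019-04-20 10:00:00",
                       "2019-04-22 13:30:00", "2019-04-10 15:30:00", "2019-04-12 21:30:00",
                       "2019-04-15 20:00:00")),
  end   = as.POSIXct(c("2019-04-01 20:00:00", "2019-04-07 15:00:00", "2019-04-11 10:00:00",
                       "2019-04-12 23:30:00", "2019-04-15 15:00:00", "2019-04-21 12:00:00",
                       "2019-04-22 17:30:00", "2019-04-10 20:00:00", "2019-04-13 05:00:00",
                       "2019-04-15 12:30:00")),
  carId  = c(1,1,1,2,2,3,3,4,4,5),
  tripId = 1:10
)

set.seed(1)  # only so that additionalInfo1 is reproducible
alertsEx <- data.table(
  timestamp = as.POSIXct(c("2019-04-01 10:00:00", "2019-04-01 10:30:00", "2019-04-01 15:00:00",
                           "2019-04-15 13:00:00", "2019-04-22 14:00:00", "2019-04-22 15:10:00",
                           "2019-04-22 15:40:00", "2019-04-10 16:00:00", "2019-04-10 17:00:00",
                           "2019-04-13 04:00:00")),
  type  = c("T1","T2","T1","T3","T1","T1","T3","T2","T2","T1"),
  carId = c(1,1,1,2,3,3,3,4,4,4),
  additionalInfo1 = rnorm(10, mean = 10, sd = 4)
)

# Non-equi join: each trip (row of consEx, used as i) is matched with the alerts
# of the same car where begin <= timestamp <= end, and j is evaluated once per
# trip (by = .EACHI). A trip with no alert gets a single all-NA alert row
# (nomatch = NA is the default), so counting non-NA `type` values yields 0.
agg <- alertsEx[consEx,
                on = .(carId, timestamp >= begin, timestamp <= end),
                .(nAlerts   = sum(!is.na(type)),
                  meanInfo1 = mean(additionalInfo1)),
                by = .EACHI]

# by = .EACHI returns one row per row of consEx, in the same order,
# so the aggregates can be attached by reference
consEx[, `:=`(nAlerts = agg$nAlerts, meanInfo1 = agg$meanInfo1)]
```

Whether this is the fastest option for the real data is not something the sketch establishes; it mainly avoids building the full alert-by-trip cross product before aggregating.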

Tags: r, data.table


Why can't I use .I in data.table to leave out the current observation?

Recently I saw a question similar to this one (I can't find the link):

I want to add a column to a data.frame that computes the variance of another column while leaving out the current observation.

library(data.table)

dt = data.table(
  id = c(1:13),
  v = c(9,5,8,1,25,14,7,87,98,63,32,12,15)
)

So, with a for() loop:

res = numeric(nrow(dt))  # preallocate one result per row
for(i in seq_len(nrow(dt))){
  res[i] = var(dt[-i,v])  # variance of v with row i left out
}
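The loop recomputes the variance from scratch for every row. Because the sample variance depends only on the count, the sum and the sum of squares, all leave-one-out variances can also be computed in one vectorized step. With n values, S = sum(v) and Q = sum(v^2), dropping row i leaves m = n - 1 values with sum S - v[i] and sum of squares Q - v[i]^2, so the variance is ((Q - v[i]^2) - (S - v[i])^2 / m) / (m - 1). This is a different technique from the .I indexing the question asks about, sketched under the assumption that v has no missing values; the subtraction-based formula can lose precision when the values are large relative to their spread. The column name loo_var is illustrative.

```r
library(data.table)

dt <- data.table(
  id = 1:13,
  v  = c(9,5,8,1,25,14,7,87,98,63,32,12,15)
)

dt[, loo_var := {
  m <- .N - 1              # sample size after dropping one row
  s <- sum(v) - v          # sum of the other values, one entry per row
  q <- sum(v^2) - v^2      # sum of squares of the other values
  (q - s^2 / m) / (m - 1)  # sample variance with denominator m - 1, as in var()
}]
```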

I tried doing this in data.table with negative indexing via .I, but to my surprise none of the following works:

#1
dt[,var := var(dt[,v][-.I])]

#2
dt[,var := var(dt$v[-.I])]

#3 
fun = function(x){
  v = c(9,5,8,1,25,14,7,87,98,63,32,12,15)
  var(v[-x])
}
dt[,var := fun(.I)]

#4
fun = function(x){
  var(dt[-x,v])
}
dt[,var := fun(.I)]

All of them give the same output:

    id  v var
 1:  1  9  NA
 2:  2  5  NA
 3:  3  8  NA
 4:  4  1  NA
 5:  5 25 …
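The NA values can be explained by checking what .I holds: without by, j is evaluated once for the whole table, so .I is the full vector 1:13, not the current row number. Negating it drops every row, and var() of an empty vector is NA, which := then recycles to all rows. A minimal sketch of a fix is to group by a per-row key (here id, which is unique in this example), so that inside each group .I holds only that row's index in dt. It evaluates j once per row, so it is simple rather than fast. The names all_rows, empty_var and loo_var are illustrative.

```r
library(data.table)

dt <- data.table(
  id = 1:13,
  v  = c(9,5,8,1,25,14,7,87,98,63,32,12,15)
)

# Without `by`, .I contains every row number at once
all_rows  <- dt[, .I]
# So dt$v[-.I] is an empty vector, and its variance is NA
empty_var <- var(dt$v[-all_rows])

# With by = id (unique here), each group is a single row and .I is that row's index in dt
dt[, loo_var := var(dt$v[-.I]), by = id]
```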

Tags: r, data.table

