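I have a data.table with more than 200 variables, all of them binary. I want to create a new column in it that counts the differences between each row and a reference vector: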
# Example
library(data.table)
dt = data.table(
"V1" = c(1,1,0,1,0,0,0,1,0,1,0,1,1,0,1,0),
"V2" = c(0,1,0,1,0,1,0,0,0,0,1,1,0,0,1,0),
"V3" = c(0,0,0,1,1,1,1,0,1,0,1,0,1,0,1,0),
"V4" = c(1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0),
"V5" = c(1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0)
)
reference = c(1,1,0,1,0)
I can do it with a small for loop, such as
distance = NULL
for(i in 1:nrow(dt)){
distance[i] = sum(reference != dt[i,])
}
But it is rather slow and surely not the best way to do it. I tried:
dt[,"distance":= sum(reference != c(V1,V2,V3,V4,V5))]
dt[,"distance":= sum(reference != .SD)]
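But neither of them works, because they return the same value for every row. Also, a solution where I don't have to type all the variable names would be much better, since the real data.table has more than 200 columns.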
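A possible approach, sketched under the assumption that `reference` is ordered the same way as the columns of `dt`. Both attempts above return one value because `sum()` collapses the whole comparison into a single number, which `:=` then recycles to every row. Comparing column j with `reference[j]` and adding the results row-wise avoids the loop and the need to type column names. The helper names `cols` and `distance_matrix` are only illustrative.

```r
library(data.table)

dt <- data.table(
  V1 = c(1,1,0,1,0,0,0,1,0,1,0,1,1,0,1,0),
  V2 = c(0,1,0,1,0,1,0,0,0,0,1,1,0,0,1,0),
  V3 = c(0,0,0,1,1,1,1,0,1,0,1,0,1,0,1,0),
  V4 = c(1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0),
  V5 = c(1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0)
)
reference <- c(1, 1, 0, 1, 0)

# Capture the column names once, before adding anything, so the new column
# is never compared against itself (works the same with 200+ columns).
cols <- names(dt)

# Option A: compare column j with reference[j] (Map walks both in parallel),
# then add the resulting logical vectors together element by element.
dt[, distance := Reduce(`+`, Map(`!=`, .SD, reference)), .SDcols = cols]

# Option B: the same count via a matrix. rep(..., each = nrow) repeats each
# reference value down a whole column, matching the column-major layout of m.
m <- as.matrix(dt[, ..cols])
dt[, distance_matrix := rowSums(m != rep(reference, each = nrow(m)))]
```

Both options give the same counts as the for loop. Option A stays inside data.table and avoids the matrix copy, while Option B may read more naturally if the data is purely numeric anyway.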
I have 2 data.tables.
Movements:
library(data.table)
consEx = data.table(
begin = as.POSIXct(c("2019-04-01 00:00:10","2019-04-07 10:00:00","2019-04-10 23:00:00","2019-04-12 20:00:00","2019-04-15 10:00:00",
"2019-04-20 10:00:00","2019-04-22 13:30:00","2019-04-10 15:30:00","2019-04-12 21:30:00","2019-04-15 20:00:00")),
end = as.POSIXct(c("2019-04-01 20:00:00","2019-04-07 15:00:00","2019-04-11 10:00:00", "2019-04-12 23:30:00","2019-04-15 15:00:00",
"2019-04-21 12:00:00","2019-04-22 17:30:00","2019-04-10 20:00:00","2019-04-13 05:00:00", "2019-04-15 12:30:00")),
carId = c(1,1,1,2,2,3,3,4,4,5),
tripId = c(1:10)
)
and alerts:
alertsEx = data.table(
timestamp = as.POSIXct(c("2019-04-01 10:00:00","2019-04-01 10:30:00","2019-04-01 15:00:00","2019-04-15 13:00:00","2019-04-22 14:00:00",
"2019-04-22 15:10:00","2019-04-22 15:40:00","2019-04-10 16:00:00","2019-04-10 17:00:00","2019-04-13 04:00:00")),
type = c("T1","T2","T1",'T3',"T1","T1","T3","T2","T2","T1"),
carId = c(1,1,1,2,3,3,3,4,4,4),
additionalInfo1 = rnorm(10,mean=10,sd=4)
)
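The movements table records a period begin - end during which a car was moving. The alerts table shows when an alert occurred for a car, and contains type, …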
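The excerpt is cut off before the actual question, so the following is only a guess at a typical goal: attaching to each alert the trip of the same car during which it occurred. Under that assumption, a non-equi join is one way to sketch it. The small tables `trips` and `alerts` below are hypothetical stand-ins that reuse the column names of `consEx` and `alertsEx`, and the sketch assumes that the trips of one car do not overlap.

```r
library(data.table)

# Hypothetical miniature versions of consEx / alertsEx (same column names).
trips <- data.table(
  begin  = as.POSIXct(c("2019-04-01 00:00:10", "2019-04-07 10:00:00"), tz = "UTC"),
  end    = as.POSIXct(c("2019-04-01 20:00:00", "2019-04-07 15:00:00"), tz = "UTC"),
  carId  = c(1, 1),
  tripId = 1:2
)
alerts <- data.table(
  timestamp = as.POSIXct(c("2019-04-01 10:00:00", "2019-04-03 12:00:00"), tz = "UTC"),
  type      = c("T1", "T2"),
  carId     = c(1, 1)
)

# Non-equi join: for every alert, find the trip of the same car whose
# [begin, end] interval contains the alert timestamp.
# x.tripId is the tripId column of trips; alerts without a matching trip get NA.
alerts[, tripId := trips[alerts, x.tripId,
                         on = .(carId, begin <= timestamp, end >= timestamp)]]
```

Here the first alert falls inside trip 1, while the second one happens between trips and therefore gets `NA`.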
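Recently I saw a question like this (I can't find the link):
I want to add a column to a data.frame that computes the variance of another column while leaving out the current observation.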
dt = data.table(
id = c(1:13),
v = c(9,5,8,1,25,14,7,87,98,63,32,12,15)
)
So, with a for() loop:
res = NULL
for(i in 1:13){
res[i] = var(dt[-i,v])
}
I tried to do this in data.table using negative indexing with .I, but to my surprise none of the following work:
#1
dt[,var := var(dt[,v][-.I])]
#2
dt[,var := var(dt$v[-.I])]
#3
fun = function(x){
v = c(9,5,8,1,25,14,7,87,98,63,32,12,15)
var(v[-x])
}
dt[,var := fun(.I)]
#4
fun = function(x){
var(dt[-x,v])
}
dt[,var := fun(.I)]
All of them give the same output:
    id  v var
 1:  1  9  NA
 2:  2  5  NA
 3:  3  8  NA
 4:  4  1  NA
 5:  5 25 …
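A likely explanation for the `NA` column, with a sketch of two alternatives. Without `by`, `.I` is the full vector of row numbers, so `v[-.I]` drops every element and `var()` of an empty vector is `NA`, which `:=` then recycles to all rows. Grouping by a unique key makes `.I` a single row number per group. The column names `var_by` and `var_fast` are illustrative, and Option A assumes `id` is unique per row.

```r
library(data.table)

dt <- data.table(
  id = 1:13,
  v  = c(9,5,8,1,25,14,7,87,98,63,32,12,15)
)

# Option A: with by = id, each group is one row, so .I holds just that row's
# position in dt and dt$v[-.I] is the column without the current observation.
dt[, var_by := var(dt$v[-.I]), by = id]

# Option B: vectorised closed form. For the n - 1 remaining values,
# variance = (sum of squares - (sum)^2 / (n - 1)) / (n - 2).
n  <- nrow(dt)
s  <- sum(dt$v)    # total sum of v
ss <- sum(dt$v^2)  # total sum of squares of v
dt[, var_fast := ((ss - v^2) - (s - v)^2 / (n - 1)) / (n - 2)]
```

Option A mirrors the for loop most directly. Option B avoids recomputing the variance for every row, which may matter on large tables, although the subtraction of large sums can lose some floating-point precision.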