I have this data.table, which holds some group-specific data as well as some general data:
      group year      flow      agg
1: 51557094 2010   3.46000 592649.6
2: 51557133 1999 111.60000 522706.2
3: 51557133 2000  29.36000 555279.7
4: 51557133 2003  96.38000 592649.6
5: 51557193 2004  65.22000 550622.4
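Here flow is group-year specific and agg is year specific. I want to compute first differences: for flow, the first difference over year within each group; for agg, the first difference over year with no grouping.
I would prefer an approach that does not involve dplyr.
The desired result is: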
      group year  dFlow     dAgg
1: 51557094 2010     NA       NA
2: 51557133 1999     NA       NA
3: 51557133 2000 -82.24  32573.5
4: 51557133 2003     NA       NA
5: 51557193 2004     NA -42027.2
You can try:
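library(data.table)
# Record the original row order in ind, then sort by year
myDataTable[, ind := 1:.N][order(year)][
  # First two sorted rows only (1999 and 2000 in this sample): diff flow by group
  seq_len(.N) %in% 1:2, dFlow := c(NA, diff(flow)), by = group][,
  # Diff agg within runs of consecutive years
  dAgg := c(NA, diff(agg)), by = cumsum(c(TRUE, diff(year) != 1))][
  # Restore the original order and drop columns 3:5 (flow, agg, ind)
  order(ind)][, 3:5 := NULL][]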
#      group year  dFlow     dAgg
#1: 51557094 2010     NA       NA
#2: 51557133 1999     NA       NA
#3: 51557133 2000 -82.24  32573.5
#4: 51557133 2003     NA       NA
#5: 51557193 2004     NA -42027.2
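To see why grouping by cumsum(c(TRUE, diff(year) != 1)) leaves NA wherever the previous year is missing, it may help to look at the run labels on their own. This is only an illustrative sketch in base R: the vectors are the year and agg columns of the sample data sorted by year, and the use of ave() is my own choice for the illustration, not part of the answer.

```r
# year and agg from the sample data, sorted by year
year <- c(1999L, 2000L, 2003L, 2004L, 2010L)
agg  <- c(522706.2, 555279.7, 592649.6, 550622.4, 592649.6)

# TRUE marks a row that starts a new run of consecutive years;
# cumsum() turns those marks into one label per run
run <- cumsum(c(TRUE, diff(year) != 1))
print(run)

# Differencing inside each run gives NA at the start of every run
dAgg <- ave(agg, run, FUN = function(x) c(NA, diff(x)))
print(dAgg)
```

The runs here are 1999-2000, 2003-2004 and 2010, so only 2000 and 2004 get a non-NA difference.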
# Sample data
df2 <- structure(list(
  group = c(51557094L, 51557133L, 51557133L, 51557133L, 51557193L),
  year = c(2010L, 1999L, 2000L, 2003L, 2004L),
  flow = c(3.46, 111.6, 29.36, 96.38, 65.22),
  agg = c(592649.6, 522706.2, 555279.7, 592649.6, 550622.4)),
  .Names = c("group", "year", "flow", "agg"),
  class = "data.frame", row.names = c("1:", "2:", "3:", "4:", "5:"))
myDataTable <- as.data.table(df2)
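The seq_len(.N) %in% 1:2 filter in the answer is tied to this particular sample. A possible variant that does not depend on row positions is sketched below, assuming that "first difference" means the difference to the row exactly one year earlier, and that agg has a single value per year. The helper name lag1_diff and the tables dt and yearly are made up for this sketch; shift() is from data.table.

```r
library(data.table)

dt <- data.table(
  group = c(51557094L, 51557133L, 51557133L, 51557133L, 51557193L),
  year  = c(2010L, 1999L, 2000L, 2003L, 2004L),
  flow  = c(3.46, 111.6, 29.36, 96.38, 65.22),
  agg   = c(592649.6, 522706.2, 555279.7, 592649.6, 550622.4)
)

# Hypothetical helper: difference to the previous row, set to NA
# unless the previous row is exactly one year earlier
lag1_diff <- function(x, year) {
  d <- x - shift(x)
  d[which(year - shift(year) != 1L)] <- NA_real_
  d
}

# flow: difference within each group, rows visited in year order
dt[order(year), dFlow := lag1_diff(flow, year), by = group]

# agg: difference on the unique (year, agg) pairs, then join back by year
yearly <- unique(dt[, .(year, agg)])[order(year)]
yearly[, dAgg := lag1_diff(agg, year)]
dt[yearly, dAgg := i.dAgg, on = "year"]

print(dt[, .(group, year, dFlow, dAgg)])
```

Because the columns are added by reference, the original row order is kept without a helper index column.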