I have a dataset in a format like this:
amount | event
------ | ------
3 | FALSE
4 | FALSE
6 | TRUE
7 | FALSE
3 | FALSE
4 | TRUE
8 | FALSE
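I would like to split and mutate based on the value of the event column: only in rows where event is TRUE, create new columns filled with the amount values from the rows immediately before and after. For example: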
amount | event | before | after
------ | ----- | ----- | -----
3 | FALSE | NA | NA
4 | FALSE | NA | NA
6 | TRUE | 4 | 7
7 | FALSE | NA | NA
3 | FALSE | NA | NA
4 | TRUE | 3 | 8
8 | FALSE | NA | NA
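I was thinking of ddply with mutate, but I don't know how to access values by offset after the split. Any ideas?
Using base R, we find the positions of the TRUE values in the 'event' column with which ('indx'), create two NA columns ('before' and 'after'), and then assign the 'amount' values located one position above and one position below 'indx' to the 'before' and 'after' columns respectively.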
indx <- which(df1$event)
df1[c('before','after')] <- NA
df1$before[indx] <- df1$amount[indx-1]
df1$after[indx] <- df1$amount[indx+1]
df1
# amount event before after
#1 3 FALSE NA NA
#2 4 FALSE NA NA
#3 6 TRUE 4 7
#4 7 FALSE NA NA
#5 3 FALSE NA NA
#6 4 TRUE 3 8
#7 8 FALSE NA NA
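One caveat with indexing by indx-1: if event is TRUE in the very first row, indx-1 contains 0, which R silently drops when subsetting, so the right-hand side ends up shorter than the left and the values get recycled or misaligned. The following is a minimal sketch that avoids this by building shifted copies of amount first. The helper name add_before_after is hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): adds 'before'/'after'
# columns holding the neighbouring 'amount' values on rows where event is TRUE.
add_before_after <- function(df) {
  n <- nrow(df)
  prev_amount <- c(NA, df$amount)[seq_len(n)]  # value from the row above; NA for row 1
  next_amount <- c(df$amount, NA)[-1]          # value from the row below; NA for the last row
  df$before <- ifelse(df$event, prev_amount, NA)
  df$after  <- ifelse(df$event, next_amount, NA)
  df
}

df1 <- data.frame(amount = c(3L, 4L, 6L, 7L, 3L, 4L, 8L),
                  event  = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE))
res <- add_before_after(df1)

# Edge case: an event on the first row has no previous row, so 'before' stays NA
edge <- add_before_after(data.frame(amount = c(5L, 9L), event = c(TRUE, FALSE)))
```

For the sample data this gives the same 'before' and 'after' columns as the which-based code, and it stays aligned when the first or last row is an event.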
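Or, using data.table (similar to @Marat Talipov's idea), we can use shift to get the lag and lead values of 'amount' and create the 'before' and 'after' columns from them. We then set those columns to NA in the rows where 'event' is FALSE (!event).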
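library(data.table) # data.table_1.9.5
# add the lag/lead columns by reference, then reset them to NA where event is FALSE
setDT(df1)[, c('before', 'after') := list(shift(amount, type='lag'),
                                          shift(amount, type='lead'))
           ][(!event), c('before', 'after') := NA][]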
# amount event before after
#1: 3 FALSE NA NA
#2: 4 FALSE NA NA
#3: 6 TRUE 4 7
#4: 7 FALSE NA NA
#5: 3 FALSE NA NA
#6: 4 TRUE 3 8
#7: 8 FALSE NA NA
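To make the role of shift clearer, here is a small sketch applied to a plain vector rather than a column. It assumes a data.table version that provides shift (the answer above used 1.9.5), and the variable names lagged and led are only illustrative.

```r
library(data.table)

amount <- c(3L, 4L, 6L, 7L)

# type = "lag" moves every value one position down, padding the start with NA
lagged <- shift(amount, type = "lag")
# NA 3 4 6

# type = "lead" moves every value one position up, padding the end with NA
led <- shift(amount, type = "lead")
# 4 6 7 NA
```

Because the padding is NA, an event on the first or last row simply gets NA for the missing neighbour.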
# data used in the examples above
df1 <- structure(list(amount = c(3L, 4L, 6L, 7L, 3L, 4L, 8L),
                      event = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)),
                 .Names = c("amount", "event"),
                 class = "data.frame", row.names = c(NA, -7L))