Need R's cummax, but with proper NA handling

Question

I have a data frame like this:

dput(df1)
structure(list(x = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), y = c(16449L, NA, NA, 
16449L, 16450L, 16451L, NA, NA, 16455L, 16456L, NA, NA, 16756L, 
NA, 16460L, 16464L, 16469L, NA, NA, 16469L)), .Names = c("x", 
"y"), row.names = c(NA, -20L), class = "data.frame")

I need to mutate the column y as follows (using dplyr):

df1 <- mutate(df1, y = ifelse(is.na(y), cummax(y), y))

However, cummax does not handle NAs in a way that suits my case. How can I get the same effect some other way?

The resulting output should have the NA rows of y filled with the last non-NA value of y. The rows are in sequential order.

Alternatively, I tried something like this, but it doesn't work:

mutate(df1, y = ifelse(is.na(y), max(y[1:row_number()], na.rm = TRUE), y))

This fails because row_number() is itself a vector of row indices, not a single index for the current row.
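To illustrate what the attempt above seems to be aiming for, here is a minimal base R sketch that computes the NA-ignoring running maximum one index at a time. The names `y_small`, `running_max` and `filled` are made up for this illustration, and the sketch assumes the first element is not NA.

```r
# Illustrative data (hypothetical, not the data frame from the question)
y_small <- c(5L, NA, 3L, 7L, NA)

# For each position i, take the max of the first i values, ignoring NAs.
# Assumes y_small[1] is not NA; otherwise max() would see no values at i = 1,
# warn, and return -Inf.
running_max <- sapply(seq_along(y_small),
                      function(i) max(y_small[1:i], na.rm = TRUE))

# Keep the observed values and fill only the NA positions
filled <- ifelse(is.na(y_small), running_max, y_small)
print(filled)
```

This loops over a scalar index explicitly, which avoids passing a whole vector to `1:...`, at the cost of quadratic work on long vectors.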

Edit: the desired output is as follows:

structure(list(x = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), y = c(16449, 16449, 
16449, 16449, 16450, 16451, 16451, 16451, 16455, 16456, 16456, 
16456, 16756, 16756, 16460, 16464, 16469, 16756, 16756, 16469
)), class = "data.frame", .Names = c("x", "y"), row.names = c(NA, 
-20L))

Answer 1

Col*_*vel 6

You can do:

library(dplyr)

v = cummax(ifelse(is.na(df1$y), -Inf, df1$y))  #A. Webb suggested -Inf instead of 0, great!

mutate(df1, y=ifelse(is.na(y), v, y))

#    x     y
#1   1 16449
#2   2 16449
#3   3 16449
#4   4 16449
#5   5 16450
#6   6 16451
#7   7 16451
#8   8 16451
#9   9 16455
#10 10 16456
#11  1 16456
#12  2 16456
#13  3 16756
#14  4 16756
#15  5 16460
#16  6 16464
#17  7 16469
#18  8 16756
#19  9 16756
#20 10 16469
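
A small worked example may help show why the `-Inf` substitution works: `cummax` returns NA for every position from the first NA onward, whereas `-Inf` can never win a maximum, so substituting it for each NA leaves the running maximum of the observed values intact. The vector `y_small` below is made up for illustration.

```r
y_small <- c(5, NA, 3, 7, NA)   # illustrative values, not from the question

# Plain cummax: once an NA is met, all later results are NA as well
plain <- cummax(y_small)

# Replace NAs with -Inf first; -Inf never exceeds a real value,
# so v is the running max of the non-NA values seen so far
v <- cummax(ifelse(is.na(y_small), -Inf, y_small))

# Keep the observed values and fill only the NA positions from v
filled <- ifelse(is.na(y_small), v, y_small)
print(filled)
```

One caveat under this approach: if the vector starts with NA, the leading filled values would be `-Inf` until the first observed value appears.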

Or you can use data.table:

library(data.table)
setDT(transform(df1, ix = seq_len(nrow(df1))))[, max(df1$y[1:ix], na.rm = TRUE), by = .(ix)]  # running max per row index
