Replace missing values with the row mean if exactly N values are missing in a row

Question

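I have a data matrix in which each row has a different number of missing values. What I want is to replace the missing values in a row with that row's mean, but only if the row has exactly N missing values (say N = 1).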
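
To make the goal concrete, here is a minimal sketch with hypothetical values (not the sample data below): a row with exactly one missing value gets it filled with the mean of the remaining values, while a row with two missing values is left untouched.

```r
# Hypothetical rows, used only to illustrate the desired result
one_missing <- c(NA, 4, 6)
two_missing <- c(4, NA, NA)

# Mean of the observed values in the row with exactly one NA
mean(one_missing, na.rm = TRUE)   # 5

# Desired rows after the replacement with N = 1
c(5, 4, 6)     # the single NA is replaced by the row mean
c(4, NA, NA)   # two NAs, so the row is left as it is
```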

I have already written a solution to this problem, but it is very inelegant, so I am looking for something better.

My solution:

#SAMPLE DATA

a <- c(rep(c(1:4, NA), 2))
b <- c(rep(c(1:3, NA, 5), 2))
c <- c(rep(c(1:3, NA, 5), 2))

df <- as.matrix(cbind(a,b,c), ncol = 3, nrow = 10)

#CALCULATING THE NUMBER OF MISSING VALUES PER ROW

miss_row <- rowSums(apply(as.matrix(df), c(1,2), function(x) {
  sum(is.na(x)) +
  sum(x == "", na.rm=TRUE)
}) )

df <- cbind(df, miss_row)

#CALCULATING THE ROW MEANS FOR ROWS WITH 1 MISSING VALUE

row_mean <- ifelse(df[,4] == 1, rowMeans(df[,1:3], na.rm = TRUE), NA)

df <- cbind(df, row_mean)

Answer 1

Cat*_*ath 5

Here is the approach I mentioned in the comments, with more details:

# create your matrix
df <- cbind(a, b, c) # already a matrix, you don't need as.matrix there

# Get number of missing values per row (is.na is vectorised so you can apply it directly on the entire matrix)
nb_NA_row <- rowSums(is.na(df))

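# Replace only the missing values by the row mean when there are exactly N NAs in the row
N <- 1 # the given example
# TRUE for NA cells in rows with exactly N NAs (nb_NA_row is recycled down the columns, so it lines up with the rows)
to_fill <- is.na(df) & (nb_NA_row == N)
df[to_fill] <- rowMeans(df, na.rm = TRUE)[row(df)[to_fill]]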

# check df

df
#      a  b  c
# [1,] 1  1  1
# [2,] 2  2  2
# [3,] 3  3  3
# [4,] 4 NA NA
# [5,] 5  5  5
# [6,] 1  1  1
# [7,] 2  2  2
# [8,] 3  3  3
# [9,] 4 NA NA
#[10,] 5  5  5
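
The same idea can be wrapped in a small helper. This is only a sketch: the function name `fill_row_mean` and the test matrix `m` are made up for illustration, and it assumes a numeric matrix. It fills only the NA cells, and only in rows that contain exactly `n` of them.

```r
# Hypothetical helper: fill NA cells with the row mean, but only in rows
# that contain exactly n missing values. Assumes m is a numeric matrix.
fill_row_mean <- function(m, n = 1) {
  n_missing <- rowSums(is.na(m))   # number of NAs per row
  # TRUE for NA cells in qualifying rows; n_missing is recycled down the
  # columns, so element i lines up with row i
  to_fill <- is.na(m) & (n_missing == n)
  m[to_fill] <- rowMeans(m, na.rm = TRUE)[row(m)[to_fill]]
  m
}

# Made-up data: row 1 has one NA, row 2 has two, row 3 has none
m <- rbind(c(NA, 4, 6),
           c(4, NA, NA),
           c(1, 2, 3))
fill_row_mean(m, n = 1)
#      [,1] [,2] [,3]
# [1,]    5    4    6
# [2,]    4   NA   NA
# [3,]    1    2    3
```

The observed values in row 1 (4 and 6) are kept and only the NA becomes 5, which is what distinguishes filling the missing cells from overwriting the whole row with its mean.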
