How to add random NAs to a data frame

koe*_*bro 8 r apply dataframe

I created a data frame with random values:

n <- 50
df <- data.frame(id  = seq_len(n),
                 age = sample(20:90, n, replace = TRUE),
                 sex = sample(c("m", "f"), n, replace = TRUE, prob = c(0.55, 0.45))
)

and want to introduce some NA values to simulate real-world data. I tried to use apply, but could not get it to work. This line

apply(subset(df,select=-id), 2, function(x) {x[sample(c(1:n),floor(n/10))]})

retrieves random values, but

apply(subset(df,select=-id), 2, function(x) {x[sample(c(1:n),floor(n/10))]<-NA}) 

does not set them to NA. I also tried with() and within().

Brute force works:

for (i in (1:floor(n/10))) {
  df[sample(c(1:n), 1), sample(c(2:ncol(df)), 1)] <- NA
  }

But I would prefer to use the apply family.

luk*_*keA 5

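Return x from your function. The value of an assignment is the assigned value (NA), so without the trailing x the function returns NA instead of the modified column: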

> df <- apply (df, 2, function(x) {x[sample( c(1:n), floor(n/10))] <- NA; x} )
> tail(df)
      id   age  sex
[45,] "45" "41" NA 
[46,] "46" NA   "f"
[47,] "47" "38" "f"
[48,] "48" "32" "f"
[49,] "49" "53" NA 
[50,] "50" "74" "f"
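Note that apply converts the data frame to a matrix first, which is why every value in the output above is a character string and why the id column can receive NAs as well. A minimal sketch of one way around that, assuming you want to keep the column types and leave id untouched: use lapply over the non-id columns. The helper name add_na and the set.seed call are assumptions added for illustration, not part of the original code.

```r
set.seed(42)  # assumption: fixed seed only so the example is reproducible

n <- 50
df <- data.frame(id  = seq_len(n),
                 age = sample(20:90, n, replace = TRUE),
                 sex = sample(c("m", "f"), n, replace = TRUE, prob = c(0.55, 0.45)))

# Hypothetical helper: set floor(length(x) / 10) randomly chosen elements to NA
add_na <- function(x) {
  x[sample(length(x), floor(length(x) / 10))] <- NA
  x  # return the modified vector, not the value of the assignment
}

cols <- setdiff(names(df), "id")      # every column except id, wherever it sits
df[cols] <- lapply(df[cols], add_na)  # lapply works column by column and keeps types

str(df)
colSums(is.na(df))
```

Because sample draws without replacement here, each selected column ends up with exactly floor(n/10) NAs, whereas the for loop in the question places floor(n/10) NAs in total and may hit the same cell twice.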

  • @Gianluca: My answer does that (but does not assume the ID column comes first) (2 upvotes)