Remove columns based on the percentage of NA items

Question

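I am trying to remove columns that have more than 90% NA values. I followed the approach below, but I only get a single value back and I'm not sure what I'm doing wrong. I expected an actual data frame. I tried wrapping the call in as.data.frame, but that was also wrong.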

Example DF

gene cell1 cell2 cell3 
A    0.4   0.1   NA
B    NA    NA    0.1
C    0.4   NA    0.5
D    NA    NA    0.5
E    0.5   NA    0.6
F    0.6   NA    NA

Desired DF

gene cell1  cell3 
A    0.4     NA
B    NA      0.1
C    0.4     0.5
D    NA      0.5
E    0.5     0.6
F    0.6     NA

Code

#Select Genes that have NA values for 90% of a given cell line
df_col <- df[,2:ncol(df)]
df_col <-df_col[, which(colMeans(!is.na(df_col)) > 0.9)]
df <- cbind(df[,1], df_col)
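
The code above has two problems. First, colMeans(!is.na(df_col)) measures the share of non-missing values, so "> 0.9" keeps only nearly complete columns instead of dropping mostly-NA ones. Second, when exactly one column survives, df_col[, j] collapses to a plain vector unless drop = FALSE is given. A corrected sketch, assuming the identifier sits in column 1 and using the 80% threshold from the answer below (threshold, keep and df_clean are illustrative names, not from the question):

```r
# Example data from the question
df <- data.frame(gene  = LETTERS[1:6],
                 cell1 = c(0.4, NA, 0.4, NA, 0.5, 0.6),
                 cell2 = c(0.1, NA, NA, NA, NA, NA),
                 cell3 = c(NA, 0.1, 0.5, 0.5, 0.6, NA))

df_col <- df[, 2:ncol(df)]

# Share of NON-missing values per column: about 0.667, 0.167, 0.667,
# so "> 0.9" would keep none of the cell columns.
colMeans(!is.na(df_col))

# Intended rule: keep columns whose share of NA values is below the threshold
threshold <- 0.8
keep <- colMeans(is.na(df_col)) < threshold

# drop = FALSE keeps a data frame even if only one column survives
df_clean <- cbind(gene = df$gene, df_col[, keep, drop = FALSE])
df_clean

# Without drop = FALSE a single-column selection collapses to a vector
class(df_col[, "cell1"])                # "numeric"
class(df_col[, "cell1", drop = FALSE])  # "data.frame"
```
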

Answer 1

Gue*_*sBF 5

I would use dplyr here.

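If you want to use select() with a logical condition, you are probably looking for the where() selection helper in dplyr (available since dplyr 1.0.0). It is used like this: select(where(condition))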

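I used an 80% threshold, because with 90% every column would be kept (cell2 is 5/6, about 83%, NA), so it would not illustrate the solution.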

library(dplyr)

df %>% select(where(~mean(is.na(.))<0.8))
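
Conceptually, where() calls the predicate on each column and keeps the columns for which it returns TRUE. A minimal base R sketch of the same idea, assuming you want the threshold as an argument. drop_na_heavy_cols is a hypothetical helper name, not part of dplyr:

```r
# Data from the answer
df <- data.frame(gene = letters[1:6],
                 cell1 = c(0.4, NA, 0.4, NA, 0.5, 0.6),
                 cell2 = c(0.1, rep(NA, 5)),
                 cell3 = c(NA, 0.1, 0.5, 0.5, 0.6, NA))

# Hypothetical helper (not a dplyr function): keep every column whose share
# of NA values is strictly below `threshold`, mirroring
# select(where(~ mean(is.na(.)) < threshold))
drop_na_heavy_cols <- function(data, threshold = 0.9) {
  # vapply() iterates over the columns of a data frame
  keep <- vapply(data, function(col) mean(is.na(col)) < threshold, logical(1))
  data[, keep, drop = FALSE]
}

drop_na_heavy_cols(df, threshold = 0.8)  # gene, cell1, cell3
drop_na_heavy_cols(df, threshold = 0.9)  # all four columns kept
```
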

It can also be done with base R and colMeans:

df[, c(TRUE, colMeans(is.na(df[-1]))<0.8)]
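
To see why the threshold matters, it can help to print the intermediate values. colMeans(is.na(df[-1])) gives each cell column's share of NA values, and the leading TRUE keeps the gene column unconditionally. A small sketch using the answer's data (na_share and idx are illustrative names):

```r
df <- data.frame(gene = letters[1:6],
                 cell1 = c(0.4, NA, 0.4, NA, 0.5, 0.6),
                 cell2 = c(0.1, rep(NA, 5)),
                 cell3 = c(NA, 0.1, 0.5, 0.5, 0.6, NA))

# Share of NA values per cell column (df[-1] leaves out gene)
na_share <- colMeans(is.na(df[-1]))
round(na_share, 3)  # cell1 0.333, cell2 0.833, cell3 0.333

# Logical column index: TRUE for gene, then one entry per cell column
idx <- c(TRUE, na_share < 0.8)
df[, idx]

# To drop only columns with MORE than 90% NA, as the question phrases it,
# the rule would be na_share <= 0.9 (which keeps all columns here)
df[, c(TRUE, na_share <= 0.9)]
```
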

Or with purrr:

library(purrr)

df %>% keep(~mean(is.na(.))<0.8)
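
If you would rather avoid the purrr dependency, base R's Filter() plays a similar role to keep() here: it applies a predicate to each element of a list, and a data frame is a list of columns. A minimal sketch with the same 80% threshold (df_kept is an illustrative name):

```r
df <- data.frame(gene = letters[1:6],
                 cell1 = c(0.4, NA, 0.4, NA, 0.5, 0.6),
                 cell2 = c(0.1, rep(NA, 5)),
                 cell3 = c(NA, 0.1, 0.5, 0.5, 0.6, NA))

# Filter() evaluates the predicate on each column and subsets with x[...],
# which on a data frame returns a data frame of the surviving columns
df_kept <- Filter(function(col) mean(is.na(col)) < 0.8, df)
df_kept
```
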

Output:

  gene cell1 cell3
1    a   0.4    NA
2    b    NA   0.1
3    c   0.4   0.5
4    d    NA   0.5
5    e   0.5   0.6
6    f   0.6    NA

Data

df<-data.frame(gene=letters[1:6],
cell1=c(0.4, NA, 0.4, NA, 0.5, 0.6),
cell2=c(0.1, rep(NA, 5)),
cell3=c(NA, 0.1, 0.5, 0.5, 0.6, NA))

Archived:	4 years, 3 months ago
Views:	67
Last activity:	4 years, 3 months ago