I have a data frame "df" as follows:
Samples Status last_contact_days_to death_days_to
Sample1 Alive [Not Available] [Not Applicable]
Sample2 Dead [Not Available] 724
Sample3 Dead [Not Available] 1624
Sample4 Alive 1569 [Not Applicable]
Sample5 Dead [Not Available] 2532
Sample6 Dead [Not Available] 1271
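I want to combine the columns last_contact_days_to and death_days_to into a new column that contains only the numeric values and none of the other text. If both columns contain text (i.e. neither has a number), the whole row should be removed.
The result should look like this: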
Samples Status new_column
Sample2 Dead 724
Sample3 Dead 1624
Sample4 Alive 1569
Sample5 Dead 2532
Sample6 Dead 1271
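We can change [Not Available] and [Not Applicable] to NA and then use coalesce:
library(tidyverse)
df1 %>%
  # turn the two placeholder strings in columns 3 and 4 into NA
  # (across() replaces the deprecated funs()/mutate_at() idiom)
  mutate(across(3:4,
                ~ replace(.x, .x %in% c("[Not Available]", "[Not Applicable]"), NA))) %>%
  # keep the first non-missing value of the two day columns
  transmute(Samples, Status,
            new_column = coalesce(last_contact_days_to, death_days_to)) %>%
  # drop rows where both columns were placeholders
  filter(!is.na(new_column))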
# Samples Status new_column
#1 Sample2 Dead 724
#2 Sample3 Dead 1624
#3 Sample4 Alive 1569
#4 Sample5 Dead 2532
#5 Sample6 Dead 1271
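coalesce() returns, element by element, the first non-missing value across its arguments. For two vectors, a rough base R equivalent is shown below. It is only an illustration of the idea, not how dplyr implements coalesce(), and the vector names are hypothetical stand-ins for the two columns after the placeholders have become NA:

```r
# Hypothetical stand-ins for the two columns after placeholder -> NA
last_contact <- c(NA, NA, NA, "1569", NA, NA)
death        <- c(NA, "724", "1624", NA, "2532", "1271")

# Use last_contact where it is present, otherwise fall back to death
new_column <- ifelse(is.na(last_contact), death, last_contact)

# Rows where both inputs were NA remain NA; these are the rows to drop
keep <- !is.na(new_column)
new_column[keep]
```

Only Sample1 has NA in both inputs, so it is the only row that ends up removed.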
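Note: as @Roland suggested, if columns 3 and 4 contain only numeric values apart from '[Not Available]' and '[Not Applicable]', the replacement step can simply use as.numeric instead. It converts every non-numeric element to NA with a harmless warning, so it causes no problems.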
df1 %>%
  mutate(across(3:4, as.numeric))
# if the columns are `factor` class then wrap with `as.character`
# mutate(across(3:4, ~ as.numeric(as.character(.x))))
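Note: in the OP's dataset these columns are of class factor, so uncomment the as.character line in the snippet above and use it instead of applying as.numeric directly (as.numeric on a factor returns its internal level codes, not the values shown).
The example data df1 used above: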
df1 <- structure(list(Samples = c("Sample1", "Sample2", "Sample3", "Sample4",
"Sample5", "Sample6"), Status = c("Alive", "Dead", "Dead", "Alive",
"Dead", "Dead"), last_contact_days_to = c("[Not Available]",
"[Not Available]", "[Not Available]", "1569", "[Not Available]",
"[Not Available]"), death_days_to = c("[Not Applicable]", "724",
"1624", "[Not Applicable]", "2532", "1271")), .Names = c("Samples",
"Status", "last_contact_days_to", "death_days_to"),
class = "data.frame", row.names = c(NA,
-6L))
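The df1 above stores the day columns as character. To show why the as.character step matters when they are factors, as in the OP's data, here is a minimal end-to-end sketch of the as.numeric variant on a factor version of the same data. It assumes dplyr >= 1.0 (for across()); wrapping the conversion in suppressWarnings() is an optional choice that silences the expected "NAs introduced by coercion" warning.

```r
library(dplyr)

# Same example data, but with the day columns stored as factors
df_fac <- data.frame(
  Samples = paste0("Sample", 1:6),
  Status  = c("Alive", "Dead", "Dead", "Alive", "Dead", "Dead"),
  last_contact_days_to = factor(c("[Not Available]", "[Not Available]",
                                  "[Not Available]", "1569",
                                  "[Not Available]", "[Not Available]")),
  death_days_to = factor(c("[Not Applicable]", "724", "1624",
                           "[Not Applicable]", "2532", "1271")),
  stringsAsFactors = FALSE
)

# Pitfall: as.numeric() on a factor yields level codes (1, 2, ...), not 724 etc.
codes <- as.numeric(df_fac$death_days_to)

result <- df_fac %>%
  # factor -> character -> numeric; the placeholder strings become NA
  mutate(across(3:4, ~ suppressWarnings(as.numeric(as.character(.x))))) %>%
  transmute(Samples, Status,
            new_column = coalesce(last_contact_days_to, death_days_to)) %>%
  filter(!is.na(new_column))

result
```

Going through as.character first makes the numeric conversion act on the labels, so the resulting new_column is numeric (724, 1624, ...) rather than a column of level codes.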