gae*_*cia 9 r matrix dataframe r-factor
I have a data frame with 5 different columns:
Test1 Test2 Test3 Test4 Test5
Sample1 PASS PASS FAIL WARN WARN
Sample2 PASS PASS FAIL PASS WARN
Sample3 PASS FAIL FAIL PASS WARN
Sample4 PASS FAIL FAIL PASS WARN
Sample5 PASS WARN FAIL WARN WARN
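In each column, a different integer code is assigned to each factor level. In column 1, "PASS" is 1. In column 2, "PASS" is 2 and "FAIL" is 1. In column 3, "FAIL" is 1. In column 4, "PASS" is 1 and "WARN" is 2. In column 5, "WARN" is 1.
R is doing this alphabetically. I need "PASS" to be 1 in all columns, "WARN" to be 2 in all columns, and "FAIL" to be 3 in all columns, so that I can convert the data frame to a matrix and turn it into a heatmap.
Currently, the codes are assigned according to which levels appear in a particular column and their alphabetical order.
How can I keep the coding consistent across the whole data frame?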
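The behaviour described in the question can be reproduced with a minimal sketch. The vectors test2 and test4 are hypothetical stand-ins for two of the columns. When no levels are given, factor() uses the sorted unique values of each vector, so the same label can receive a different integer code in each column.

```r
# Hypothetical stand-ins for the Test2 and Test4 columns.
test2 <- factor(c("PASS", "PASS", "FAIL", "FAIL", "WARN"))
test4 <- factor(c("WARN", "PASS", "PASS", "PASS", "WARN"))

# Default levels are the sorted unique values present in each vector.
levels(test2)      # "FAIL" "PASS" "WARN"
as.integer(test2)  # 2 2 1 1 3  -> "PASS" is coded 2 here
levels(test4)      # "PASS" "WARN"
as.integer(test4)  # 2 1 1 1 2  -> "PASS" is coded 1 here
```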
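You can give every column of the dataset "df" the same level order by looping over the columns (lapply), converting each one to factor again with the levels specified explicitly, and assigning the result back to the corresponding columns.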
lvls <- c('PASS', 'WARN', 'FAIL')
df[] <- lapply(df, factor, levels=lvls)
str(df)
# 'data.frame': 5 obs. of 5 variables:
# $ Test1: Factor w/ 3 levels "PASS","WARN",..: 1 1 1 1 1
# $ Test2: Factor w/ 3 levels "PASS","WARN",..: 1 1 3 3 2
# $ Test3: Factor w/ 3 levels "PASS","WARN",..: 3 3 3 3 3
# $ Test4: Factor w/ 3 levels "PASS","WARN",..: 2 1 1 1 2
# $ Test5: Factor w/ 3 levels "PASS","WARN",..: 2 2 2 2 2
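Since the stated goal is a matrix for a heatmap, the following sketch shows one possible way to finish the job after the levels are aligned. It rebuilds the example data with data.frame() (an assumption made so the block is self-contained) and uses data.matrix(), which replaces factors with their internal integer codes. The name m is hypothetical.

```r
# Self-contained re-creation of the example data (assumed layout);
# stringsAsFactors = TRUE gives each column its own default level set.
df <- data.frame(
  Test1 = rep("PASS", 5),
  Test2 = c("PASS", "PASS", "FAIL", "FAIL", "WARN"),
  Test3 = rep("FAIL", 5),
  Test4 = c("WARN", "PASS", "PASS", "PASS", "WARN"),
  Test5 = rep("WARN", 5),
  row.names = paste0("Sample", 1:5),
  stringsAsFactors = TRUE
)

# Impose one shared level order on every column.
lvls <- c("PASS", "WARN", "FAIL")
df[] <- lapply(df, factor, levels = lvls)

# data.matrix() replaces each factor with its integer codes,
# so PASS = 1, WARN = 2, FAIL = 3 in every column.
m <- data.matrix(df)
m
```

The matrix keeps the sample names as row names, so a call such as heatmap(m, Rowv = NA, Colv = NA, scale = "none") should plot the codes without reordering or rescaling them.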
If you prefer to use data.table:
library(data.table)
setDT(df)[, names(df):= lapply(.SD, factor, levels=lvls)]
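setDT converts the "data.frame" to a "data.table", and := assigns the re-converted factor columns (lapply(..)) to the column names of the dataset (names(df)). .SD stands for "Subset of Data.table".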
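One caveat with the data.table route: a data.table does not carry row names, so with default arguments the sample names are dropped during conversion. The sketch below is one possible variation that keeps them in an ordinary column. The column name "Sample" and the objects dt, cols and m are hypothetical, and the data are rebuilt inline only so the block runs on its own.

```r
library(data.table)

# Self-contained re-creation of the example data (assumed layout).
df <- data.frame(
  Test1 = rep("PASS", 5),
  Test2 = c("PASS", "PASS", "FAIL", "FAIL", "WARN"),
  Test3 = rep("FAIL", 5),
  Test4 = c("WARN", "PASS", "PASS", "PASS", "WARN"),
  Test5 = rep("WARN", 5),
  row.names = paste0("Sample", 1:5),
  stringsAsFactors = TRUE
)
lvls <- c("PASS", "WARN", "FAIL")

# Keep the row names as a regular column called "Sample".
dt <- as.data.table(df, keep.rownames = "Sample")

# Re-level only the test columns, leaving "Sample" untouched.
cols <- setdiff(names(dt), "Sample")
dt[, (cols) := lapply(.SD, factor, levels = lvls), .SDcols = cols]

# Build an integer matrix of the codes and restore the sample names.
m <- as.matrix(dt[, lapply(.SD, as.integer), .SDcols = cols])
rownames(m) <- dt$Sample
```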
# Example data used in the answer above
df <- structure(list(Test1 = structure(c(1L, 1L, 1L, 1L, 1L),
.Label = "PASS", class = "factor"),
Test2 = structure(c(2L, 2L, 1L, 1L, 3L), .Label = c("FAIL",
"PASS", "WARN"), class = "factor"), Test3 = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "FAIL", class = "factor"), Test4 =
structure(c(2L, 1L, 1L, 1L, 2L), .Label = c("PASS", "WARN", "FAIL"),
class = "factor"), Test5 = structure(c(1L, 1L, 1L, 1L, 1L), .Label =
"WARN", class = "factor")), .Names = c("Test1",
"Test2", "Test3", "Test4", "Test5"), row.names = c("Sample1",
"Sample2", "Sample3", "Sample4", "Sample5"), class = "data.frame")