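I have a data frame with a categorical variable that holds lists of strings of variable length (this matters, because otherwise this question would be a duplicate of this or this), for example: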
df <- data.frame(x = 1:5)
df$y <- list("A", c("A", "B"), "C", c("B", "D", "C"), "E")
df
  x       y
1 1       A
2 2    A, B
3 3       C
4 4 B, D, C
5 5       E
The desired form has one dummy variable for each unique string that appears anywhere in df$y, i.e.:
data.frame(x = 1:5, A = c(1,1,0,0,0), B = c(0,1,0,1,0), C = c(0,0,1,1,0), D = c(0,0,0,1,0), E = c(0,0,0,0,1))
  x A B C D E
1 1 1 0 0 0 0
2 2 1 …
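One possible base-R approach, as a minimal sketch rather than an answer from the original post: collect the distinct strings, then flag for each row which of them occur. The names `lvls`, `dummies`, and `result` are illustrative, and the sketch assumes `df$y` contains more than one distinct string.

```r
# Example data from the question: y is a list column of character vectors
df <- data.frame(x = 1:5)
df$y <- list("A", c("A", "B"), "C", c("B", "D", "C"), "E")

# All distinct strings that occur anywhere in df$y, sorted
lvls <- sort(unique(unlist(df$y)))

# For each element of df$y, a 0/1 vector marking which levels it contains;
# vapply() returns one column per row of df, so transpose the result
dummies <- t(vapply(df$y, function(v) as.numeric(lvls %in% v),
                    numeric(length(lvls))))
colnames(dummies) <- lvls

# Attach the dummy columns to the remaining column(s)
result <- cbind(df["x"], dummies)
result
```

Because `lvls %in% v` only tests membership, a string repeated within one list element still yields a 1 rather than a count.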
What is an efficient way (solutions using non-base packages are welcome) to collapse dummy variables back into a factor?
   race.White race.Hispanic race.Black race.Asian
1           1             0          0          0
2           0             0          0          1
3           1             0          0          0
4           0             0          1          0
5           0             0          0          1
6           0             1          0          0
7           1             0          0          0
8           1             0          0          0
9           1             0          0          0
10          0             0          1          0
Desired output:
       race
1     White
2     Asian
3     White
4     Black
5     Asian
6  Hispanic
7     White
8     White
9     White
10    Black
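One way this could be done in base R, as a hedged sketch and not the original poster's code: find the column that holds the 1 in each row with `max.col()` and map it to a level name. Because the question's data is truncated, `dat` is rebuilt here from the table shown; `lvls` and `out` are illustrative names, and the sketch assumes every row has exactly one 1.

```r
# Dummy columns rebuilt from the table shown in the question
dat <- data.frame(
  race.White    = c(1, 0, 1, 0, 0, 0, 1, 1, 1, 0),
  race.Hispanic = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0),
  race.Black    = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1),
  race.Asian    = c(0, 1, 0, 0, 1, 0, 0, 0, 0, 0)
)

# Level names are the column names with the "race." prefix removed
lvls <- sub("^race\\.", "", names(dat))

# max.col() returns, per row, the index of the column with the largest value,
# which is the column holding the 1 when each row has exactly one dummy set
idx <- max.col(as.matrix(dat), ties.method = "first")

out <- data.frame(race = factor(lvls[idx], levels = lvls))
out
```

A row of all zeros would be assigned the first level under `ties.method = "first"`, so such rows would need to be checked separately (for example with `rowSums(dat) == 1`).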
Data:
dat <- …

Suppose I have a data frame that looks like this:
df1=structure(list(Name = structure(1:6, .Label = c("N1", "N2", "N3",
"N4", "N5", "N6", "N7"), class = "factor"), sector = structure(c(4L,
4L, 4L, 3L, 3L, 2L), .Label = c("other stuff", "Private for-profit, 4-year or above",
"Private not-for-profit, 4-year or above", "Public, 4-year or above"
), class = "factor"), flagship = c(1, 0, 0, 0, 0, 0)), .Names = c("Name",
"sector", "flagship"), row.names = c(NA, 6L), class = "data.frame")
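I want to create a new factor variable, "Sector". I could do it the long way with many lines of code, but I am sure there is a more efficient approach.
This is what I am doing right now: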
df1$PublicFlag=0
df1$PublicFlag[df1$sector=="Public, 4-year or above" & df1$flagship==1]=1
df1$Public=0
df1$Public[df1$sector=="Public, 4-year or …