ald*_*ado 3 r transformation binary-data one-hot-encoding
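I have a moderately large data frame, and I want to convert the categories in one column into binary columns, one per category.
At the same time, I want to keep the remaining columns in the data frame.
What is the simplest way to achieve this?
Here is an example of what I want to do: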
d <- data.frame(ID = c("a", "b", "c", "d"), Gender = c("male", "male", "female", "female"), Age = c(23, 45, 18, 11))
  ID Gender Age
1  a   male  23
2  b   male  45
3  c female  18
4  d female  11
Afterwards, the result should look like d2, with the ID and Age columns still present and unchanged:
d2 <- data.frame(ID = c("a", "b", "c", "d"), Gender.male = c(1, 1, 0, 0), Gender.female = c(0, 0, 1, 1), Age = c(23, 45, 18, 11))
  ID Gender.male Gender.female Age
1  a           1             0  23
2  b           1             0  45
3  c           0             1  18
4  d           0             1  11
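We can use spread:
library(tidyverse)
# Add a helper column of 1s, then spread Gender into one column per category,
# filling the missing combinations with 0
d %>%
  mutate(n = 1) %>%
  spread(Gender, n, fill = 0)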
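In current tidyr releases, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same idea with pivot_wider(), assuming tidyr 1.0.0 or later is installed. The names_prefix argument is used here only to approximate the Gender.male / Gender.female naming from the question, and the result is a tibble with the new columns placed after ID and Age rather than in the exact column order of d2.

```r
library(dplyr)
library(tidyr)

d <- data.frame(ID = c("a", "b", "c", "d"),
                Gender = c("male", "male", "female", "female"),
                Age = c(23, 45, 18, 11))

# Add a helper column of 1s, then widen Gender into one column per category.
# Combinations that do not occur are filled with 0.
d2 <- d %>%
  mutate(n = 1) %>%
  pivot_wider(names_from = Gender,
              values_from = n,
              values_fill = list(n = 0),
              names_prefix = "Gender.")

d2
```

If the exact column order matters, the columns can be reordered afterwards, for example with d2[c("ID", "Gender.male", "Gender.female", "Age")].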
Or use dcast from reshape2:
library(reshape2)
dcast(d, ID + Age ~ Gender, length)
#  ID Age female male
#1  a  23      0    1
#2  b  45      0    1
#3  c  18      1    0
#4  d  11      1    0
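Neither approach above produces the Gender.male / Gender.female column names or the column position shown in d2. As a possible alternative that needs no extra packages, here is a rough base R sketch using model.matrix(). It assumes the Gender column has no missing values (rows with NA would be dropped by model.matrix() and the cbind() would then no longer line up). The dummy columns come out in alphabetical level order, so Gender.female precedes Gender.male.

```r
d <- data.frame(ID = c("a", "b", "c", "d"),
                Gender = c("male", "male", "female", "female"),
                Age = c(23, 45, 18, 11))

# "- 1" drops the intercept so that every category gets its own 0/1 column
dummies <- model.matrix(~ Gender - 1, data = d)

# model.matrix() names the columns "Genderfemale", "Gendermale";
# insert a dot to get "Gender.female", "Gender.male"
colnames(dummies) <- sub("^Gender", "Gender.", colnames(dummies))

# Put the dummy columns where Gender used to be, keeping ID and Age untouched
d2 <- cbind(d["ID"], dummies, d["Age"])

d2
```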