I am trying to split a single "character" variable in my data frame into multiple "factor" variables.
> sampledf=data.frame(vin=c('v1','v2','v3'),features=c('f1:f2:f3','f2:f4:f5','f1:f4:f5'))
> sampledf
vin features
1 v1 f1:f2:f3
2 v2 f2:f4:f5
3 v3 f1:f4:f5
> desireddf=data.frame(vin=c('v1','v2','v3'),f1=c(1,0,1),f2=c(1,1,0),f3=c(1,0,0),f4=c(0,1,1),f5=c(0,1,1))
> desireddf
vin f1 f2 f3 f4 f5
1 v1 1 1 1 0 0
2 v2 0 1 0 1 1
3 v3 1 0 0 1 1
I have tried using strsplit() to split the "features" column:
strsplit(as.character(sampledf$features), ":")
but I had no luck getting factor variables out of it.
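A minimal base-R sketch of the missing step, assuming the `sampledf` defined above. The separator in this data is `":"`, not `";"`. After splitting, collect the distinct feature names and build one 0/1 column per name with `%in%`. The names `parts`, `levs`, `ind`, `result` and `result_fac` are illustrative only, and the final factor conversion is optional.

```r
# Sample data from the question
sampledf <- data.frame(vin = c('v1', 'v2', 'v3'),
                       features = c('f1:f2:f3', 'f2:f4:f5', 'f1:f4:f5'))

# Split each string on ":"; the result is a list of character vectors
parts <- strsplit(as.character(sampledf$features), ":", fixed = TRUE)

# All distinct feature names, sorted so the columns come out as f1..f5
levs <- sort(unique(unlist(parts)))

# For each row, mark which feature names it contains. sapply() returns a
# features-by-rows matrix, so transpose it to rows-by-features.
ind <- t(sapply(parts, function(p) as.integer(levs %in% p)))
colnames(ind) <- levs

# One 0/1 column per feature, next to the original vin column
result <- data.frame(vin = sampledf$vin, ind)
print(result)

# Optional: turn the 0/1 columns into factors with levels "0" and "1"
result_fac <- result
result_fac[levs] <- lapply(result_fac[levs], factor, levels = c(0, 1))
str(result_fac)
```

The 0/1 columns in `result` hold the same values as `desireddf`. The only difference is that they are integer rather than double.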
For example, I have data like this:
data <- data.frame(person=paste0("person_", 1:5),
keyword=sapply(1:5, function(x) paste0(sample(letters, sample(1:5, 1)), collapse = ","))
)
> data
person keyword
1 person_1 k,f,p,w
2 person_2 y,j
3 person_3 y,r
4 person_4 g,w
5 person_5 u,x,c,n
I want to split the keywords into multiple columns and ultimately convert them into binary data, like this:
person k f p w y j r g u x c n
1 person_1 1 1 1 1 0 0 0 0 0 0 0 0
2 person_2 0 0 0 0 1 1 0 0 0 0 0 0
3 person_3 …
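The same idea works for the comma-separated keywords. The sketch below uses a different base-R route, `stack()` plus `table()`. It turns the split keywords into a long person/keyword data frame and then cross-tabulates that. Because the question's code draws keywords with `sample()`, the sketch hard-codes the five rows printed above so the result is reproducible; that fixed data is an assumption. The names `kw`, `long`, `tab` and `binary` are illustrative. A keyword shared by several people, such as `w`, ends up as a single column.

```r
# Assumed fixed data matching the printed example
# (the question's sample() call gives different keywords on every run)
data <- data.frame(person = paste0("person_", 1:5),
                   keyword = c("k,f,p,w", "y,j", "y,r", "g,w", "u,x,c,n"))

# Split each keyword string on "," and name the list by person
kw <- strsplit(as.character(data$keyword), ",", fixed = TRUE)
names(kw) <- data$person

# stack() turns the named list into a long data frame with columns
# 'values' (the keyword) and 'ind' (the person, as a factor)
long <- stack(kw)

# Cross-tabulate person by keyword, keeping keywords in first-seen order
tab <- table(long$ind, factor(long$values, levels = unique(long$values)))

# Convert the table to a plain data frame with one column per keyword
binary <- data.frame(person = rownames(tab), as.data.frame.matrix(tab),
                     row.names = NULL)
print(binary)
```

`table()` counts occurrences. If one person could list the same keyword twice, that cell would be 2, and `(tab > 0) * 1` would turn it back into a 0/1 indicator.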