R - Converting a column of lists into separate columns, using their values as names (dummy variables)

Gab*_*yLP 12 split r dataframe

I have a table of movie data, and its last column holds the categories each movie belongs to.

  movieId                              title                   category
       1                   Toy Story (1995)  Animation|Children|Comedy
       2                     Jumanji (1995) Adventure|Children|Fantasy
       3            Grumpier Old Men (1995)             Comedy|Romance
       4           Waiting to Exhale (1995)               Comedy|Drama
       5 Father of the Bride Part II (1995)                     Comedy
       6                        Heat (1995)      Action|Crime|Thriller

I want to create one column per category, containing 1 if that category appears in the movie's list and 0 otherwise. Something like:

movieId title   animation   comedy  drama
1        xx        1           0      1
2        xy        1           0      0
3        yy        1           1      0

So far, I have only converted the strings into lists:

f<-function(x) {strsplit(x, split='|', fixed=TRUE)}
movies2$m<-lapply(movies2$category, f)

But I don't know how to do the rest.

I was thinking of using something like a Python dictionary, but I don't know how to do that in R.

Data

df1 <- read.table(header = TRUE, stringsAsFactors = FALSE,
                  text = " movieId                              title                   category
                  1                   'Toy Story (1995)'  Animation|Children|Comedy
                  2                     'Jumanji (1995)' Adventure|Children|Fantasy
                  3            'Grumpier Old Men (1995)'             Comedy|Romance
                  4           'Waiting to Exhale (1995)'               Comedy|Drama
                  5 'Father of the Bride Part II (1995)'                     Comedy
                  6                        'Heat (1995)'      Action|Crime|Thriller")

akr*_*run 5

We can use mtabulate from qdapTools after splitting the category column:

library(qdapTools)
cbind(df1[-3],mtabulate(strsplit(df1$category, "[|]")))
# movieId                              title Action Adventure Animation Children Comedy Crime Drama Fantasy Romance Thriller
#1       1                   Toy Story (1995)      0         0         1        1      1     0     0       0       0        0
#2       2                     Jumanji (1995)      0         1         0        1      0     0     0       1       0        0
#3       3            Grumpier Old Men (1995)      0         0         0        0      1     0     0       0       1        0
#4       4           Waiting to Exhale (1995)      0         0         0        0      1     0     1       0       0        0
#5       5 Father of the Bride Part II (1995)      0         0         0        0      1     0     0       0       0        0
#6       6                        Heat (1995)      1         0         0        0      0     1     0       0       0        1

Or use base R:

cbind(df1[-3], as.data.frame.matrix(table(stack(setNames(strsplit(df1$category,
                           "[|]"), df1$movieId))[2:1])))
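The base R one-liner is dense, so here is a rough step-by-step sketch of what it appears to do. The intermediate names (genres, long, counts, dummies, result) are made up for illustration and are not part of the original answer. It assumes movieId is unique per row and that no genre is repeated within a single movie; otherwise the cells would hold counts rather than 0/1.

```r
# Sample data from the question, rebuilt with data.frame() for self-containment.
df1 <- data.frame(
  movieId  = 1:6,
  title    = c("Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)",
               "Waiting to Exhale (1995)", "Father of the Bride Part II (1995)",
               "Heat (1995)"),
  category = c("Animation|Children|Comedy", "Adventure|Children|Fantasy",
               "Comedy|Romance", "Comedy|Drama", "Comedy",
               "Action|Crime|Thriller"),
  stringsAsFactors = FALSE
)

# 1. Split each category string on the literal "|" character.
genres <- strsplit(df1$category, "|", fixed = TRUE)

# 2. Name each list element by its movieId so stack() can record the origin.
names(genres) <- df1$movieId

# 3. stack() returns a long data frame: `values` (genre) and `ind` (movieId).
long <- stack(genres)

# 4. Cross-tabulate movieId (rows) against genre (columns).
counts <- table(long$ind, long$values)

# 5. Turn the contingency table into a plain data frame and bind it back
#    to the identifying columns.
dummies <- as.data.frame.matrix(counts)
result  <- cbind(df1[c("movieId", "title")], dummies)
```

Printing result should give the same wide table as the mtabulate version, with one indicator column per genre.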