Gab*_*yLP 12 split r dataframe
I have a table of movie data whose last column lists the categories each movie belongs to.
movieId title category
1 Toy Story (1995) Animation|Children|Comedy
2 Jumanji (1995) Adventure|Children|Fantasy
3 Grumpier Old Men (1995) Comedy|Romance
4 Waiting to Exhale (1995) Comedy|Drama
5 Father of the Bride Part II (1995) Comedy
6 Heat (1995) Action|Crime|Thriller
I want to create one column per category, holding 1 if that category appears in the movie's list and 0 otherwise. Something like:
movieId title animation comedy drama
1 xx 1 0 1
2 xy 1 0 0
3 yy 1 1 0
So far I have only converted the strings to lists:
f<-function(x) {strsplit(x, split='|', fixed=TRUE)}
movies2$m<-lapply(movies2$category, f)
But I don't know how to do the rest.
I was thinking of something like a Python dictionary, but I don't know how to do that in R.
Data
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = " movieId title category
1 'Toy Story (1995)' Animation|Children|Comedy
2 'Jumanji (1995)' Adventure|Children|Fantasy
3 'Grumpier Old Men (1995)' Comedy|Romance
4 'Waiting to Exhale (1995)' Comedy|Drama
5 'Father of the Bride Part II (1995)' Comedy
6 'Heat (1995)' Action|Crime|Thriller")
We can use mtabulate from qdapTools after splitting the category column:
library(qdapTools)
cbind(df1[-3],mtabulate(strsplit(df1$category, "[|]")))
# movieId title Action Adventure Animation Children Comedy Crime Drama Fantasy Romance Thriller
#1 1 Toy Story (1995) 0 0 1 1 1 0 0 0 0 0
#2 2 Jumanji (1995) 0 1 0 1 0 0 0 1 0 0
#3 3 Grumpier Old Men (1995) 0 0 0 0 1 0 0 0 1 0
#4 4 Waiting to Exhale (1995) 0 0 0 0 1 0 1 0 0 0
#5 5 Father of the Bride Part II (1995) 0 0 0 0 1 0 0 0 0 0
#6 6 Heat (1995) 1 0 0 0 0 1 0 0 0 1
Or using base R:
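A note on the split pattern, as a small sketch: strsplit treats its split argument as a regular expression by default, and a bare | is the alternation operator. That is why the pipe is wrapped in a character class ([|]) here, while the question passes fixed = TRUE instead. The string x below is only an illustrative value, not part of the original data.

```r
x <- "Comedy|Drama"

# A bare "|" is regex alternation between two empty patterns,
# so the string is broken into single characters.
strsplit(x, "|")[[1]]

# Either of these splits on the literal pipe character.
by_class <- strsplit(x, "[|]")[[1]]
by_fixed <- strsplit(x, "|", fixed = TRUE)[[1]]

by_class
by_fixed
```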
cbind(df1[-3], as.data.frame.matrix(table(stack(setNames(strsplit(df1$category,
"[|]"), df1$movieId))[2:1])))
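The base R one-liner is dense, so here is a sketch of the same idea unrolled into steps. The intermediate names (cats, long, ind, result) are made up for illustration. The row reindexing by movieId is an extra safeguard I have added, on the assumption that the factor levels produced by stack may not always follow the original row order.

```r
# Same sample data as above
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE,
  text = "movieId title category
1 'Toy Story (1995)' Animation|Children|Comedy
2 'Jumanji (1995)' Adventure|Children|Fantasy
3 'Grumpier Old Men (1995)' Comedy|Romance
4 'Waiting to Exhale (1995)' Comedy|Drama
5 'Father of the Bride Part II (1995)' Comedy
6 'Heat (1995)' Action|Crime|Thriller")

# Step 1: split each category string on the literal pipe,
# giving a list with one character vector per movie.
cats <- strsplit(df1$category, "|", fixed = TRUE)

# Step 2: name each list element by its movieId and stack into
# long format: column 'values' (category) and 'ind' (movieId).
long <- stack(setNames(cats, df1$movieId))

# Step 3: cross-tabulate movieId by category and turn the
# resulting table into a data frame of counts.
ind <- as.data.frame.matrix(table(long[2:1]))

# Added safeguard (not in the one-liner): reorder rows by
# movieId so they line up with df1 before binding.
ind <- ind[as.character(df1$movieId), , drop = FALSE]

# Step 4: drop the original category column and bind the
# indicator columns next to movieId and title.
result <- cbind(df1[-3], ind)
result
```

Since table counts occurrences, a category repeated within one movie's string would show up as 2 rather than 1; with data like the sample above, every cell is 0 or 1.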