Tom*_* Yu 2 python nested-lists dataframe pandas
I have a list of items like this:
lgenre[8:15]
[['Action'],
['Action', 'Adventure', 'Thriller'],
['Comedy', 'Drama', 'Romance'],
['Comedy', 'Horror'],
['Animation', "Children's"],
['Drama'],
['Action', 'Adventure', 'Romance']]
What I want is:
id Action Adventure Thriller Comedy Drama Romance Horror Animation Children's
0 0 1 0 0 0 0 0 0 0 0
1 1 1 1 1 0 0 0 0 0 0
2 2 0 0 0 1 1 1 0 0 0
3 3 0 0 0 1 0 0 1 0 0
4 4 0 0 0 0 0 0 0 1 1
5 5 0 0 0 0 1 0 0 0 0
6 6 1 1 0 0 0 1 0 0 0
What I tried was writing a double loop.
It runs, but it is far too slow. Is there an efficient way to do this? A better algorithm or a built-in method?
Build a DataFrame from the nested list and use pd.get_dummies:
import pandas as pd
df = pd.get_dummies(pd.DataFrame(l))  # l is the nested list of genres
df.columns = df.columns.str.split("_").str[-1]  # drop the "0_", "1_", ... position prefixes
Action Animation Comedy Drama Adventure Children's Drama Horror \
0 1 0 0 0 0 0 0 0
1 1 0 0 0 1 0 0 0
2 0 0 1 0 0 0 1 0
3 0 0 1 0 0 0 0 1
4 0 1 0 0 0 1 0 0
5 0 0 0 1 0 0 0 0
6 1 0 0 0 1 0 0 0
Romance Thriller
0 0 0
1 0 1
2 1 0
3 0 0
4 0 0
5 0 0
6 1 0
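Note that in the output above Drama appears twice, because pd.get_dummies encodes each positional column of the intermediate DataFrame separately. If one column per genre is needed, as in the question, a possible alternative is to join each inner list into a single string and use Series.str.get_dummies. This is only a sketch, not the method from the answer above: it assumes no genre name contains the "|" separator, and the helper name `genres` is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question (lgenre[8:15]).
l = [
    ["Action"],
    ["Action", "Adventure", "Thriller"],
    ["Comedy", "Drama", "Romance"],
    ["Comedy", "Horror"],
    ["Animation", "Children's"],
    ["Drama"],
    ["Action", "Adventure", "Romance"],
]

# Join each inner list into one "|"-separated string, then let pandas
# split it back into one indicator column per distinct genre.
# Assumption: no genre name itself contains "|".
df = pd.Series(l).str.join("|").str.get_dummies(sep="|")

# str.get_dummies returns the columns in sorted order; reorder them to
# first-appearance order so they match the layout asked for in the question.
genres = list(dict.fromkeys(g for row in l for g in row))
df = df[genres]

print(df)
```

Each genre then gets exactly one column, so a row such as ['Comedy', 'Drama', 'Romance'] and a row such as ['Drama'] both set the same Drama column.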