Ani*_*hra 5 dataframe python-3.x pandas
I have a dataframe that looks like this:
publication_title authors type ...
title 1 ['author1', 'author2', 'author3'] proceedings
title 2 ['author4', 'author5'] collections
title 3 ['author6', 'author7'] books
.
.
.
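What I want to do is take the "authors" column and split each list in it into separate rows, duplicating all the other columns. I also want to store the result in a new column named "author" and keep the original column.
The following shows exactly what I want to achieve: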
publication_title authors author type ...
title 1 ['author1', 'author2', 'author3'] author1 proceedings
title 1 ['author1', 'author2', 'author3'] author2 proceedings
title 1 ['author1', 'author2', 'author3'] author3 proceedings
title 2 ['author4', 'author5'] author4 collections
title 2 ['author4', 'author5'] author5 collections
title 3 ['author6', 'author7'] author6 books
title 3 ['author6', 'author7'] author7 books
.
.
.
I tried to do this with the pandas DataFrame explode method, but I could not find a way to store the result in a new column.
Thanks for your help.
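Since pandas 0.25.0 we have the explode method. First we copy the authors column under a new name using assign, then we explode that new column into rows, which duplicates the other columns: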
df.assign(author=df['authors']).explode('author')
Output
publication_title authors type author
0 title_1 [author1, author2, author3] proceedings author1
0 title_1 [author1, author2, author3] proceedings author2
0 title_1 [author1, author2, author3] proceedings author3
1 title_2 [author4, author5] collections author4
1 title_2 [author4, author5] collections author5
2 title_3 [author6, author7] books author6
2 title_3 [author6, author7] books author7
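For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question's data (the exact values and the `result` name are only illustrative), and it assumes pandas 0.25.0 or later so that explode is available.

```python
import pandas as pd

# Sample data rebuilt from the question; values are illustrative.
df = pd.DataFrame({
    "publication_title": ["title_1", "title_2", "title_3"],
    "authors": [
        ["author1", "author2", "author3"],
        ["author4", "author5"],
        ["author6", "author7"],
    ],
    "type": ["proceedings", "collections", "books"],
})

# assign() adds a copy of the list column under the new name "author";
# explode() then expands only that copy, one list element per row.
# The original "authors" column keeps its full list on every row.
result = df.assign(author=df["authors"]).explode("author")

print(result)
```

Because only the copied column is exploded, each row still carries the complete original list in authors, and the index repeats once per list element.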
If you want to get rid of the duplicated index values, use reset_index:
df.assign(author=df['authors']).explode('author').reset_index(drop=True)
Output
publication_title authors type author
0 title_1 [author1, author2, author3] proceedings author1
1 title_1 [author1, author2, author3] proceedings author2
2 title_1 [author1, author2, author3] proceedings author3
3 title_2 [author4, author5] collections author4
4 title_2 [author4, author5] collections author5
5 title_3 [author6, author7] books author6
6 title_3 [author6, author7] books author7
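As a side note, newer pandas releases (1.1.0 and later, as far as I know) let explode renumber the index itself through its ignore_index parameter, which should give the same result as chaining reset_index(drop=True). The sketch below compares the two on a hand-built copy of the question's data; the names `via_reset` and `via_flag` are just for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question; values are illustrative.
df = pd.DataFrame({
    "publication_title": ["title_1", "title_2", "title_3"],
    "authors": [
        ["author1", "author2", "author3"],
        ["author4", "author5"],
        ["author6", "author7"],
    ],
    "type": ["proceedings", "collections", "books"],
})

# Approach from the answer: explode, then drop the repeated index.
via_reset = (
    df.assign(author=df["authors"])
      .explode("author")
      .reset_index(drop=True)
)

# Alternative, assuming pandas >= 1.1.0: let explode renumber the rows.
via_flag = df.assign(author=df["authors"]).explode("author", ignore_index=True)

print(via_flag)
```

On older pandas versions the ignore_index argument is not accepted, so the reset_index form from the answer remains the portable choice.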