Split list values in a Pandas DataFrame column into duplicated rows

Ani*_*hra 5 dataframe python-3.x pandas

I have a DataFrame that looks like this:

publication_title    authors                             type ...
title 1              ['author1', 'author2', 'author3']   proceedings
title 2              ['author4', 'author5']              collections
title 3              ['author6', 'author7']              books
.
.
. 

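What I want to do is take the "authors" column and split each list into separate rows, duplicating all the other columns. I also want to store the result in a new column named "author" and keep the original column.

The following shows exactly what I am trying to achieve: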

publication_title    authors                             author          type ...
title 1              ['author1', 'author2', 'author3']   author1         proceedings
title 1              ['author1', 'author2', 'author3']   author2         proceedings
title 1              ['author1', 'author2', 'author3']   author3         proceedings
title 2              ['author4', 'author5']              author4         collections
title 2              ['author4', 'author5']              author5         collections
title 3              ['author6', 'author7']              author6         books
title 3              ['author6', 'author7']              author7         books
.
.
. 

I tried to do this with the pandas DataFrame explode method, but I could not find a way to store the result in a new column.

Thanks for your help.

Erf*_*fan 6

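Since pandas 0.25.0 we have the explode method. First we copy the authors column under a new name with assign, then we explode that new column into rows, which duplicates the other columns: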

df.assign(author=df['authors']).explode('author')

Output

  publication_title                      authors         type   author
0           title_1  [author1, author2, author3]  proceedings  author1
0           title_1  [author1, author2, author3]  proceedings  author2
0           title_1  [author1, author2, author3]  proceedings  author3
1           title_2           [author4, author5]  collections  author4
1           title_2           [author4, author5]  collections  author5
2           title_3           [author6, author7]        books  author6
2           title_3           [author6, author7]        books  author7
Run Code Online (Sandbox Code Playgroud)
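
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample DataFrame is rebuilt by hand from the table in the question, so its values are illustrative and assume the authors column holds real Python lists rather than strings; the name result is also just for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question's table (illustrative values only).
# Assumption: "authors" holds actual Python lists, not string representations.
df = pd.DataFrame({
    "publication_title": ["title_1", "title_2", "title_3"],
    "authors": [
        ["author1", "author2", "author3"],
        ["author4", "author5"],
        ["author6", "author7"],
    ],
    "type": ["proceedings", "collections", "books"],
})

# Copy the list column under a new name, then explode only the copy.
# The original "authors" column is left untouched on every resulting row.
result = df.assign(author=df["authors"]).explode("author")

print(result)
```

Because assign returns a new DataFrame, df itself is not modified; only result carries the extra author column.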

If you want to get rid of the duplicated index values, use reset_index:

df.assign(author=df['authors']).explode('author').reset_index(drop=True)

Output

  publication_title                      authors         type   author
0           title_1  [author1, author2, author3]  proceedings  author1
1           title_1  [author1, author2, author3]  proceedings  author2
2           title_1  [author1, author2, author3]  proceedings  author3
3           title_2           [author4, author5]  collections  author4
4           title_2           [author4, author5]  collections  author5
5           title_3           [author6, author7]        books  author6
6           title_3           [author6, author7]        books  author7
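
As a side note, on pandas 1.1.0 or later explode also accepts ignore_index=True, which should give the same renumbered index in a single call. A minimal sketch, assuming that pandas version and the same hand-built sample data as an illustration (not the asker's real data):

```python
import pandas as pd

# Illustrative sample data modeled on the question's table.
df = pd.DataFrame({
    "publication_title": ["title_1", "title_2", "title_3"],
    "authors": [
        ["author1", "author2", "author3"],
        ["author4", "author5"],
        ["author6", "author7"],
    ],
    "type": ["proceedings", "collections", "books"],
})

# ignore_index=True (pandas >= 1.1.0) renumbers the rows 0..n-1,
# so no separate reset_index(drop=True) call is needed.
result = df.assign(author=df["authors"]).explode("author", ignore_index=True)

print(result)
```

On older versions (0.25.0 up to 1.0.x) the ignore_index parameter is not available, so the reset_index(drop=True) form is the one to use there.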