Explode a pandas DataFrame into separate rows by splitting a column on newlines

Question


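I have a pandas DataFrame in which one column contains passages of text. I would like to explode the DataFrame into separate rows by splitting those passages on newlines. A passage may contain multiple newlines or carriage returns, as shown below. To simplify, I created the following example: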

    A                                             B  index_col
0  A0                                            B0          0
1  A1  split this\n\n into \r\n separate \n rows \n          1
2  A2                                            B2          2
3  A3                                            B3          3


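I tried splitting the Series into multiple values and combining them into a single column with the stack method, but I cannot get the desired output. Any suggestions would be greatly appreciated!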

b = pd.DataFrame(df['B'].str.split('\n').tolist(), index=df['index_col']).stack()
b = b.reset_index()[[0, 'index_col']]
b.columns = ['B', 'index_col']


Current output:
            B  index_col
0          B0          0
1  split this          1
2                      1
3     into \r          1
4   separate           1
5       rows           1
6                      1
7          B2          2
8          B3          3

Desired output:
            B  index_col
0          B0          0
1  split this          1
2     into             1
3   separate           1
4       rows           1
5          B2          2
6          B3          3


Answer 1

jez*_*ael 8

Sample:

import pandas as pd

df = pd.DataFrame({'A':['A0','A1'],
                    'B':['B0', 'split this\n\n into \r\n separate \n rows \n'],
                   'index_col':[0,1]})
print (df)
    A                                             B  index_col
0  A0                                            B0          0
1  A1  split this\n\n into \r\n separate \n rows \n          1


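Your solution should be changed as follows: use DataFrame.set_index, replace the carriage returns with Series.str.replace, add expand=True to Series.str.split so that it returns a DataFrame, and finally remove the empty strings from B with DataFrame.query: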

df1 = (df.set_index('index_col')['B']
         .str.replace('\r', ' ')
         .str.split('\n', expand=True)
         .stack()
         .rename('B')
         .reset_index(level=1, drop=True)
         .reset_index()[['B', 'index_col']]
         .query("B != ''"))
print (df1)
            B  index_col
0          B0          0
1  split this          1
3      into            1
4   separate           1
5       rows           1
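
To see why the final `query("B != ''")` step is needed, and why some values above still carry surrounding spaces, it may help to apply the same replace and split to the sample string on its own. This is only an illustrative sketch using Python's built-in string methods, not part of the original answer; the names `text`, `pieces` and `non_empty` are made up for the example.

```python
text = 'split this\n\n into \r\n separate \n rows \n'

# The same two steps as in the pandas chain, applied to one plain string:
# turn each carriage return into a space, then split on '\n'.
pieces = text.replace('\r', ' ').split('\n')
print(pieces)

# Consecutive newlines and the trailing newline each yield an empty string,
# which is what the query step filters out.
non_empty = [p for p in pieces if p != '']
print(non_empty)
```

The remaining pieces keep their leading and trailing spaces, so an extra `Series.str.strip()` would be needed if the values must match the desired output exactly.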


For pandas 0.25+ you can use DataFrame.explode:

df['B'] = df['B'].str.replace('\r', ' ').str.split('\n')
df1 = df[['B', 'index_col']].explode('B').query("B != ''")
print (df1)
            B  index_col
0          B0          0
1  split this          1
1      into            1
1   separate           1
1       rows           1
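
As a possible variation (not the answerer's code), the carriage returns and repeated newlines can be handled in one step by splitting on a regular expression, then stripping whitespace and resetting the index so the result lines up with the desired output in the question. This sketch assumes pandas 0.25+ for `explode`; the names `parts` and `out` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'split this\n\n into \r\n separate \n rows \n', 'B2', 'B3'],
    'index_col': [0, 1, 2, 3],
})

# Split on any run of '\r' and '\n' characters. A pattern longer than one
# character is treated as a regular expression by Series.str.split.
parts = df['B'].str.split(r'[\r\n]+')

# One row per list element, keeping index_col alongside each piece.
out = df[['index_col']].assign(B=parts).explode('B')

# Trim surrounding spaces, drop the empty piece left by the trailing
# newline, and renumber the rows from 0.
out['B'] = out['B'].str.strip()
out = out[out['B'] != ''].reset_index(drop=True)[['B', 'index_col']]
print(out)
```

Splitting on the run `[\r\n]+` avoids creating empty strings between consecutive line breaks, so only the piece after the trailing newline has to be filtered out.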


Archived:	6 years, 5 months ago
Views:	3321
Last activity:	6 years, 5 months ago