Pandas: split a DataFrame based on empty rows

Question

use*_*825 (score 6) · tags: group-by, dataframe, python-2.7, pandas, pandas-groupby

I have the following DataFrame:

id       A        B        C   
1      34353    917998     x        
2      34973    980340     x      
3      87365    498097     x      
4      98309    486547     x      
5      87699    475132         
6      52734    4298894         
7      8749267  4918066    x    
8      89872    18103         
9      589892   4818086    y    
10     765      4063       y 
11     32369    418165     y
12     206      2918137    
13     554      3918072    
14     1029     1918051    x
15     2349243  4918064

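For each group of consecutive rows where column C is empty (for example rows 5 and 6), I want to create a new DataFrame, so several DataFrames need to be generated, as shown below: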

id      A        B
5      87699    475132         
6      52734    4298894

id      A        B
8      89872    18103

id      A        B
12     206      2918137    
13     554      3918072

id      A        B
15     2349243  4918064

Answer 1

piR*_*red (score 6)

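# True where column C is empty (NaN)
isnull = df.C.isnull()
# Start a new label every time isnull flips between True and False
partitions = (isnull != isnull.shift()).cumsum()

# Keep only the empty-C rows and group them by their run label
gb = df[isnull].groupby(partitions)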
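The key step is the run-labelling idiom (s != s.shift()).cumsum(): comparing each value with the one above it marks where a new run of equal values starts, and the cumulative sum turns those start marks into a run number. A minimal sketch on a small, made-up boolean Series standing in for df.C.isnull():

```python
import pandas as pd

# Made-up mask standing in for df.C.isnull(): True means C is empty
s = pd.Series([False, True, True, False, True, False, True, True])

# A row starts a new run when it differs from the row above it
starts = s != s.shift()

# The cumulative count of run starts gives every row its run label
labels = starts.cumsum()

print(labels.tolist())  # [1, 2, 2, 3, 4, 5, 6, 6]

# Keeping only the True rows leaves one label per run of empty cells
print(labels[s].unique().tolist())  # [2, 4, 6]
```

Because only the True rows are kept before grouping, each remaining label corresponds to exactly one block of consecutive empty cells.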

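At this point we have reached the goal of creating a separate DataFrame for each contiguous group of NaN values in df. Each one can be accessed with gb.get_group(), passing one of the keys in gb.groups.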
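If you want an actual list of DataFrames rather than a GroupBy object, iterating over gb yields (key, group) pairs. The sketch below assumes Python 3 and rebuilds a trimmed-down copy of the sample data inline so it runs on its own; it also drops column C so each piece has only id, A and B, as in the desired output.

```python
import numpy as np
import pandas as pd

# Trimmed-down copy of the question's data; np.nan marks an empty C cell
df = pd.DataFrame({
    "id": [4, 5, 6, 7, 8],
    "A": [98309, 87699, 52734, 8749267, 89872],
    "B": [486547, 475132, 4298894, 4918066, 18103],
    "C": ["x", np.nan, np.nan, "x", np.nan],
})

isnull = df.C.isnull()
partitions = (isnull != isnull.shift()).cumsum()
gb = df[isnull].groupby(partitions)

# One DataFrame per block of consecutive empty-C rows, without column C
frames = [group.drop(columns="C") for _, group in gb]

for frame in frames:
    print(frame, end="\n\n")
```

With this trimmed data the list holds two frames: one with ids 5 and 6, and one with id 8.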

To verify, we concatenate the groups and display the result.

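# list(...) gives concat a plain list of group labels (in Python 3, .keys() is a view)
keys = list(gb.groups)
# Stack the groups; the outer index level is the group label
dfs = pd.concat([gb.get_group(g) for g in keys], keys=keys)
dfs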
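The concatenated frame carries a two-level index: the outer level is the run label produced by cumsum, the inner level is the original row position, so each block is easy to pick out. A minimal sketch on a made-up six-row frame (the column values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Made-up frame with two blocks of empty C cells: rows 1-2 and row 4
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "C": ["x", np.nan, np.nan, "y", np.nan, "x"],
})

isnull = df.C.isnull()
gb = df[isnull].groupby((isnull != isnull.shift()).cumsum())

keys = list(gb.groups)
dfs = pd.concat([gb.get_group(k) for k in keys], keys=keys)

# Outer level = run label, inner level = original row position
print(dfs)
```

Here the run labels are 2 and 4, and dfs.loc[2] returns the first block on its own.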

Setup for `df`

I used @Alberto Garcia-Raboso's reader:

import io
import pandas as pd

# Create your sample dataframe
data = io.StringIO("""\
id       A        B        C   
1      34353    917998     x        
2      34973    980340     x      
3      87365    498097     x      
4      98309    486547     x      
5      87699    475132         
6      52734    4298894         
7      8749267  4918066    x    
8      89872    18103         
9      589892   4818086    y    
10     765      4063       y 
11     32369    418165     y
12     206      2918137    
13     554      3918072    
14     1029     1918051    x
15     2349243  4918064
""")
# sep=r"\s+" splits on runs of whitespace; it replaces delim_whitespace=True, deprecated in pandas 2.2
df = pd.read_csv(data, sep=r"\s+")
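The approach relies on rows without a C field being read as NaN. When a whitespace-separated line has fewer fields than the header, read_csv fills the missing trailing columns with NaN, so df.C.isnull() picks them out. A quick check on a shortened copy of the sample data (shortened only to keep the sketch small):

```python
import io

import pandas as pd

# Shortened copy of the sample data; ids 5, 6 and 8 have no C field
data = io.StringIO("""\
id  A       B        C
4   98309   486547   x
5   87699   475132
6   52734   4298894
7   8749267 4918066  x
8   89872   18103
""")
df = pd.read_csv(data, sep=r"\s+")

# Missing trailing fields are filled with NaN
print(df)
print(df.C.isnull().tolist())  # [False, True, True, False, True]
```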

Asked:	9 years, 3 months ago
Viewed:	1662 times
Last activity:	7 years, 11 months ago