Pandas: split a DataFrame into multiple DataFrames where a condition is True

S.B*_*B.G · score 5 · tags: python, split, python-3.x, pandas

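I have a DataFrame like `df` below. I want to create a new DataFrame for each block of consecutive rows where the condition is True, so that the result is `df_1`, `df_2`, ..., `df_n`.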

`df`:

| Value | Condition |
|-------|-----------|
| 2     | True      |
| 5     | True      |
| 4     | True      |
| 4     | False     |
| 2     | False     |
| 0     | True      |
| 5     | True      |
| 7     | False     |
| 8     | False     |
| 9     | False     |

Desired `df_1`:

| Value |
|-------|
| 2     |
| 5     |
| 4     |

Desired `df_2`:

| Value |
|-------|
| 0     |
| 5     |

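My only idea is to iterate over the DataFrame, collect the start and end index of each block of True values, and then loop over those indices to build the new DataFrames, doing something like this for each start/end pair: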

```python
newdf = df.iloc[start:end]
```

But this seems inefficient.
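For reference, the start/end-index idea from the question does not need a row-by-row Python loop. The sketch below is an illustration only, not code from the question. It assumes NumPy is available and that `Condition` is a boolean column. It pads the column with a 0 on each side and uses `np.diff` to find where runs of True begin and end, which is a standard run-detection trick:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Value": [2, 5, 4, 4, 2, 0, 5, 7, 8, 9],
    "Condition": [True, True, True, False, False, True, True, False, False, False],
})

# Convert to 0/1 and pad with 0 on both ends so that runs touching
# the first or last row still get a start and an end.
cond = df["Condition"].to_numpy(dtype=int)
edges = np.diff(np.concatenate(([0], cond, [0])))

starts = np.flatnonzero(edges == 1)   # positions where a True run begins
ends = np.flatnonzero(edges == -1)    # exclusive end positions of each run

# One DataFrame per run, numbered from 1
dfs = {n: df.iloc[s:e] for n, (s, e) in enumerate(zip(starts, ends), 1)}
```

Because `ends` are exclusive, each pair can be passed directly to `df.iloc[start:end]`, the same slice the question uses.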

Answer by jez*_*ael (score 5)

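Create a dictionary of DataFrames by grouping on a Series built with `cumsum` of the inverted boolean column, and use `where` to set NaN for rows that belong to no group: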

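```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Value': [2, 5, 4, 4, 2, 0, 5, 7, 8, 9],
    'Condition': [True, True, True, False, False, True, True, False, False, False],
})

# Running count of False rows; it stays constant inside each True block.
# where() keeps the count on True rows and sets False rows to NaN.
g = (~df['Condition']).cumsum().where(df['Condition'])
print(g)
# 0    0.0
# 1    0.0
# 2    0.0
# 3    NaN
# 4    NaN
# 5    2.0
# 6    2.0
# 7    NaN
# 8    NaN
# 9    NaN
# Name: Condition, dtype: float64

# groupby drops the NaN keys; enumerate numbers the groups 1, 2, ..., N
dfs = {i + 1: v for i, (k, v) in enumerate(df[['Value']].groupby(g))}
print(dfs)
# {1:    Value
# 0      2
# 1      5
# 2      4, 2:    Value
# 5      0
# 6      5}

print(dfs[1])
#    Value
# 0      2
# 1      5
# 2      4

print(dfs[2])
#    Value
# 5      0
# 6      5
```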
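A commonly used alternative way to label the blocks, which is not the method in the answer above, is to start a new run ID whenever `Condition` differs from the previous row and then keep only the True runs. This minimal sketch uses the question's data:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Value": [2, 5, 4, 4, 2, 0, 5, 7, 8, 9],
    "Condition": [True, True, True, False, False, True, True, False, False, False],
})

# The run ID increments every time Condition changes value (True->False or False->True)
run_id = (df["Condition"] != df["Condition"].shift()).cumsum()

# Keep only the True rows, then split them by their run ID
mask = df["Condition"]
dfs = {
    n: grp[["Value"]]
    for n, (_, grp) in enumerate(df[mask].groupby(run_id[mask]), 1)
}
```

Unlike the inverted-`cumsum` key, these run IDs also label the False runs, so the rows are filtered with the mask before grouping.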


Answer by jpp*_*jpp (score 5)

Here is an alternative solution. Note that the `consecutive_groups` recipe comes from the `more_itertools` library.

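```python
from itertools import groupby
from operator import itemgetter

import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Value': [2, 5, 4, 4, 2, 0, 5, 7, 8, 9],
    'Condition': [True, True, True, False, False, True, True, False, False, False],
})

def consecutive_groups(iterable, ordering=lambda x: x):
    """Yield runs of consecutive items (same recipe as in more_itertools)."""
    for k, g in groupby(enumerate(iterable), key=lambda x: x[0] - ordering(x[1])):
        yield map(itemgetter(1), g)

# Use row positions rather than index labels, so .iloc is correct for any index
grps = consecutive_groups(np.flatnonzero(df['Condition'].to_numpy()))

dfs = {i: df.iloc[list(j)] for i, j in enumerate(grps, 1)}

# {1:    Value Condition
# 0      2      True
# 1      5      True
# 2      4      True,
# 2:    Value Condition
# 5      0      True
# 6      5      True}
```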
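The `consecutive_groups` recipe works because its grouping key is each element's position in the sequence minus its value. That difference stays constant along a run of consecutive integers and changes at every gap. Here is a small pure-Python illustration. The True row positions from the example are hard-coded for brevity:

```python
from itertools import groupby

# Row positions where Condition is True in the question's data
positions = [0, 1, 2, 5, 6]

# position-in-list minus value: constant within a run, jumps at a gap
keys = [i - x for i, x in enumerate(positions)]

# Group by that key and keep only the values
runs = [
    [x for _, x in grp]
    for _, grp in groupby(enumerate(positions), key=lambda t: t[0] - t[1])
]

print(keys)  # [0, 0, 0, -2, -2]
print(runs)  # [[0, 1, 2], [5, 6]]
```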