Pandas: split a DataFrame by monotonic increase of a column's values

9bl*_*lue 6 python numpy dataframe pandas

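I have a huge DataFrame with a datetime column called time and a float column called dist. The DataFrame is already sorted by time and dist. I want to split it into several DataFrames based on the monotonic increase of dist, starting a new DataFrame each time dist drops.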

Split

   dt                    dist
0  20160811 11:10        1.0
1  20160811 11:15        1.4
2  20160811 12:15        1.8
3  20160811 12:32        0.6
4  20160811 12:34        0.8
5  20160811 14:38        0.2
into

   dt                    dist
0  20160811 11:10        1.0
1  20160811 11:15        1.4
2  20160811 12:15        1.8

   dt                    dist
0  20160811 12:32        0.6
1  20160811 12:34        0.8

   dt                    dist
0  20160811 14:38        0.2

Psi*_*dom 8

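You can compute the difference vector of the dist column and then apply cumsum() to the condition diff < 0. This creates a new id each time dist decreases from the previous value. Equal consecutive values stay in the same group, since their difference is 0 rather than negative.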

# diff() < 0 is True wherever dist drops; cumsum() turns each drop into a new group id
df['id'] = (df.dist.diff() < 0).cumsum()

print(df)

#               dt  dist  id
#0  20160811 11:10   1.0   0
#1  20160811 11:15   1.4   0
#2  20160811 12:15   1.8   0
#3  20160811 12:32   0.6   1
#4  20160811 12:34   0.8   1
#5  20160811 14:38   0.2   2

# Alternatively, group on the computed ids directly, without adding an id column to df
for _, g in df.groupby((df.dist.diff() < 0).cumsum()):
    print(g)

#               dt  dist
#0  20160811 11:10   1.0
#1  20160811 11:15   1.4
#2  20160811 12:15   1.8
#               dt  dist
#3  20160811 12:32   0.6
#4  20160811 12:34   0.8
#               dt  dist
#5  20160811 14:38   0.2
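To get the pieces as separate DataFrames with their own 0-based index, as in the question's expected output, the same diff/cumsum idea can be wrapped in a small function. This is only a minimal sketch: the helper name split_on_decrease and its column parameter are made up for illustration, and the sample frame is rebuilt from the question's data with an assumed explicit datetime format.

```python
import pandas as pd

# Sample data copied from the question; the datetime format is an assumption.
df = pd.DataFrame({
    "dt": pd.to_datetime(
        ["20160811 11:10", "20160811 11:15", "20160811 12:15",
         "20160811 12:32", "20160811 12:34", "20160811 14:38"],
        format="%Y%m%d %H:%M",
    ),
    "dist": [1.0, 1.4, 1.8, 0.6, 0.8, 0.2],
})


def split_on_decrease(frame, column="dist"):
    """Hypothetical helper: split frame wherever `column` drops below its previous value."""
    # The first diff is NaN, and NaN < 0 is False, so the first row starts group 0.
    run_id = (frame[column].diff() < 0).cumsum()
    # reset_index(drop=True) gives each piece its own 0-based index.
    return [piece.reset_index(drop=True) for _, piece in frame.groupby(run_id)]


parts = split_on_decrease(df)
for part in parts:
    print(part)
```

If equal consecutive values should also start a new piece (strictly increasing runs), the condition could be changed to diff() <= 0.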