jun*_*kim 3 python split bins dataframe pandas
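Is there a way to split certain dataframe rows so that I can form groups of rows at a given cumsum? In this example, I want to split the rows wherever the cumsum crosses a multiple of 20.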
my data
timestamp counts cumsum
'2015-01-01 03:45:14' 4 4
'2015-01-01 03:45:14' 2 6
'2015-01-01 03:45:14' 1 7
'2015-01-01 03:45:15' 12 19
'2015-01-01 03:45:15' 8 27 <--split
'2015-01-01 03:45:15' 8 35
'2015-01-01 03:45:15' 2 37
'2015-01-01 03:45:16' 26 63 <--split(twice)
'2015-01-01 03:45:17' 3 66
'2015-01-01 03:45:17' 5 71
'2015-01-01 03:45:19' 11 82 <--split
'2015-01-01 03:45:20' 8 90
'2015-01-01 03:45:21' 1 91
I want my dataframe to look like this:
my data
timestamp counts cumsum
'2015-01-01 03:45:14' 4 4
'2015-01-01 03:45:14' 2 6
'2015-01-01 03:45:14' 1 7
'2015-01-01 03:45:15' 12 19
'2015-01-01 03:45:15' 1 20 <--split 20
'2015-01-01 03:45:15' 7 27 <--split
'2015-01-01 03:45:15' 8 35
'2015-01-01 03:45:15' 2 37
'2015-01-01 03:45:16' 3 40 <--split 40
'2015-01-01 03:45:16' 20 60 <--split 60
'2015-01-01 03:45:16' 3 63 <--split
'2015-01-01 03:45:17' 3 66
'2015-01-01 03:45:17' 5 71
'2015-01-01 03:45:19' 9 80 <--split 80
'2015-01-01 03:45:19' 2 82 <--split
'2015-01-01 03:45:20' 8 90
'2015-01-01 03:45:21' 1 91
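You can do this by creating a dataframe that holds the values to add (20, 40, 60, 80, ...) and combining it with the original df using pd.concat. Then call drop_duplicates on the cumsum column, in case the original dataframe already contains some of the values 20, 40, 60, ... (thanks to @jezrael's comment), sort_values on that column, and reset_index. As I understand it, you want to bfill the timestamp column, so each inserted row takes the timestamp of the row that was split, and use diff on the cumsum column to recalculate the counts column.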
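import pandas as pd

val_split = 20  # insert a row at every multiple of this value
df_ = (pd.concat([df,
                  # rows that hold only the boundary cumsum values (20, 40, 60, ...)
                  pd.DataFrame({'cumsum': range(val_split, df['cumsum'].max(), val_split)})])
         .drop_duplicates('cumsum')  # keep the original row if a boundary already exists
         .sort_values('cumsum')
         .reset_index(drop=True)
      )
# inserted rows take the timestamp of the next original row
df_['timestamp'] = df_['timestamp'].bfill()
# recompute counts from cumsum; the first row has no diff, so keep its original count
df_['counts'] = df_['cumsum'].diff().fillna(df_['counts'])
print(df_)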
timestamp counts cumsum
0 '2015-01-01 03:45:14' 4.0 4
1 '2015-01-01 03:45:14' 2.0 6
2 '2015-01-01 03:45:14' 1.0 7
3 '2015-01-01 03:45:15' 12.0 19
4 '2015-01-01 03:45:15' 1.0 20
5 '2015-01-01 03:45:15' 7.0 27
6 '2015-01-01 03:45:15' 8.0 35
7 '2015-01-01 03:45:15' 2.0 37
8 '2015-01-01 03:45:16' 3.0 40
9 '2015-01-01 03:45:16' 20.0 60
10 '2015-01-01 03:45:16' 3.0 63
11 '2015-01-01 03:45:17' 3.0 66
12 '2015-01-01 03:45:17' 5.0 71
13 '2015-01-01 03:45:19' 9.0 80
14 '2015-01-01 03:45:19' 2.0 82
15 '2015-01-01 03:45:20' 8.0 90
16 '2015-01-01 03:45:21' 1.0 91
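For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The function name split_at_multiples and the sample-data construction are illustrative assumptions, not part of the original answer. The tenth count is taken as 5 so that the counts agree with the listed cumsum of 71. Unlike the snippet above, the first row's count is filled from cumsum rather than from the counts column, which assumes the running total starts at zero; counts are also cast back to integers.

```python
import pandas as pd


def split_at_multiples(df: pd.DataFrame, step: int) -> pd.DataFrame:
    """Insert rows so that every multiple of `step` below the max appears in `cumsum`.

    Hypothetical helper: expects columns 'timestamp', 'counts' and 'cumsum'.
    """
    # Boundary rows carry only a cumsum value; the other columns start as NaN.
    boundaries = pd.DataFrame(
        {"cumsum": range(step, int(df["cumsum"].max()), step)}
    )
    out = (
        pd.concat([df, boundaries], ignore_index=True)
        .drop_duplicates("cumsum")  # original rows come first, so they win
        .sort_values("cumsum")
        .reset_index(drop=True)
    )
    # An inserted row belongs to the row being split, i.e. the next original row.
    out["timestamp"] = out["timestamp"].bfill()
    # Counts are the differences between consecutive running totals.
    # Assumption: the running total starts at zero, so the first count
    # equals the first cumsum value.
    out["counts"] = out["cumsum"].diff().fillna(out["cumsum"]).astype(int)
    return out


# Sample data rebuilt from the question (timestamps kept as plain strings).
seconds = [14, 14, 14, 15, 15, 15, 15, 16, 17, 17, 19, 20, 21]
df = pd.DataFrame(
    {
        "timestamp": [f"2015-01-01 03:45:{s}" for s in seconds],
        "counts": [4, 2, 1, 12, 8, 8, 2, 26, 3, 5, 11, 8, 1],
    }
)
df["cumsum"] = df["counts"].cumsum()

result = split_at_multiples(df, 20)
print(result)
```

One caveat with this approach: because drop_duplicates works on cumsum, any original rows with a count of 0 (and therefore a repeated cumsum) would be dropped as well.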