I have a pandas DataFrame with timedeltas and, in a separate column, the cumulative sum of those deltas in milliseconds. An example is shown below:
Transaction_ID Time TimeDelta CumSum[ms]
1 00:00:04.500 00:00:00.000 000
2 00:00:04.600 00:00:00.100 100
3 00:00:04.762 00:00:00.162 262
4 00:00:05.543 00:00:00.781 1043
5 00:00:09.567 00:00:04.024 5067
6 00:00:10.654 00:00:01.087 6154
7 00:00:14.300 00:00:03.646 9800
8 00:00:14.532 00:00:00.232 10032
9 00:00:16.500 00:00:01.968 12000
10 00:00:17.543 00:00:01.043 13043
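I would like to be able to specify a maximum value for CumSum[ms], after which the cumulative sum restarts from 0. For example, with a maximum of 3000 in the example above, the result would look like this: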
Transaction_ID Time TimeDelta CumSum[ms]
1 00:00:04.500 00:00:00.000 000
2 00:00:04.600 00:00:00.100 100
3 00:00:04.762 00:00:00.162 262
4 00:00:05.543 00:00:00.781 1043
5 00:00:09.567 00:00:04.024 0
6 00:00:10.654 00:00:01.087 1087
7 00:00:14.300 00:00:03.646 0 …

In [46]: d = np.random.randn(10, 1) * 2
In [47]: df = pd.DataFrame(d.astype(int), columns=['data'])
I am trying to create a cumsum column that resets whenever the sign in the data column changes, as shown below:
data custom_cumsum
0 -2 -2
1 -1 -3
2 1 1
3 -3 -3
4 -1 -4
5 2 2
6 0 2
7 3 5
8 -1 -1
9 -2 -3
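I was able to achieve this with df.iterrows(). I am trying to avoid iteration and do it with vectorized operations instead. There are several questions about resetting a cumsum when a NaN is present, but I could not reach this goal with those solutions.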
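One possible vectorized approach, offered as a sketch rather than a definitive answer: label each run of same-signed values, then take a grouped cumulative sum. The data below is copied from the example table instead of being generated randomly, and the names `sign` and `run_id` are illustrative. It assumes, as the example output suggests, that a zero does not count as a sign change and simply continues the current run.

```python
import numpy as np
import pandas as pd

# Values taken from the example table in the question
df = pd.DataFrame({"data": [-2, -1, 1, -3, -1, 2, 0, 3, -1, -2]})

# Sign of each value; zeros become NaN and then inherit the sign of the
# nearest previous non-zero value (or the next one, for leading zeros).
sign = np.sign(df["data"]).where(df["data"] != 0).ffill().bfill()

# A new run starts wherever the sign differs from the previous row.
run_id = (sign != sign.shift()).cumsum()

# Cumulative sum restarted within each run.
df["custom_cumsum"] = df["data"].groupby(run_id).cumsum()

print(df)
```

If a zero should instead start a new run, the `where(...)`, `ffill()` and `bfill()` steps can be dropped so that `np.sign` values of 0 form their own groups.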