che*_*ens 3 python data-manipulation pandas data-cleaning data-science
我有以下数据框:
id day total_amount
1 2015-07-09 1000
1 2015-10-22 100
1 2015-11-12 200
1 2015-11-27 2392
1 2015-12-16 123
7 2015-07-09 200
7 2015-07-09 1000
7 2015-08-27 100018
7 2015-11-25 1000
8 2015-08-27 1000
8 2015-12-07 10000
8 2016-01-18 796
8 2016-03-31 10000
15 2015-09-10 1500
15 2015-09-30 1000
Run Code Online (Sandbox Code Playgroud)
如果它们具有相同的 id 直到到达该 id 的最后一行,我需要在 day 列中减去每两个连续的时间,然后这次开始减去 day 列中的时间以获得新的 id,输出中类似于以下几行:
1 2015-08-09 1000 2015-11-22 - 2015-08-09
1 2015-11-22 100 2015-12-12 - 2015-11-22
1 2015-12-12 200 2015-12-16 - 2015-12-12
1 2015-12-16 2392 2015-12-27 - 2015-12-27
1 2015-12-27 123 NA
7 2015-08-09 200 2015-09-09 - 2015-08-09
7 2015-09-09 1000 2015-09-27 - 2015-09-09
7 2015-09-27 100018 2015-12-25 - 2015-09-27
7 2015-12-25 1000 NA
8 2015-08-27 1000 2015-12-07 - 2015-08-27
8 2015-12-07 10000 2016-02-18 - 2015-12-07
8 2016-02-18 796 2016-04-31- 2016-02-18
8 2016-04-31 10000 NA
15 2015-10-10 1500 2015-10-30 - 2015-10-10
15 2015-10-30 1000 NA
Run Code Online (Sandbox Code Playgroud)
您可以使用DataFrameGroupBy.diff:
df['dif'] = df.groupby('id')['day'].diff(-1) * (-1)
print (df)
id day total_amount dif
0 1 2015-07-09 1000 105 days
1 1 2015-10-22 100 21 days
2 1 2015-11-12 200 15 days
3 1 2015-11-27 2392 19 days
4 1 2015-12-16 123 NaT
5 7 2015-07-09 200 0 days
6 7 2015-07-09 1000 49 days
7 7 2015-08-27 100018 90 days
8 7 2015-11-25 1000 NaT
9 8 2015-08-27 1000 102 days
10 8 2015-12-07 10000 42 days
11 8 2016-01-18 796 73 days
12 8 2016-03-31 10000 NaT
13 15 2015-09-10 1500 20 days
14 15 2015-09-30 1000 NaT
Run Code Online (Sandbox Code Playgroud)
df['diff'] = df.groupby('id')['day'].apply(lambda x: x.shift(-1) - x)
print (df)
id day total_amount diff
0 1 2015-07-09 1000 105 days
1 1 2015-10-22 100 21 days
2 1 2015-11-12 200 15 days
3 1 2015-11-27 2392 19 days
4 1 2015-12-16 123 NaT
5 7 2015-07-09 200 0 days
6 7 2015-07-09 1000 49 days
7 7 2015-08-27 100018 90 days
8 7 2015-11-25 1000 NaT
9 8 2015-08-27 1000 102 days
10 8 2015-12-07 10000 42 days
11 8 2016-01-18 796 73 days
12 8 2016-03-31 10000 NaT
13 15 2015-09-10 1500 20 days
14 15 2015-09-30 1000 NaT
Run Code Online (Sandbox Code Playgroud)
通过评论编辑:
如果您需要hoursas差异int,请转换timedelta为hour:
df['diff'] = df.groupby('id')['day'].diff(-1) * (-1) / np.timedelta64(1, 'h')
print (df)
id day total_amount diff
0 1 2015-07-09 1000 2520.0
1 1 2015-10-22 100 504.0
2 1 2015-11-12 200 360.0
3 1 2015-11-27 2392 456.0
4 1 2015-12-16 123 NaN
5 7 2015-07-09 200 0.0
6 7 2015-07-09 1000 1176.0
7 7 2015-08-27 100018 2160.0
8 7 2015-11-25 1000 NaN
9 8 2015-08-27 1000 2448.0
10 8 2015-12-07 10000 1008.0
11 8 2016-01-18 796 1752.0
12 8 2016-03-31 10000 NaN
13 15 2015-09-10 1500 480.0
14 15 2015-09-30 1000 NaN
Run Code Online (Sandbox Code Playgroud)
df['diff'] = df.groupby('id')['day'].apply(lambda x: x.shift(-1) - x) /
np.timedelta64(1, 'h')
print (df)
id day total_amount diff
0 1 2015-07-09 1000 2520.0
1 1 2015-10-22 100 504.0
2 1 2015-11-12 200 360.0
3 1 2015-11-27 2392 456.0
4 1 2015-12-16 123 NaN
5 7 2015-07-09 200 0.0
6 7 2015-07-09 1000 1176.0
7 7 2015-08-27 100018 2160.0
8 7 2015-11-25 1000 NaN
9 8 2015-08-27 1000 2448.0
10 8 2015-12-07 10000 1008.0
11 8 2016-01-18 796 1752.0
12 8 2016-03-31 10000 NaN
13 15 2015-09-10 1500 480.0
14 15 2015-09-30 1000 NaN
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
5358 次 |
| 最近记录: |