Pandas DataFrame groupby function to calculate date differences

bAN*_*bAN 3 python pandas

I have a dataframe like this:

id_a | date

12   | 2020-01-01
12   | 2020-01-02
13   | 2020-01-01
13   | 2020-01-03
14   | 2020-01-01
14   | 2020-01-02
14   | 2020-01-06
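To make the question reproducible, the sample frame above can be rebuilt roughly like this. This is a sketch that assumes the dates arrive as strings and are parsed with pd.to_datetime; if your column is already datetime64, the parse step is a no-op.

```python
import pandas as pd

# Rebuild the sample frame from the question; dates start out as plain strings
df = pd.DataFrame({
    "id_a": [12, 12, 13, 13, 14, 14, 14],
    "date": ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03",
             "2020-01-01", "2020-01-02", "2020-01-06"],
})

# Parse the strings into datetime64 so that subtraction yields Timedeltas
df["date"] = pd.to_datetime(df["date"])
print(df.dtypes)
```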

For each id_a group, I want the difference between the maximum and the minimum date, giving a result like this:

id_a | date       | diff

12   | 2020-01-01 | 1
12   | 2020-01-02 | 1
13   | 2020-01-01 | 2
13   | 2020-01-03 | 2
14   | 2020-01-01 | 5
14   | 2020-01-02 | 5
14   | 2020-01-06 | 5

I'm trying to do it like this:

df['diff'] = df.groupby('id_a').apply(lambda x: max(x['date']) - min(x['date']))

But I'm struggling a bit.

Am I on the right track?

Qua*_*ang 5

You want transform rather than apply. Also, np.ptp ("peak to peak", i.e. max minus min) can do this in a single call:

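 import numpy as np
 import pandas as pd

 # convert to datetime, ignore if already is
 df['date'] = pd.to_datetime(df['date'])

 # transform broadcasts each group's max - min back to every row of that group
 df['date_diff'] = df.groupby('id_a')['date'].transform(np.ptp)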
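The difference between apply and transform is easiest to see from the shape of the result. A minimal sketch on the question's data: aggregating gives one value per id_a (indexed 12, 13, 14), which does not line up with the frame's row index 0 to 6 when assigned back, while transform returns a result aligned to every row. The lambda and the max/min pair shown here are swapped-in equivalents of np.ptp, not the answer's code; they avoid depending on how NumPy's ptp handles pandas objects.

```python
import pandas as pd

df = pd.DataFrame({
    "id_a": [12, 12, 13, 13, 14, 14, 14],
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03",
                            "2020-01-01", "2020-01-02", "2020-01-06"]),
})

# One value per group: indexed by id_a, not by row, so it can't be assigned directly
per_group = df.groupby("id_a")["date"].agg(lambda s: s.max() - s.min())

# transform broadcasts each group's scalar result back to all rows of that group
df["date_diff"] = df.groupby("id_a")["date"].transform(lambda s: s.max() - s.min())

# Same result without a lambda: per-row group max minus per-row group min
g = df.groupby("id_a")["date"]
df["date_diff2"] = g.transform("max") - g.transform("min")

print(per_group)
print(df)
```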

Output:

   id_a       date date_diff
0    12 2020-01-01    1 days
1    12 2020-01-02    1 days
2    13 2020-01-01    2 days
3    13 2020-01-03    2 days
4    14 2020-01-01    5 days
5    14 2020-01-02    5 days
6    14 2020-01-06    5 days
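The output above holds Timedeltas ("1 days"), while the question asked for plain integers (1, 2, 5). A small sketch, assuming whole-day differences are what's wanted: the .dt.days accessor extracts the day component of each Timedelta as an integer.

```python
import pandas as pd

df = pd.DataFrame({
    "id_a": [12, 12, 13, 13, 14, 14, 14],
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03",
                            "2020-01-01", "2020-01-02", "2020-01-06"]),
})

# Per-row Timedelta between the group's latest and earliest date
df["date_diff"] = df.groupby("id_a")["date"].transform(lambda s: s.max() - s.min())

# .dt.days turns each Timedelta into its whole-day count (an integer column)
df["diff"] = df["date_diff"].dt.days
print(df[["id_a", "date", "diff"]])
```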

Update: if you want the max from date_a and the min from date_b:

groups = df.groupby('id_a')
# earliest date_b and latest date_a per id_a, broadcast to every row
min_dates = groups['date_b'].transform('min')
max_dates = groups['date_a'].transform('max')

df['date_diff'] = max_dates - min_dates
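A self-contained run of the update, for checking. The two-column frame below is hypothetical (the question never shows date_a and date_b values); it only illustrates that each group's latest date_a minus its earliest date_b is broadcast to every row.

```python
import pandas as pd

# Hypothetical data: two datetime columns per row
df = pd.DataFrame({
    "id_a": [12, 12, 13, 13],
    "date_a": pd.to_datetime(["2020-01-03", "2020-01-05", "2020-01-02", "2020-01-04"]),
    "date_b": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03"]),
})

groups = df.groupby("id_a")
# Latest date_a and earliest date_b in each group, aligned to every row
max_dates = groups["date_a"].transform("max")
min_dates = groups["date_b"].transform("min")

df["date_diff"] = max_dates - min_dates
print(df)
```

Here group 12 gets 2020-01-05 minus 2020-01-01 and group 13 gets 2020-01-04 minus 2020-01-01.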