I have a dataframe like this:
id_a | date
12 | 2020-01-01
12 | 2020-01-02
13 | 2020-01-01
13 | 2020-01-03
14 | 2020-01-01
14 | 2020-01-02
14 | 2020-01-06
For each id_a group, I want the difference between the maximum and minimum date, to get a result like this:
id_a | date | diff
12 | 2020-01-01 | 1
12 | 2020-01-02 | 1
13 | 2020-01-01 | 2
13 | 2020-01-03 | 2
14 | 2020-01-01 | 5
14 | 2020-01-02 | 5
14 | 2020-01-06 | 5
This is what I'm trying:
df['diff'] = df.groupby('id_a').apply(lambda x: max(x['date']) - min(x['date']))
But I'm struggling a bit.
Am I on the right track?
You want transform instead of apply. np.ptp ("peak to peak", i.e. maximum minus minimum) can also do this:
import numpy as np
import pandas as pd

# Convert to datetime; skip this if the column already has a datetime dtype
df['date'] = pd.to_datetime(df['date'])
df['date_diff'] = df.groupby('id_a')['date'].transform(np.ptp)
Output:
id_a date date_diff
0 12 2020-01-01 1 days
1 12 2020-01-02 1 days
2 13 2020-01-01 2 days
3 13 2020-01-03 2 days
4 14 2020-01-01 5 days
5 14 2020-01-02 5 days
6 14 2020-01-06 5 days
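To see why transform fits here and apply does not, the following is a minimal sketch that rebuilds the sample data from the question. It assumes the string-based "max" and "min" aggregations in place of np.ptp, and the names grouped, per_group and span are only illustrative. It also uses .dt.days to turn the timedelta into the plain integers shown in the question's desired result.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id_a": [12, 12, 13, 13, 14, 14, 14],
    "date": ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03",
             "2020-01-01", "2020-01-02", "2020-01-06"],
})
df["date"] = pd.to_datetime(df["date"])

grouped = df.groupby("id_a")["date"]

# An aggregation yields one value per group, indexed by id_a (12, 13, 14).
# That index does not match df's row index (0..6), which is why assigning
# the result of apply to a new column does not give the intended values.
per_group = grouped.max() - grouped.min()

# transform yields one value per original row, aligned with df's index,
# so the result can be assigned directly as a column.
span = grouped.transform("max") - grouped.transform("min")

# .dt.days converts the timedelta column to whole days as integers
df["diff"] = span.dt.days
```

Here per_group has three entries while span has seven, one for each row of df.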
Update: if you want the max from date_a and the min from date_b:
groups = df.groupby('id_a')
min_dates = groups['date_b'].transform('min')
max_dates = groups['date_a'].transform('max')
df['date_diff'] = max_dates - min_dates
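As a quick check of the two-column variant, here is a small self-contained sketch. The sample values are hypothetical and invented only for illustration; just the column names date_a and date_b come from the update above.

```python
import pandas as pd

# Hypothetical data: two ids, each with a date_a and a date_b column
df = pd.DataFrame({
    "id_a": [12, 12, 13, 13],
    "date_a": pd.to_datetime(["2020-01-05", "2020-01-07", "2020-02-01", "2020-02-03"]),
    "date_b": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-30", "2020-01-31"]),
})

groups = df.groupby("id_a")

# Both transforms return row-aligned Series, so they subtract element-wise
max_dates = groups["date_a"].transform("max")
min_dates = groups["date_b"].transform("min")
df["date_diff"] = max_dates - min_dates
```

With this data, every row of id 12 gets 6 days (2020-01-07 minus 2020-01-01) and every row of id 13 gets 4 days (2020-02-03 minus 2020-01-30).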