How do I find, for each row, the distance to the nearest row that satisfies a condition?

use*_*946 5 python dataframe pandas

import datetime
import pandas as pd
df = pd.DataFrame({'date': {0: datetime.date(2020, 8, 15),
  1: datetime.date(2020, 8, 16),
  2: datetime.date(2020, 8, 16),
  3: datetime.date(2020, 8, 17),
  4: datetime.date(2020, 8, 17),
  5: datetime.date(2020, 8, 18),
  6: datetime.date(2020, 8, 19),
  7: datetime.date(2020, 8, 19)},
 'sign_change': {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 0, 6: 1, 7: 1},
 'distance (desired_output)': {0: 2, 1: 1, 2: 1, 3: 0, 4: 0, 5: 1, 6: 0, 7: 0}})


         date  sign_change  distance (desired_output)
0  2020-08-15            0                          2
1  2020-08-16            0                          1
2  2020-08-16            0                          1
3  2020-08-17            1                          0
4  2020-08-17            1                          0
5  2020-08-18            0                          1
6  2020-08-19            1                          0
7  2020-08-19            1                          0

For each row, I want to find the distance (in days) to the nearest row where sign_change == 1. I have entered the desired output by hand in the dataframe above.

Qua*_*ang 4

Let's try broadcasting:

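import numpy as np

dates = pd.to_datetime(df['date'])   # datetime.date objects -> datetime64
s = df['sign_change'] != 1           # rows that still need a distance
# |date_i - date_j|: one row per flagged row j, one column per unflagged row i
diff = np.abs(dates[s].values[None, :] - dates[~s].values[:, None])
offset = diff.min(axis=0) / pd.to_timedelta('1D')   # nearest flagged date, in days

df['distance'] = 0.0                 # flagged rows are at distance 0
df.loc[s, 'distance'] = offset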

Output:

         date  sign_change  distance (desired_output)  distance
0  2020-08-15            0                          2       2.0
1  2020-08-16            0                          1       1.0
2  2020-08-16            0                          1       1.0
3  2020-08-17            1                          0       0.0
4  2020-08-17            1                          0       0.0
5  2020-08-18            0                          1       1.0
6  2020-08-19            1                          0       0.0
7  2020-08-19            1                          0       0.0
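
Note that broadcasting builds a full (flagged rows) x (unflagged rows) matrix of differences, which can get large on long frames. As a rough sketch of an alternative (not part of the answer above), `pd.merge_asof` with `direction='nearest'` can look up the closest flagged date per row instead. The helper names `left`, `flagged`, `matched`, `row` and `nearest` are made up for illustration, and the sketch assumes the frame has a unique index:

```python
import datetime
import pandas as pd

# Same data as in the question, without the hand-written output column.
df = pd.DataFrame({
    'date': [datetime.date(2020, 8, d) for d in (15, 16, 16, 17, 17, 18, 19, 19)],
    'sign_change': [0, 0, 0, 1, 1, 0, 1, 1],
})

# Keep the original row label so the result can be aligned back afterwards.
left = pd.DataFrame({'row': df.index,
                     'date': pd.to_datetime(df['date']).to_numpy()})

# Distinct dates of the flagged rows; 'nearest' is a copy of the key that
# survives the merge, because merge_asof keeps only the left frame's 'date'.
flagged = (left.loc[df['sign_change'].eq(1).to_numpy(), ['date']]
               .drop_duplicates()
               .sort_values('date')
               .assign(nearest=lambda d: d['date']))

# Both inputs must be sorted on the key; 'nearest' matches in either direction.
matched = pd.merge_asof(left.sort_values('date'), flagged,
                        on='date', direction='nearest')
matched = matched.set_index('row').sort_index()

# Absolute gap in whole days, aligned back to df through the row labels.
df['distance'] = (matched['date'] - matched['nearest']).abs().dt.days
print(df)
```

On the sample data this should give the same distances as the desired output, as integers rather than floats. If no row has sign_change == 1, `nearest` would be NaT everywhere and the distances would come out missing.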