Take time points and make labels against a datetime object to correlate things around those points

Mic*_*orn 7 python pandas

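I'm trying to take the usual times at which I take my medication (so plus 4 hours on top of that) and fill in a data frame with a label of 2, 1, or 0 for when I am on this medication, with 2 marking the hour after the medication, when I have just come off it.

Here is an example of the data frame I'm trying to add this column to: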

<bound method NDFrame.to_clipboard of                           id  sentiment  magnitude  angry  disgusted  fearful  \
created                                                                         
2020-05-21 12:00:00     23.0  -0.033333        0.5    NaN        NaN      NaN   
2020-05-21 12:15:00      NaN        NaN        NaN    NaN        NaN      NaN   
2020-05-21 12:30:00      NaN        NaN        NaN    NaN        NaN      NaN   
2020-05-21 12:45:00      NaN        NaN        NaN    NaN        NaN      NaN   
2020-05-21 13:00:00      NaN        NaN        NaN    NaN        NaN      NaN   
...                      ...        ...        ...    ...        ...      ...   
2021-04-20 00:45:00      NaN        NaN        NaN    NaN        NaN      NaN   
2021-04-20 01:00:00      NaN        NaN        NaN    NaN        NaN      NaN   
2021-04-20 01:15:00      NaN        NaN        NaN    NaN        NaN      NaN   
2021-04-20 01:30:00      NaN        NaN        NaN    NaN        NaN      NaN   
2021-04-20 01:45:00  46022.0  -1.000000        1.0    NaN        NaN      NaN   

                     happy  neutral  sad  surprised  
created                                              
2020-05-21 12:00:00    NaN      NaN  NaN        NaN  
2020-05-21 12:15:00    NaN      NaN  NaN        NaN  
2020-05-21 12:30:00    NaN      NaN  NaN        NaN  
2020-05-21 12:45:00    NaN      NaN  NaN        NaN  
2020-05-21 13:00:00    NaN      NaN  NaN        NaN  
...                    ...      ...  ...        ...  
2021-04-20 00:45:00    NaN      NaN  NaN        NaN  
2021-04-20 01:00:00    NaN      NaN  NaN        NaN  
2021-04-20 01:15:00    NaN      NaN  NaN        NaN  
2021-04-20 01:30:00    NaN      NaN  NaN        NaN  
2021-04-20 01:45:00    NaN      NaN  NaN        NaN  

[32024 rows x 10 columns]>

And here are the timestamps for when I usually take my medication:

['09:00 AM', '12:00 PM', '03:00 PM']

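How would I use these timestamps to get this kind of column?

Update

So, building on this question, how would I make sure that it only adds medication labels where data is available, and that the one-hour post-medication period is applied correctly?

Thanks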

tdy*_*tdy 3

Use np.select() to choose the appropriate label for a given condition.
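As a minimal sketch of how np.select() picks labels, assuming a made-up array of hour-of-day values (the hours and the active/after hour lists here are illustrative, not taken from the question's data):

```python
import numpy as np

# Hypothetical hour-of-day values, for illustration only
hours = np.array([9, 10, 11, 13, 20])

# Conditions are evaluated in order; the first True condition wins
conditions = [
    np.isin(hours, [9, 10]),  # assumed "active" hours -> label 1
    np.isin(hours, [11]),     # assumed "after" hour   -> label 2
]
labels = [1, 2]

# Anything matching no condition falls back to the default label 0
medication = np.select(conditions, labels, default=0)
print(medication)  # [1 1 2 0 0]
```

Each position gets the label of the first condition that is True there, which is why the default only applies when nothing matches.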

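First, dropna() the rows in which all values after created are null (subset=df.columns[1:], how='all'). You can change subset depending on your needs (e.g., subset=['id'] if rows should be dropped just for having a null id).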
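To make the effect of subset concrete, here is a small sketch on a made-up three-row frame (the values are hypothetical; only the column names follow the question). Note that dropna() defaults to how='any', so how='all' is needed to drop only the rows that are entirely empty:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame, with created as a column
df = pd.DataFrame({
    "created": pd.to_datetime(
        ["2020-05-21 12:00", "2020-05-21 12:15", "2020-05-21 12:30"]
    ),
    "id": [23.0, np.nan, np.nan],
    "sentiment": [-0.033333, np.nan, 0.5],
})

# Drop a row only when every column after created is null
kept_all = df.dropna(subset=df.columns[1:], how="all")

# Drop a row whenever id alone is null
kept_id = df.dropna(subset=["id"])

print(kept_all.index.tolist())  # [0, 2]
print(kept_id.index.tolist())   # [0]
```

Row 2 survives the first call because it still has a sentiment value, but not the second, because its id is missing.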

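Then generate arrays of datetime times for the active and after periods, based on the duration of the medication. Check whether the created time matches any of the times in active (label 1) or after (label 2); otherwise default to 0.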

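import numpy as np
import pandas as pd

# drop rows that are empty except for column 0 (i.e., except for df.created)
# how='all' drops a row only when every column in subset is null
df.dropna(subset=df.columns[1:], how='all', inplace=True)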

# convert times to datetime
df.created = pd.to_datetime(df.created)
taken = pd.to_datetime(['09:00:00', '12:00:00', '15:00:00'])

# generate time arrays
duration = 2 # hours
active = np.array([(taken + pd.Timedelta(f'{h}h')).time for h in range(duration)]).ravel()
after = (taken + pd.Timedelta(f'{duration}h')).time

# define boolean masks by label
conditions = {
    1: df.created.dt.floor('h').dt.time.isin(active),
    2: df.created.dt.floor('h').dt.time.isin(after),
}

# create medication column with np.select()
df['medication'] = np.select(list(conditions.values()), list(conditions.keys()), default=0)

Here is the output with some slightly modified data that better demonstrates the active/after/NaN scenarios:

               created       id  sentiment  magnitude  medication
0  2020-05-21 12:00:00     23.0  -0.033333        0.5           1
3  2020-05-21 12:45:00     39.0  -0.500000        0.5           1
4  2020-05-21 13:00:00     90.0  -0.500000        0.5           1
5  2020-05-21 13:15:00    100.0  -0.033333        0.1           1
9  2020-05-21 14:15:00   1000.0   0.033333        0.5           2
10 2020-05-21 14:30:00      3.0   0.001000        1.0           2
17 2021-04-20 01:00:00  46022.0  -1.000000        1.0           0
20 2021-04-20 01:45:00  46022.0  -1.000000        1.0           0
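For reference, the whole approach can be sketched end to end as a self-contained script. The helper name label_medication and the five sample rows are assumptions made for illustration; the times are parsed from the 12-hour strings given in the question, and created is assumed to be a regular column:

```python
import numpy as np
import pandas as pd

def label_medication(df, taken_times, duration=2):
    """Hypothetical helper: 1 while active, 2 in the hour after, 0 otherwise."""
    # Parse 12-hour clock strings such as '03:00 PM'
    taken = pd.to_datetime(taken_times, format="%I:%M %p")
    # Hours during which the medication is assumed to be active
    active = np.array(
        [(taken + pd.Timedelta(hours=h)).time for h in range(duration)]
    ).ravel()
    # The single hour right after the active period ends
    after = (taken + pd.Timedelta(hours=duration)).time
    # Floor each timestamp to its hour and compare on time of day only
    hour = df["created"].dt.floor("h").dt.time
    conditions = [hour.isin(active).to_numpy(), hour.isin(after).to_numpy()]
    return np.select(conditions, [1, 2], default=0)

# Made-up sample rows, including one that is empty apart from created
df = pd.DataFrame({
    "created": pd.to_datetime([
        "2020-05-21 12:00", "2020-05-21 12:15", "2020-05-21 13:15",
        "2020-05-21 14:15", "2021-04-20 01:45",
    ]),
    "id": [23.0, np.nan, 100.0, 1000.0, 46022.0],
    "sentiment": [-0.033333, np.nan, -0.033333, 0.033333, -1.0],
})

# Keep only rows that carry data, then label them
df = df.dropna(subset=df.columns[1:], how="all")
df["medication"] = label_medication(df, ["09:00 AM", "12:00 PM", "03:00 PM"])
print(df["medication"].tolist())  # [1, 1, 2, 0]
```

If created is the index, as in the question's printout, calling df.reset_index() first turns it into a column so the same code applies.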