Find the average time of a pandas column

K22*_*K22 5 python datetime pandas

I have a pandas DataFrame in the following format:

title   |   decision   |   Time submitted
Book1   |      1       |   1486507594
Book1   |      2       |   1485450353

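What I want to do is find the average submission time for books with decision = 1, and then the average submission time for books with decision = 2. I tried using: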

df_avg.loc[df_avg['decision'] == 2, 'submitted'].sum()
df_avg.loc[df_avg['decision'] == 1, 'submitted'].sum()

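But it sometimes doesn't work. I even tried doing the above both before and after converting the times to dates and times with datetime. Any ideas on how to do this would be greatly appreciated.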
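For comparison, the .loc attempt above can work once two things change: the column label has to match the data (the attempt uses 'submitted', while the table shows 'Time submitted'), and .mean() has to replace .sum(), because a sum of timestamps is not a time. A minimal sketch, using the four sample rows printed in the answer below and assuming the values are Unix seconds:

```python
import pandas as pd

# Sample rows from the answer; "Time submitted" holds Unix timestamps in seconds
df_avg = pd.DataFrame({
    "title": ["Book1", "Book1", "Book1", "Book1"],
    "decision": [1, 1, 2, 2],
    "Time submitted": [1486507594, 1486500012, 1485480353, 1485450353],
})

# Boolean mask per decision, then .mean() (not .sum()) on the seconds column
avg1 = df_avg.loc[df_avg["decision"] == 1, "Time submitted"].mean()
avg2 = df_avg.loc[df_avg["decision"] == 2, "Time submitted"].mean()

# Convert the averaged seconds back to timestamps
print(pd.to_datetime(avg1, unit="s"))  # 2017-02-07 21:43:23
print(pd.to_datetime(avg2, unit="s"))  # 2017-01-26 21:15:53
```

This averages the raw integers first and converts only the final result, so no intermediate datetime arithmetic is involved.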

jez*_*ael 4

I think you can first convert the times to datetimes and then to Unix time in nanoseconds (ns), and then use groupby and aggregate with mean:

print (df_avg)
   title  decision  Time submitted
0  Book1         1      1486507594
1  Book1         1      1486500012
2  Book1         2      1485480353
3  Book1         2      1485450353

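import numpy as np
import pandas as pd

# Unix seconds -> datetime64[ns] -> int64 nanoseconds (parentheses allow the line break)
df_avg['Time submitted'] = (pd.to_datetime(df_avg['Time submitted'], unit='s')
                              .values.astype(np.int64))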

df = df_avg.groupby('decision', as_index=False)['Time submitted'].mean()
df['Time submitted'] = pd.to_datetime(df['Time submitted'], unit='ns')
print (df)
   decision      Time submitted
0         1 2017-02-07 21:43:23
1         2 2017-01-26 21:15:53
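As a cross-check of these two rows, plain Python arithmetic on the sample timestamps gives the same means. This sketch uses only the standard library and interprets the values as UTC, which is also how pd.to_datetime(..., unit='s') treats them; the helper name mean_utc is hypothetical.

```python
from datetime import datetime, timezone

# Sample Unix timestamps (seconds) from the printed df_avg, grouped by decision
samples = {
    1: [1486507594, 1486500012],
    2: [1485480353, 1485450353],
}

def mean_utc(timestamps):
    """Hypothetical helper: average Unix seconds and return a UTC datetime."""
    avg = sum(timestamps) / len(timestamps)
    return datetime.fromtimestamp(avg, tz=timezone.utc)

for decision, ts in samples.items():
    print(decision, mean_utc(ts).strftime("%Y-%m-%d %H:%M:%S"))
# 1 2017-02-07 21:43:23
# 2 2017-01-26 21:15:53
```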

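But since your data is already Unix time in seconds, you can also multiply the original (unconverted) Time submitted column by 10**9 to get nanoseconds directly: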

df = (df_avg['Time submitted'] * 10**9).groupby(df_avg['decision']).mean().reset_index()
df['Time submitted'] = pd.to_datetime(df['Time submitted'], unit='ns')
print (df)
   decision      Time submitted
0         1 2017-02-07 21:43:23
1         2 2017-01-26 21:15:53
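A further variant, sketched under the same assumption that the column holds Unix seconds: average the seconds per decision and convert once with unit='s', which skips the nanosecond scaling entirely.

```python
import pandas as pd

# Sample rows from above; "Time submitted" is Unix time in seconds
df_avg = pd.DataFrame({
    "title": ["Book1"] * 4,
    "decision": [1, 1, 2, 2],
    "Time submitted": [1486507594, 1486500012, 1485480353, 1485450353],
})

# Average the raw seconds per decision; the result is a float column
df = df_avg.groupby("decision", as_index=False)["Time submitted"].mean()

# Convert the averaged seconds to datetimes in one step
df["Time submitted"] = pd.to_datetime(df["Time submitted"], unit="s")
print(df)
```

Because the averaged values here are whole seconds, the result matches the nanosecond versions; with fractional averages, unit='s' keeps the sub-second part as well.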