Joining two different dataframes on timestamp

And*_*ake 5 python dataframe pandas

Suppose I have two dataframes:

df1:                          df2:
+-------------------+----+    +-------------------+-----+
|  Timestamp        |data|    |  Timestamp        |stuff|
+-------------------+----+    +-------------------+-----+
|2019/04/02 11:00:01| 111|    |2019/04/02 11:00:14|  101|
|2019/04/02 11:00:15| 222|    |2019/04/02 11:00:15|  202|
|2019/04/02 11:00:29| 333|    |2019/04/02 11:00:16|  303|
|2019/04/02 11:00:30| 444|    |2019/04/02 11:00:30|  404|
+-------------------+----+    |2019/04/02 11:00:31|  505|
                              +-------------------+-----+

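Without looping through every row of df2, I am trying to join the two dataframes on the timestamp. For every row in df2, the join should "add" the data from df1 that was current at that particular time. In this example, the resulting dataframe would be: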

Adding df1 data to df2:
+-------------------+-----+----+
|  Timestamp        |stuff|data|
+-------------------+-----+----+
|2019/04/02 11:00:14|  101| 111|
|2019/04/02 11:00:15|  202| 222|
|2019/04/02 11:00:16|  303| 222|
|2019/04/02 11:00:30|  404| 444|
|2019/04/02 11:00:31|  505|None|
+-------------------+-----+----+

Looping through each row of df2 and comparing it against every row of df1 is very inefficient. Is there another way?

jez*_*ael 8

Use merge_asof:

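import pandas as pd

# merge_asof needs real datetimes, not strings
df1['Timestamp'] = pd.to_datetime(df1['Timestamp'])
df2['Timestamp'] = pd.to_datetime(df2['Timestamp'])

# for each row of df2, take the last row of df1 whose Timestamp is <= its own
df = pd.merge_asof(df2, df1, on='Timestamp')
print(df)
            Timestamp  stuff  data
0 2019-04-02 11:00:14    101   111
1 2019-04-02 11:00:15    202   222
2 2019-04-02 11:00:16    303   222
3 2019-04-02 11:00:30    404   444
4 2019-04-02 11:00:31    505   444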
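For anyone who wants to run this end to end, here is a minimal sketch that rebuilds both frames from the tables in the question. The literal values are copied from the question; the defensive sort_values calls and the variable name result are additions for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "Timestamp": ["2019/04/02 11:00:01", "2019/04/02 11:00:15",
                  "2019/04/02 11:00:29", "2019/04/02 11:00:30"],
    "data": [111, 222, 333, 444],
})
df2 = pd.DataFrame({
    "Timestamp": ["2019/04/02 11:00:14", "2019/04/02 11:00:15",
                  "2019/04/02 11:00:16", "2019/04/02 11:00:30",
                  "2019/04/02 11:00:31"],
    "stuff": [101, 202, 303, 404, 505],
})

# Parse the strings so the key column is a proper datetime.
df1["Timestamp"] = pd.to_datetime(df1["Timestamp"])
df2["Timestamp"] = pd.to_datetime(df2["Timestamp"])

# merge_asof requires both frames to be sorted by the key.
# The sample data is already sorted; sorting again is just a safeguard.
df1 = df1.sort_values("Timestamp")
df2 = df2.sort_values("Timestamp")

# Left frame is df2, so the result has one row per df2 row.
result = pd.merge_asof(df2, df1, on="Timestamp")
print(result)
```

With the default backward search, the 11:00:31 row picks up 444, the most recent earlier value in df1, rather than the None shown in the question. The tolerance parameter of merge_asof can cap how far back a match is allowed to reach, if stale matches should be left empty.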

You can also swap the order of df1 and df2 and add the parameter direction='forward':

# for each row of df1, take the first row of df2 whose Timestamp is >= its own
df = pd.merge_asof(df1, df2, on='Timestamp', direction='forward')
print(df)
            Timestamp  data  stuff
0 2019-04-02 11:00:01   111    101
1 2019-04-02 11:00:15   222    202
2 2019-04-02 11:00:29   333    404
3 2019-04-02 11:00:30   444    404

# default: direction='backward'
df = pd.merge_asof(df1, df2, on='Timestamp')
print(df)
            Timestamp  data  stuff
0 2019-04-02 11:00:01   111    NaN
1 2019-04-02 11:00:15   222  202.0
2 2019-04-02 11:00:29   333  303.0
3 2019-04-02 11:00:30   444  404.0
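To see the two directions side by side, the sketch below runs both calls on the question's data with df1 as the left frame. Since df1 has four rows, each result has four rows. The names backward, forward and comparison are hypothetical, chosen only for this illustration.

```python
import pandas as pd

# Frames rebuilt from the question, with timestamps parsed up front.
df1 = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2019/04/02 11:00:01", "2019/04/02 11:00:15",
        "2019/04/02 11:00:29", "2019/04/02 11:00:30",
    ]),
    "data": [111, 222, 333, 444],
})
df2 = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2019/04/02 11:00:14", "2019/04/02 11:00:15",
        "2019/04/02 11:00:16", "2019/04/02 11:00:30",
        "2019/04/02 11:00:31",
    ]),
    "stuff": [101, 202, 303, 404, 505],
})

# backward (the default): last df2 row at or before each df1 timestamp.
backward = pd.merge_asof(df1, df2, on="Timestamp")

# forward: first df2 row at or after each df1 timestamp.
forward = pd.merge_asof(df1, df2, on="Timestamp", direction="forward")

# Put both matches next to each other for comparison.
comparison = df1.assign(
    stuff_backward=backward["stuff"],
    stuff_forward=forward["stuff"],
)
print(comparison)
```

The first df1 row (11:00:01) has no earlier row in df2, so the backward match is NaN, while the forward match reaches ahead to 11:00:14.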