And*_*ake 5 python dataframe pandas
Suppose I have two dataframes:
df1:
+-------------------+----+
|     Timestamp     |data|
+-------------------+----+
|2019/04/02 11:00:01| 111|
|2019/04/02 11:00:15| 222|
|2019/04/02 11:00:29| 333|
|2019/04/02 11:00:30| 444|
+-------------------+----+

df2:
+-------------------+-----+
|     Timestamp     |stuff|
+-------------------+-----+
|2019/04/02 11:00:14|  101|
|2019/04/02 11:00:15|  202|
|2019/04/02 11:00:16|  303|
|2019/04/02 11:00:30|  404|
|2019/04/02 11:00:31|  505|
+-------------------+-----+
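Without iterating over every row of df2, I am trying to join the two dataframes on the timestamp. For each row in df2, it should "add" the data from df1 that applies at that particular time. In this example, the resulting dataframe would be: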
Adding df1 data to df2:
+-------------------+-----+----+
|     Timestamp     |stuff|data|
+-------------------+-----+----+
|2019/04/02 11:00:14|  101| 111|
|2019/04/02 11:00:15|  202| 222|
|2019/04/02 11:00:16|  303| 222|
|2019/04/02 11:00:30|  404| 444|
|2019/04/02 11:00:31|  505|None|
+-------------------+-----+----+
Looping over every row of df2 and comparing it against each row of df1 is very inefficient. Is there another way?
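Use merge_asof, which matches each left-hand row to the last right-hand row whose key is less than or equal to it. Both frames must be sorted by the key column. The outputs in this answer match a version of the sample data in which the 11:00:31 / 505 row belongs to df1 rather than df2:
import pandas as pd

# merge_asof needs ordered keys such as datetimes, not strings
df1['Timestamp'] = pd.to_datetime(df1['Timestamp'])
df2['Timestamp'] = pd.to_datetime(df2['Timestamp'])
# for each df2 row, take the last df1 row with Timestamp <= the df2 Timestamp
df = pd.merge_asof(df2, df1, on='Timestamp')
print(df)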
            Timestamp  stuff  data
0 2019-04-02 11:00:14    101   111
1 2019-04-02 11:00:15    202   222
2 2019-04-02 11:00:16    303   222
3 2019-04-02 11:00:30    404   444
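For reference, here is a minimal, self-contained sketch that runs the same call on the frames exactly as posted in the question. With the default direction='backward', the 11:00:31 row picks up 444 from 11:00:30 rather than the None shown in the question's expected table. The final masking step is an assumption about the asker's intent (rows later than the last df1 timestamp stay empty) and is not part of the original answer; the names merged and masked are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Timestamp": ["2019/04/02 11:00:01", "2019/04/02 11:00:15",
                  "2019/04/02 11:00:29", "2019/04/02 11:00:30"],
    "data": [111, 222, 333, 444],
})
df2 = pd.DataFrame({
    "Timestamp": ["2019/04/02 11:00:14", "2019/04/02 11:00:15",
                  "2019/04/02 11:00:16", "2019/04/02 11:00:30",
                  "2019/04/02 11:00:31"],
    "stuff": [101, 202, 303, 404, 505],
})

# Parse the strings into datetimes and sort on the key column,
# since merge_asof expects both frames to be ordered by the key.
df1["Timestamp"] = pd.to_datetime(df1["Timestamp"])
df2["Timestamp"] = pd.to_datetime(df2["Timestamp"])
df1 = df1.sort_values("Timestamp")
df2 = df2.sort_values("Timestamp")

# Backward as-of join: last df1 row at or before each df2 timestamp.
merged = pd.merge_asof(df2, df1, on="Timestamp")

# Assumption: leave "data" empty for df2 rows later than the last df1 timestamp.
masked = merged.copy()
masked["data"] = masked["data"].where(
    masked["Timestamp"] <= df1["Timestamp"].max()
)
print(masked)
```

Note that the masked column becomes floating point, because pandas represents the missing value as NaN.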
You can also swap the order of df1 and df2 and add the parameter direction='forward':
df = pd.merge_asof(df1, df2, on='Timestamp', direction='forward')
print(df)
            Timestamp  data  stuff
0 2019-04-02 11:00:01   111  101.0
1 2019-04-02 11:00:15   222  202.0
2 2019-04-02 11:00:29   333  404.0
3 2019-04-02 11:00:30   444  404.0
4 2019-04-02 11:00:31   505    NaN
With the default direction='backward', each df1 row takes the last df2 row at or before its timestamp:
df = pd.merge_asof(df1, df2, on='Timestamp')
print(df)
            Timestamp  data  stuff
0 2019-04-02 11:00:01   111    NaN
1 2019-04-02 11:00:15   222  202.0
2 2019-04-02 11:00:29   333  303.0
3 2019-04-02 11:00:30   444  404.0
4 2019-04-02 11:00:31   505  404.0
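To see how the direction choices differ on one input, the sketch below applies them to the question's frames with df2 on the left, and also shows the tolerance parameter, which caps how far apart two matched timestamps may be. It is only an illustration: the 5-second tolerance is an arbitrary value picked for the example, and the variable names are made up here.

```python
import pandas as pd

# Question's sample data, already parsed and sorted by Timestamp.
df1 = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2019-04-02 11:00:01", "2019-04-02 11:00:15",
                                 "2019-04-02 11:00:29", "2019-04-02 11:00:30"]),
    "data": [111, 222, 333, 444],
})
df2 = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2019-04-02 11:00:14", "2019-04-02 11:00:15",
                                 "2019-04-02 11:00:16", "2019-04-02 11:00:30",
                                 "2019-04-02 11:00:31"]),
    "stuff": [101, 202, 303, 404, 505],
})

# Last df1 row at or before each df2 timestamp (the default).
backward = pd.merge_asof(df2, df1, on="Timestamp")
# First df1 row at or after each df2 timestamp.
forward = pd.merge_asof(df2, df1, on="Timestamp", direction="forward")
# Closest df1 row in either direction.
nearest = pd.merge_asof(df2, df1, on="Timestamp", direction="nearest")
# Backward match, but only if the gap is at most 5 seconds (arbitrary choice).
limited = pd.merge_asof(df2, df1, on="Timestamp",
                        tolerance=pd.Timedelta("5s"))

# Side-by-side view of the matched "data" value under each setting.
comparison = df2.assign(
    backward=backward["data"],
    forward=forward["data"],
    nearest=nearest["data"],
    within_5s=limited["data"],
)
print(comparison)
```

Rows without a match inside the chosen direction or tolerance come back as NaN, which is why some of the result columns are floating point.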