How to join many scattered time series into one regular Pandas DataFrame in Python

eli*_*A92 6 python time-series dataframe pandas

I have to work with time series data imported from several CSV files, like this:

import pandas as pd

csv_a = [["Sensor_1", '2019-05-25 10:00', 25, 60],
         ["Sensor_2", '2019-05-25 10:00', 30, 45],
         ["Sensor_1", '2019-05-25 10:05', 26, None],
         ["Sensor_2", '2019-05-25 10:05', 30, 46],
         ["Sensor_1", '2019-05-25 10:10', 27, 63],
         ["Sensor_1", '2019-05-25 10:20', 28, 62]]

df_a = pd.DataFrame(csv_a, columns=["Sensor", "Timestamp", "Temperature", "Humidity"])
df_a["Timestamp"] = (pd.to_datetime(df_a["Timestamp"]))

csv_b = [["Sensor_1", '2019-05-25 10:05', 1020],
         ["Sensor_2", '2019-05-25 10:05', 956],
         ["Sensor_3", '2019-05-25 10:05', 990],
         ["Sensor_1", '2019-05-25 10:10', 1021],
         ["Sensor_2", '2019-05-25 10:10', 957],
         ["Sensor_3", '2019-05-25 10:10', 992],
         ["Sensor_1", '2019-05-25 10:15', 1019]]

df_b = pd.DataFrame(csv_b, columns=["Sensor", "Timestamp", "Pressure"])
df_b["Timestamp"] = (pd.to_datetime(df_b["Timestamp"]))

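As you can see, there are 3 sensors. Each sensor has its own time series measuring temperature, humidity and pressure. However, the data is split across two CSV fragments and may contain many gaps.

The goal is to combine all the data into a single ordered, regular DataFrame, like this: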

              Timestamp    Sensor  Temperature  Humidity  Pressure
0   2019-05-25 10:00:00  Sensor_1         25.0      60.0       NaN
1   2019-05-25 10:00:00  Sensor_2         30.0      45.0       NaN
2   2019-05-25 10:00:00  Sensor_3          NaN       NaN       NaN
3   2019-05-25 10:05:00  Sensor_1         26.0       NaN    1020.0
4   2019-05-25 10:05:00  Sensor_2         30.0      46.0     956.0
5   2019-05-25 10:05:00  Sensor_3          NaN       NaN     990.0
6   2019-05-25 10:10:00  Sensor_1         27.0      63.0    1021.0
7   2019-05-25 10:10:00  Sensor_2          NaN       NaN     957.0
8   2019-05-25 10:10:00  Sensor_3          NaN       NaN     992.0
9   2019-05-25 10:15:00  Sensor_1          NaN       NaN    1019.0
10  2019-05-25 10:15:00  Sensor_2          NaN       NaN       NaN
11  2019-05-25 10:15:00  Sensor_3          NaN       NaN       NaN
12  2019-05-25 10:20:00  Sensor_1         28.0      62.0       NaN
13  2019-05-25 10:20:00  Sensor_2          NaN       NaN       NaN
14  2019-05-25 10:20:00  Sensor_3          NaN       NaN       NaN

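The logic is that, taken together, the CSV data starts at 10:00 and ends at 10:20, and there are 3 possible variables for 3 different sensors. I therefore want the first two columns (Timestamp and Sensor) to be regular, ordered and free of gaps. The remaining columns (Temperature, Humidity and Pressure) should be filled from the CSV data wherever a value is available.

I have tried to do this with the pandas merge function in many different ways, but I could not get the result I want. I hope someone with more experience can help.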

jez*_*ael 4

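First, join the two DataFrames with concat, after calling DataFrame.set_index on each so that Timestamp and Sensor form a MultiIndex. Since the same (Timestamp, Sensor) pair can appear in both files, aggregate with sum over the two index levels so that each pair ends up as a single row; min_count=1 keeps NaN where no value was measured.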
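To illustrate why min_count=1 matters in this step, here is a minimal sketch on a made-up frame. The name toy and its values are hypothetical and not part of the question's data; the sketch groups on the index levels with groupby(level=...).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the same (Timestamp, Sensor) key appears twice,
# once carrying only Temperature and once carrying only Pressure.
key = (pd.Timestamp("2019-05-25 10:05"), "Sensor_1")
idx = pd.MultiIndex.from_tuples([key, key], names=["Timestamp", "Sensor"])
toy = pd.DataFrame(
    {
        "Temperature": [26.0, np.nan],
        "Humidity": [np.nan, np.nan],
        "Pressure": [np.nan, 1020.0],
    },
    index=idx,
)

# With min_count=1, a group that has no valid values stays NaN.
collapsed = toy.groupby(level=[0, 1]).sum(min_count=1)

# With the default min_count=0, the all-NaN Humidity group sums to 0.0,
# which would look like a real measurement.
default_sum = toy.groupby(level=[0, 1]).sum()

print(collapsed)
print(default_sum)
```

In collapsed, the two partial rows become one row and Humidity stays missing; in default_sum, Humidity turns into 0.0.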

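Then use DataFrame.reindex with a MultiIndex built by MultiIndex.from_product, from a date_range between the minimal and maximal timestamps and the unique sensors, to add the missing rows: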
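As a rough sketch of what the reindex step does in isolation, the snippet below builds the full Timestamp x Sensor grid and reindexes a one-row frame onto it. The names times, sensors, grid, sparse and regular are illustrative assumptions, and the 5-minute frequency is hard-coded from the question's data.

```python
import pandas as pd

# Every 5-minute timestamp between the first and last reading (assumed range).
times = pd.date_range("2019-05-25 10:00", "2019-05-25 10:20", freq="5min")
sensors = ["Sensor_1", "Sensor_2", "Sensor_3"]

# Cartesian product: one index entry per (timestamp, sensor) combination.
grid = pd.MultiIndex.from_product([times, sensors], names=["Timestamp", "Sensor"])

# A hypothetical sparse frame holding a single measurement.
sparse = pd.DataFrame(
    {"Pressure": [1019.0]},
    index=pd.MultiIndex.from_tuples(
        [(pd.Timestamp("2019-05-25 10:15"), "Sensor_1")],
        names=["Timestamp", "Sensor"],
    ),
)

# reindex keeps existing rows and inserts NaN rows for every missing pair.
regular = sparse.reindex(grid)
print(regular)
```

The grid has 5 timestamps times 3 sensors, so regular has 15 rows, of which only one holds a value.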

df = (pd.concat([df_a.set_index(['Timestamp','Sensor']),
                df_b.set_index(['Timestamp','Sensor'])], sort=True)
        # sum(level=...) was removed in pandas 2.0; group on the index levels instead
        .groupby(level=[0,1]).sum(min_count=1))

d = df.index.get_level_values(0)
mux = pd.MultiIndex.from_product([pd.date_range(d.min(), d.max(), freq='5Min'), 
                                  df.index.get_level_values(1).unique()], names=df.index.names)
df = df.reindex(mux).reset_index()
print (df)

             Timestamp    Sensor  Humidity  Pressure  Temperature
0  2019-05-25 10:00:00  Sensor_1      60.0       NaN         25.0
1  2019-05-25 10:00:00  Sensor_2      45.0       NaN         30.0
2  2019-05-25 10:00:00  Sensor_3       NaN       NaN          NaN
3  2019-05-25 10:05:00  Sensor_1       NaN    1020.0         26.0
4  2019-05-25 10:05:00  Sensor_2      46.0     956.0         30.0
5  2019-05-25 10:05:00  Sensor_3       NaN     990.0          NaN
6  2019-05-25 10:10:00  Sensor_1      63.0    1021.0         27.0
7  2019-05-25 10:10:00  Sensor_2       NaN     957.0          NaN
8  2019-05-25 10:10:00  Sensor_3       NaN     992.0          NaN
9  2019-05-25 10:15:00  Sensor_1       NaN    1019.0          NaN
10 2019-05-25 10:15:00  Sensor_2       NaN       NaN          NaN
11 2019-05-25 10:15:00  Sensor_3       NaN       NaN          NaN
12 2019-05-25 10:20:00  Sensor_1      62.0       NaN         28.0
13 2019-05-25 10:20:00  Sensor_2       NaN       NaN          NaN
14 2019-05-25 10:20:00  Sensor_3       NaN       NaN          NaN
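Since the question mentions merge, here is one possible variant sketched with an outer merge followed by the same reindex idea. It is not the approach from the answer above, and it assumes that each (Timestamp, Sensor) pair occurs at most once per file; with duplicate pairs the reindex would fail, and the sum-based aggregation is the safer route. The names merged, grid and result are illustrative.

```python
import pandas as pd

csv_a = [["Sensor_1", "2019-05-25 10:00", 25, 60],
         ["Sensor_2", "2019-05-25 10:00", 30, 45],
         ["Sensor_1", "2019-05-25 10:05", 26, None],
         ["Sensor_2", "2019-05-25 10:05", 30, 46],
         ["Sensor_1", "2019-05-25 10:10", 27, 63],
         ["Sensor_1", "2019-05-25 10:20", 28, 62]]
csv_b = [["Sensor_1", "2019-05-25 10:05", 1020],
         ["Sensor_2", "2019-05-25 10:05", 956],
         ["Sensor_3", "2019-05-25 10:05", 990],
         ["Sensor_1", "2019-05-25 10:10", 1021],
         ["Sensor_2", "2019-05-25 10:10", 957],
         ["Sensor_3", "2019-05-25 10:10", 992],
         ["Sensor_1", "2019-05-25 10:15", 1019]]

df_a = pd.DataFrame(csv_a, columns=["Sensor", "Timestamp", "Temperature", "Humidity"])
df_b = pd.DataFrame(csv_b, columns=["Sensor", "Timestamp", "Pressure"])
df_a["Timestamp"] = pd.to_datetime(df_a["Timestamp"])
df_b["Timestamp"] = pd.to_datetime(df_b["Timestamp"])

# Outer merge keeps every (Timestamp, Sensor) pair found in either file.
merged = df_a.merge(df_b, on=["Timestamp", "Sensor"], how="outer")

# Full regular grid: every 5 minutes between the first and last reading,
# crossed with every sensor seen in the data.
grid = pd.MultiIndex.from_product(
    [pd.date_range(merged["Timestamp"].min(), merged["Timestamp"].max(), freq="5min"),
     sorted(merged["Sensor"].unique())],
    names=["Timestamp", "Sensor"],
)

# Align the merged data to the grid; pairs without data become NaN rows.
result = merged.set_index(["Timestamp", "Sensor"]).reindex(grid).reset_index()
print(result)
```

With this variant the columns come out as Timestamp, Sensor, Temperature, Humidity, Pressure, which matches the column order asked for in the question.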