I have data fetched from an API that looks like this (in JSON form, of course):
0,1500843600,8872
1,1500807600,18890
2,1500811200,2902
.
.
.
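The second column is the date/time in ticks and the third column is some value. I basically have hourly data for every day over several months. What I want to achieve is to get the minimum of the third column for each week. I have a code segment like this: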
from bs4 import BeautifulSoup
import datetime
import json
import pandas
# Partially removed for brevity.
# dic holds now the data that I get from the API.
dic = json.loads(soup.prettify())
df = pandas.DataFrame(columns=['Timestamp', 'Value'])
for i in range(len(dic)):
    df.loc[i] = [datetime.datetime.fromtimestamp(int(dic[i][1])).strftime('%d-%m-%Y %H:%M:%S'), dic[i][2]]
df.sort_values(by=['Timestamp'])
df['Timestamp'] = pandas.to_datetime(df['Timestamp'])
df.set_index(df['Timestamp'], inplace=True)
print(df['Value'].resample('W').min())
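However, this does not give me entirely correct results, and some of the results are NaN. In addition, I would like to get the timestamp along with the minimum value, so that I know at which date/time in the week the minimum occurred. Any ideas on how to achieve this?

You can use the pandas Grouper and groupby functions: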
>>> import pandas as pd
>>> data = [[0,1500843600,8872],[1,1500807600,18890],[2,1500811200,2902]]
>>> pd.DataFrame(data=data, columns=['id', 'Timestamp', 'Value'])
   id   Timestamp  Value
0   0  1500843600   8872
1   1  1500807600  18890
2   2  1500811200   2902
>>> df = pd.DataFrame(data=data, columns=['id', 'Timestamp', 'Value'])
>>> pd.to_datetime(df.Timestamp)
0 1970-01-01 00:00:01.500843600
1 1970-01-01 00:00:01.500807600
2 1970-01-01 00:00:01.500811200
Name: Timestamp, dtype: datetime64[ns]
>>> df.Timestamp = pd.to_datetime(df.Timestamp)
>>> df
   id                     Timestamp  Value
0   0 1970-01-01 00:00:01.500843600   8872
1   1 1970-01-01 00:00:01.500807600  18890
2   2 1970-01-01 00:00:01.500811200   2902
>>> df.groupby([pd.Grouper(key='Timestamp', freq='W-MON')])['Value'].min()
Timestamp
1970-01-05 2902
Name: Value, dtype: int64
You may also want to look at anchored offsets, since you can choose a W frequency anchored on a different day of the week.
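As a rough illustration of anchored offsets, the sketch below groups the three sample rows from the question by two different weekly anchors. It assumes the timestamps are Unix seconds (hence unit='s'); the names weekly_sun and weekly_mon are made up for this example.

```python
import pandas as pd

# Sample rows from the question: [id, Unix timestamp in seconds, value].
data = [[0, 1500843600, 8872], [1, 1500807600, 18890], [2, 1500811200, 2902]]
df = pd.DataFrame(data, columns=["id", "Timestamp", "Value"])

# Assumption: the second column holds seconds since the Unix epoch.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="s")

# "W-SUN" is what plain "W" means: each weekly bin is labelled by the Sunday that ends it.
weekly_sun = df.groupby(pd.Grouper(key="Timestamp", freq="W-SUN"))["Value"].min()

# "W-MON" shifts the anchor: each weekly bin is labelled by the Monday that ends it.
weekly_mon = df.groupby(pd.Grouper(key="Timestamp", freq="W-MON"))["Value"].min()

print(weekly_sun)
print(weekly_mon)
```

Both results hold the same minimum for this tiny sample; only the day on which each weekly bin ends, and therefore its label, changes.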
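-- Edit -- As MaxU suggested below, if you want the timestamps to be interpreted as seconds, use df.Timestamp = pd.to_datetime(df.Timestamp, unit='s'). Without unit, pandas treats the integers as nanoseconds, which is why the output above shows dates in 1970.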
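The question also asks when each weekly minimum occurred. One possible way to get that, which is a variation and not part of the answer above, is to group by a weekly period and use idxmin to look up the full row of each minimum. This is a minimal sketch that assumes the timestamps are Unix seconds; the names week, idx and result are illustrative only.

```python
import pandas as pd

# Sample rows from the question: [id, Unix timestamp in seconds, value].
data = [[0, 1500843600, 8872], [1, 1500807600, 18890], [2, 1500811200, 2902]]
df = pd.DataFrame(data, columns=["id", "Timestamp", "Value"])

# Assumption: the second column holds seconds since the Unix epoch.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="s")

# Map every timestamp to the weekly period (Monday to Sunday) that contains it.
# Grouping by this key only produces weeks that actually have data.
week = df["Timestamp"].dt.to_period("W")

# idxmin gives the row label of the smallest Value within each week.
idx = df.groupby(week)["Value"].idxmin()

# Select those rows to see both the minimum and the time it occurred.
result = df.loc[idx, ["Timestamp", "Value"]]
print(result)
```

Because the grouping key is derived from the rows themselves, weeks with no observations do not appear in the result, so no NaN entries need to be dropped afterwards.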