RSH*_*HAP 6 python datetime resampling pandas
I have a DataFrame with "Dates" and "Nums" columns.
import numpy as np
import pandas as pd
dates = pd.date_range('1/1/2001', '1/1/2003', freq='D')
nums = [np.random.randint(100) for x in range(len(dates))]
df = pd.DataFrame({'Dates': dates, 'DOW': dates.strftime('%a'), 'Nums': nums})
df = df[(df.DOW != 'Sat') & (df.DOW != 'Sun')]
df = df.drop([7, 18]).reset_index(drop=True)
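I need to partition the DataFrame so that I can isolate each week separately. The end goal is to take each week's MAX 'Nums' value and compare it with the following week's LAST value to see how large the percent change is. For example: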
week1 = df[0:5]
week2 = df[5:9]
week3 = df[9:12]
In [156]: w1max = week1.Nums.max()
Out[156]: 97
In [157]: w2Last = week2.iloc[-1].Nums
Out[157]: 76
pctChange = (w2Last-w1max)/float(w1max)
In [166]: pctChange
Out[166]: -0.21649484536082475
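The problem is that several of the weeks are missing days (e.g. week 2 is missing Monday, and week 3 is missing Friday), so fixed positional slices do not line up with calendar weeks. How can I separate the weeks?
The closest thing seems to be df.resample(), but I can't figure out how to use it for the comparison I'm trying to make.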
import numpy as np
import pandas as pd

# Rebuild the question's data with a fixed seed so the output is reproducible
np.random.seed(2016)
dates = pd.date_range('1/1/2001', '1/1/2003', freq='D')
nums = [np.random.randint(100) for x in range(len(dates))]
df = pd.DataFrame({'Dates': dates, 'DOW': dates.strftime('%a'), 'Nums': nums})
df = df[(df.DOW != 'Sat') & (df.DOW != 'Sun')]
df = df.drop([7, 18]).reset_index(drop=True)

# Weekly max and last value of Nums
df2 = df.groupby(pd.Grouper(freq='W', key='Dates'))['Nums'].agg(['max', 'last'])
# Previous week's max, aligned with the current week's row
df2['previous_max'] = df2['max'].shift(1)
# Change from the previous week's max to this week's last value
df2['change'] = (df2['last'] - df2['previous_max']) / df2['previous_max']
print(df2.head())
yields
            max  last  previous_max    change
Dates
2001-01-07   83    39           NaN       NaN
2001-01-14   75    75          83.0 -0.096386
2001-01-21   97    18          75.0 -0.760000
2001-01-28   72    37          97.0 -0.618557
2001-02-04   84    24          72.0 -0.666667
df.groupby with a pd.Grouper object groups the rows by week. You can then use the agg method to find the max and last values of Nums in each group:
In [163]: df2 = df.groupby(pd.Grouper(freq='W', key='Dates'))['Nums'].agg(['max','last'])
In [164]: df2.head()
Out[164]:
            max  last
Dates
2001-01-07   83    39
2001-01-14   75    75
2001-01-21   97    18
2001-01-28   72    37
2001-02-04   84    24
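Since the question mentions df.resample(), here is a minimal sketch of the same weekly aggregation written that way. It assumes 'Dates' can be moved into the index; the names by_grouper and by_resample are only illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (same seed and dropped rows as above).
np.random.seed(2016)
dates = pd.date_range('1/1/2001', '1/1/2003', freq='D')
nums = [np.random.randint(100) for _ in range(len(dates))]
df = pd.DataFrame({'Dates': dates, 'DOW': dates.strftime('%a'), 'Nums': nums})
df = df[(df.DOW != 'Sat') & (df.DOW != 'Sun')]
df = df.drop([7, 18]).reset_index(drop=True)

# Weekly bins via groupby + Grouper, keyed on the 'Dates' column.
by_grouper = df.groupby(pd.Grouper(freq='W', key='Dates'))['Nums'].agg(['max', 'last'])

# The same weekly bins via resample, which needs a DatetimeIndex.
by_resample = df.set_index('Dates')['Nums'].resample('W').agg(['max', 'last'])

print(by_grouper.head())
print(by_resample.head())
```

Both forms bin by calendar week (freq='W' means weeks ending on Sunday), so a missing Monday or Friday only makes that week's group shorter; it does not shift the week boundaries.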
Then use shift(1) to move the max values down one row, so that each week's row also holds the previous week's max:
In [165]: df2['previous_max'] = df2['max'].shift(1); df2.head()
Out[165]:
            max  last  previous_max
Dates
2001-01-07   83    39           NaN
2001-01-14   75    75          83.0
2001-01-21   97    18          75.0
2001-01-28   72    37          97.0
2001-02-04   84    24          72.0
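One caveat about shift(1) that the data above does not exercise: if an entire week has no rows, the weekly grouping still emits that week with NaN values, and shift(1) then pairs the following week with the empty one. The sketch below uses a small made-up frame (gappy and the other names are hypothetical) and shows one possible way to handle it, assuming you want to compare against the previous week that actually has data.

```python
import pandas as pd

# Made-up data: rows in the weeks ending 2001-01-07 and 2001-01-21,
# but nothing in the week ending 2001-01-14.
dates = pd.to_datetime(['2001-01-02', '2001-01-04', '2001-01-16', '2001-01-18'])
gappy = pd.DataFrame({'Dates': dates, 'Nums': [40, 20, 10, 30]})

weekly = gappy.groupby(pd.Grouper(freq='W', key='Dates'))['Nums'].agg(['max', 'last'])

# The empty week appears as a row of NaN, so the third week's
# "previous max" is NaN rather than the first week's max.
shifted_raw = weekly['max'].shift(1)

# Dropping empty weeks before shifting compares each week with the
# previous week that has data.
nonempty = weekly.dropna()
shifted_nonempty = nonempty['max'].shift(1)

print(weekly)
print(shifted_nonempty)
```

Whether to drop empty weeks or keep the NaN depends on what "previous week" should mean for your data.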
The percent change can then be computed with a simple subtraction and division:
In [166]: df2['change'] = (df2['last']-df2['previous_max'])/df2['previous_max']; df2.head()
Out[166]:
            max  last  previous_max    change
Dates
2001-01-07   83    39           NaN       NaN
2001-01-14   75    75          83.0 -0.096386
2001-01-21   97    18          75.0 -0.760000
2001-01-28   72    37          97.0 -0.618557
2001-02-04   84    24          72.0 -0.666667
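To check the arithmetic by hand, the three steps can be wrapped in a helper and run on a small frame with known values. This is only a sketch: the function name weekly_max_vs_next_last, its parameters, and the sample numbers are made up for illustration. Week 2 is missing its Monday and week 3 its Friday, mirroring the question.

```python
import pandas as pd

def weekly_max_vs_next_last(frame, date_col='Dates', value_col='Nums'):
    """Compare each week's last value with the previous week's max.

    Hypothetical helper; assumes rows are already sorted by date,
    because 'last' takes the last row of each group in frame order.
    """
    weekly = frame.groupby(pd.Grouper(freq='W', key=date_col))[value_col].agg(['max', 'last'])
    weekly['previous_max'] = weekly['max'].shift(1)
    weekly['change'] = (weekly['last'] - weekly['previous_max']) / weekly['previous_max']
    return weekly

dates = pd.to_datetime([
    '2001-01-01', '2001-01-02', '2001-01-03', '2001-01-04', '2001-01-05',  # full week
    '2001-01-09', '2001-01-10', '2001-01-11', '2001-01-12',                # Monday missing
    '2001-01-15', '2001-01-16', '2001-01-17', '2001-01-18',                # Friday missing
])
nums = [10, 50, 20, 40, 30, 5, 60, 25, 45, 70, 15, 35, 30]
small = pd.DataFrame({'Dates': dates, 'Nums': nums})

result = weekly_max_vs_next_last(small)
print(result)
```

Here week 1 has max 50 and week 2 ends on 45, so the second row's change is (45 - 50) / 50 = -0.1; week 2 has max 60 and week 3 ends on 30, giving (30 - 60) / 60 = -0.5. If the frame might not be in date order, sorting it with sort_values on the date column before calling the helper keeps 'last' meaningful.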