I am trying to spread the total of each time period evenly across the sub-periods of a higher sampling frequency.
What I did:
>>> rng = pandas.PeriodIndex(start='2014-01-01', periods=2, freq='W')
>>> ts = pandas.Series([i+1 for i in range(len(rng))], index=rng)
>>> ts
2013-12-30/2014-01-05 1
2014-01-06/2014-01-12 2
Freq: W-SUN, dtype: float64
>>> ts.resample('D')
2013-12-30 1
2013-12-31 NaN
2014-01-01 NaN
2014-01-02 NaN
2014-01-03 NaN
2014-01-04 NaN
2014-01-05 NaN
2014-01-06 2
2014-01-07 NaN
2014-01-08 NaN
2014-01-09 NaN
2014-01-10 NaN
2014-01-11 NaN
2014-01-12 NaN
Freq: D, dtype: float64
What I actually want is:
>>> ts.resample('D', some_miracle_thing)
2013-12-30 1/7
2013-12-31 1/7
2014-01-01 1/7
2014-01-02 1/7
2014-01-03 1/7
2014-01-04 1/7
2014-01-05 1/7
2014-01-06 2/7
2014-01-07 2/7
2014-01-08 2/7
2014-01-09 2/7
2014-01-10 2/7
2014-01-11 2/7
2014-01-12 2/7
Freq: D, dtype: float64
Is there any way to do this?
An x/7 lambda function? It is a bit convoluted, but would this work?
First, when you resample, add a `.groupby(level=0)` so that the original timestamps are kept. (Based on this answer.)
rs = ts.groupby(level=0).resample('D')
Then group by the first level of the resulting MultiIndex and apply the desired operation.
In [285]: rs.groupby(level=0).transform(lambda x: x.iloc[0] / float(len(x)))
Out[285]:
2013-12-30/2014-01-05  2013-12-30    0.142857
                       2013-12-31    0.142857
                       2014-01-01    0.142857
                       2014-01-02    0.142857
                       2014-01-03    0.142857
                       2014-01-04    0.142857
                       2014-01-05    0.142857
2014-01-06/2014-01-12  2014-01-06    0.285714
                       2014-01-07    0.285714
                       2014-01-08    0.285714
                       2014-01-09    0.285714
                       2014-01-10    0.285714
                       2014-01-11    0.285714
                       2014-01-12    0.285714
dtype: float64
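The snippets in this thread were written against an older pandas; the `PeriodIndex(start=..., periods=...)` constructor used in the question is no longer available in current releases, where `pandas.period_range` takes its place. As a rough alternative that avoids `resample` entirely, the sketch below expands each weekly period into its daily periods and divides the value by the number of days. The helper name `spread_evenly` is made up for illustration, and the code assumes a Series indexed by a `PeriodIndex`; it is not the method from the answer above.

```python
import pandas as pd


def spread_evenly(ts, freq="D"):
    """Hypothetical helper: split each period's value evenly over its sub-periods."""
    pieces = []
    for period, value in ts.items():
        # First and last sub-period contained in the coarser period.
        first = period.asfreq(freq, how="start")
        last = period.asfreq(freq, how="end")
        sub_periods = pd.period_range(start=first, end=last, freq=freq)
        # Every sub-period receives an equal share of the original value.
        pieces.append(pd.Series(value / len(sub_periods), index=sub_periods))
    return pd.concat(pieces)


# Same data as in the question: two weekly periods with values 1 and 2.
rng = pd.period_range(start="2014-01-01", periods=2, freq="W")
ts = pd.Series([1.0, 2.0], index=rng)

daily = spread_evenly(ts)
print(daily)
```

Because the divisor is the actual number of sub-periods rather than a hard-coded 7, the same sketch should also cope with periods of unequal length, such as months split into days.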