Get the first and last values in a rolling window

pie*_*e_j 2 python pandas rolling-computation

Initial problem statement

Using pandas, I would like to apply functions that are available for resample() but not for rolling().

This works:

df1 = df.resample(to_freq,
                  closed='left',
                  kind='period',
                   ).agg(OrderedDict([('Open', 'first'),
                                      ('Close', 'last'),
                                                        ]))

This does not:

df2 = df.rolling(my_indexer).agg(
                 OrderedDict([('Open', 'first'),
                              ('Close', 'last') ]))
>>> AttributeError: 'first' is not a valid function for 'Rolling' object

df3 = df.rolling(my_indexer).agg(
                 OrderedDict([
                              ('Close', 'last') ]))
>>> AttributeError: 'last' is not a valid function for 'Rolling' object

What would you suggest for keeping the first and last values of a rolling window in two different columns?

Edit 1 - with usable input data

import pandas as pd
from random import seed
from random import randint
from collections import OrderedDict

# DataFrame
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
seed(1)
values = [randint(0,10) for ts in ts_1h]
df = pd.DataFrame({'Values' : values}, index=ts_1h)

# First & last work with resample
resampled_first = df.resample('3H',
                              closed='left',
                              kind='period',
                             ).agg(OrderedDict([('Values', 'first')]))
resampled_last = df.resample('3H',
                             closed='left',
                             kind='period',
                            ).agg(OrderedDict([('Values', 'last')]))

# They don't with rolling
rolling_first = df.rolling(3).agg(OrderedDict([('Values', 'first')]))
rolling_last = df.rolling(3).agg(OrderedDict([('Values', 'last')]))

Thanks for your help! Best regards,

fur*_*ras 9

You can use your own function to get the first or last element of a rolling window

rolling_first = df.rolling(3).agg(lambda rows: rows[0])
rolling_last  = df.rolling(3).agg(lambda rows: rows[-1])

Example

import pandas as pd
from random import seed, randint

# DataFrame
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')

seed(1)
values = [randint(0, 10) for ts in ts_1h]

df = pd.DataFrame({'Values' : values}, index=ts_1h)

df['first'] = df['Values'].rolling(3).agg(lambda rows: rows[0])
df['last']  = df['Values'].rolling(3).agg(lambda rows: rows[-1])

print(df)

Result

                          Values  first  last
2020-01-01 00:00:00+00:00       2    NaN   NaN
2020-01-01 01:00:00+00:00       9    NaN   NaN
2020-01-01 02:00:00+00:00       1    2.0   1.0
2020-01-01 03:00:00+00:00       4    9.0   4.0
2020-01-01 04:00:00+00:00       1    1.0   1.0
2020-01-01 05:00:00+00:00       7    4.0   7.0
2020-01-01 06:00:00+00:00       7    1.0   7.0
2020-01-01 07:00:00+00:00       7    7.0   7.0
2020-01-01 08:00:00+00:00      10    7.0  10.0
2020-01-01 09:00:00+00:00       6    7.0   6.0
2020-01-01 10:00:00+00:00       3   10.0   3.0
2020-01-01 11:00:00+00:00       1    6.0   1.0
2020-01-01 12:00:00+00:00       7    3.0   7.0
2020-01-01 13:00:00+00:00       0    1.0   0.0
2020-01-01 14:00:00+00:00       6    7.0   6.0
2020-01-01 15:00:00+00:00       6    0.0   6.0
2020-01-01 16:00:00+00:00       9    6.0   9.0
2020-01-01 17:00:00+00:00       0    6.0   0.0
2020-01-01 18:00:00+00:00       7    9.0   7.0
2020-01-01 19:00:00+00:00       4    0.0   4.0
2020-01-01 20:00:00+00:00       3    7.0   3.0
2020-01-01 21:00:00+00:00       9    4.0   9.0
2020-01-01 22:00:00+00:00       1    3.0   1.0
2020-01-01 23:00:00+00:00       5    9.0   5.0
2020-01-02 00:00:00+00:00       0    1.0   0.0
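
The function passed to the rolling object receives the contents of each window. As a hedged variation (not part of the original answer): passing `raw=True` to `rolling(...).apply()` hands the function a plain NumPy array, so `[0]` and `[-1]` are unambiguously positional, while with the default `raw=False` each window arrives as a Series and `.iloc` is the explicit way to index by position. The small series `s` below is made-up illustration data.

```python
import pandas as pd

# Made-up data for illustration only
s = pd.Series([2, 9, 1, 4, 1, 7], dtype="float64")

# raw=True: each window is passed as a NumPy array, so [0] / [-1] are positional
first_raw = s.rolling(3).apply(lambda window: window[0], raw=True)
last_raw = s.rolling(3).apply(lambda window: window[-1], raw=True)

# raw=False (the default): each window is a Series, so index by position with .iloc
first_iloc = s.rolling(3).apply(lambda window: window.iloc[0], raw=False)
last_iloc = s.rolling(3).apply(lambda window: window.iloc[-1], raw=False)

print(pd.DataFrame({"Values": s, "first": first_raw, "last": last_raw}))
```

Both variants should give the same columns; the first two rows are NaN because the window of 3 is not yet full.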

Edit:

With a dictionary, you have to pass the lambda itself, not a string

result = df['Values'].rolling(3).agg({'first': lambda rows: rows[0], 'last':  lambda rows: rows[-1]})
print(result)

The same applies to your own function: you have to pass its name, not a string containing the name

def first(rows):
    return rows[0]

def last(rows):
    return rows[-1]

result = df['Values'].rolling(3).agg({'first': first, 'last': last})
print(result)
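
To get closer to the question's `Open` / `Close` case, here is a minimal sketch that assumes a DataFrame with those two columns and maps each column to its own function. The data in `df_ohlc` is hypothetical, and `.iloc` is used so that indexing is by position whatever the index type.

```python
import pandas as pd

# Hypothetical OHLC-style data; only the column names come from the question
df_ohlc = pd.DataFrame({
    "Open":  [10.0, 11.0, 12.0, 13.0, 14.0],
    "Close": [10.5, 11.5, 12.5, 13.5, 14.5],
})

def first(rows):
    # rows holds one window as a Series; take its first element by position
    return rows.iloc[0]

def last(rows):
    # take the last element of the window by position
    return rows.iloc[-1]

# Keys are column names, values are the function objects (not strings)
rolled = df_ohlc.rolling(3).agg({"Open": first, "Close": last})
print(rolled)
```

Each output column then holds the window's first `Open` and last `Close`, with NaN in the rows where the window is incomplete.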

Example

import pandas as pd
from random import seed, randint

# DataFrame
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')

seed(1)
values = [randint(0, 10) for ts in ts_1h]

df = pd.DataFrame({'Values' : values}, index=ts_1h)

result = df['Values'].rolling(3).agg({'first': lambda rows: rows[0], 'last': lambda rows: rows[-1]})
print(result)

def first(rows):
    return rows[0]

def last(rows):
    return rows[-1]

result = df['Values'].rolling(3).agg({'first': first, 'last': last})
print(result)
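
For a fixed-size integer window such as `rolling(3)`, the same two columns can probably be obtained without calling a Python function per window. This is a sketch of an alternative, not the method from the answer above: the last value of a window is the current row, and the first value is the one `window - 1` rows earlier. It assumes no missing values and the default `min_periods`, and it would not carry over to time-based windows or to a custom indexer like the question's `my_indexer`. The series `s` is made-up data.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only
s = pd.Series([2, 9, 1, 4, 1, 7], dtype="float64")
window = 3

# First value of each window: the value (window - 1) rows before the current row
first_shift = s.shift(window - 1)

# Last value of each window: the current row, blanked where the window is not full
last_shift = s.copy()
last_shift.iloc[: window - 1] = np.nan

print(pd.DataFrame({"Values": s, "first": first_shift, "last": last_shift}))
```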