Tags: python, series, apply, pandas, rolling-computation
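I have a series with a datetime index and dtype float64. I am trying to apply a custom function to rolling windows of this series, and I want the function to return strings. However, this raises a TypeError. Why does this produce an error, and is there a way to make this work directly by applying a function?

Here is an example: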
```python
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(
    np.random.randint(low=1, high=100, size=100),
    index=pd.date_range(start='2000-01-01', freq='W', periods=100),
)
number_series = number_series.astype(float)

def func(s):
    # s is a Series holding one window; .iloc makes [-1] positional
    if s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
        return 'High'
    elif s.iloc[-1] > s.iloc[-2]:
        return 'Medium'
    else:
        return 'Low'

new_series = number_series.rolling(5).apply(func)
```
This results in the following error:

```
TypeError: must be real number, not str
```
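The workaround I currently use is to modify `func` so that it returns numbers, which gives a numeric series, and then apply a second function to that series to produce the new series, as in the example below: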
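```python
def func_float(s):
    # Encode the label as a number, which Rolling.apply can store
    if s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
        return 1
    elif s.iloc[-1] > s.iloc[-2]:
        return 2
    else:
        return 3

float_series = number_series.rolling(5).apply(func_float)

def func_text(s):
    # Decode the number back into a label; the leading NaNs fall through to 'Low'
    if s == 1:
        return 'High'
    elif s == 2:
        return 'Medium'
    else:
        return 'Low'

new_series = float_series.apply(func_text)
```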
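The second step of this workaround can be shortened by replacing `func_text` with a dictionary lookup through `Series.map`. A minimal sketch, using a hypothetical five-element `float_series` in place of the real one (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for float_series: numeric codes 1/2/3 plus the leading
# NaN values that rolling(5).apply produces before the window is full
float_series = pd.Series([np.nan, np.nan, 1.0, 2.0, 3.0])

# Map each code to its label. NaN has no entry in the dict, so map() turns it
# into NaN and fillna() replaces it with 'Low', like func_text's else-branch
labels = {1.0: "High", 2.0: "Medium", 3.0: "Low"}
new_series = float_series.map(labels).fillna("Low")
print(new_series.tolist())  # ['Low', 'Low', 'High', 'Medium', 'Low']
```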
This gives the result that the original (failing) code was meant to produce:

```
new_series
2000-01-02 Low
2000-01-09 Low
2000-01-16 Low
2000-01-23 Low
2000-01-30 Medium
...
2001-10-28 Low
2001-11-04 Medium
2001-11-11 High
2001-11-18 High
2001-11-25 Low
Length: 100, dtype: object
```
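Note that the `apply` method of a `Rolling` object is different from the `apply` method of a `Series` object; I agree with you that this is a bit confusing. As I understand it, functions applied to rolling windows are generally meant for aggregating data (e.g. `sum`, `count`, etc.), so `Rolling.apply` expects a single numeric value per window, and a string cannot be stored in its float64 result.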
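A minimal sketch of that constraint on a small made-up series (`last_minus_first` and `label` are hypothetical helpers): a function returning a float works with `Rolling.apply`, while one returning a string fails. The sketch catches both `TypeError` and `ValueError` because the exact exception could in principle differ between pandas versions; the question shows a `TypeError`.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration)
s = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])

def last_minus_first(window):
    # Returns a single float per window, which Rolling.apply can store
    return float(window[-1] - window[0])

def label(window):
    # Returns a string, which Rolling.apply cannot store
    return "High" if window[-1] > window[0] else "Low"

# raw=True passes each window as a NumPy array, so window[-1] is positional
numeric = s.rolling(3).apply(last_minus_first, raw=True)
print(numeric.tolist())  # first two entries are NaN (fewer than 3 values)

try:
    s.rolling(3).apply(label, raw=True)
    string_failed = False
except (TypeError, ValueError) as exc:
    # The per-window results are collected as floats, so a str is rejected
    string_failed = True
    print(type(exc).__name__, exc)
```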
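However, you can convert the rolling windows into a list and apply the function to each window in that list (thanks to this discussion). Iterating over a `Rolling` object yields each window as a `Series`.

So my approach is: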
```python
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(
    np.random.randint(low=1, high=100, size=100),
    index=pd.date_range(start='2000-01-01', freq='W', periods=100),
)
number_series = number_series.astype(float)

def func(s):
    # The first windows produced by iteration hold fewer than 3 values
    if len(s) > 2:
        if s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
            return 'High'
        elif s.iloc[-1] > s.iloc[-2]:
            return 'Medium'
        else:
            return 'Low'
    else:
        return ''

# Iterating over the Rolling object yields one Series per window.
# The result is not named "list", which would shadow the built-in.
results = [func(window) for window in number_series.rolling(5)]
new_series = pd.Series(results, index=number_series.index)
```
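Also note that `func` needs to handle the first items differently: the first windows yielded by the iteration contain fewer than three values, so indexing them with `[-3]` would be out of range.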
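To see why the first items need special handling, here is a minimal sketch on made-up data. It assumes a pandas version where `Rolling` objects are iterable (1.1 or later): with the default settings, the first windows are shorter than the window size, so `func` checks `len(window)` before indexing backwards.

```python
import pandas as pd

# Toy data (made up for illustration)
s = pd.Series([10.0, 20.0, 15.0, 30.0, 40.0, 35.0])

# Iterating over a Rolling object yields each window as a Series;
# the first windows are shorter than the window size of 3
lengths = [len(window) for window in s.rolling(3)]
print(lengths)  # [1, 2, 3, 3, 3, 3]

def func(window):
    # Guard against short windows before indexing backwards;
    # .iloc makes the negative indices positional rather than label-based
    if len(window) < 3:
        return ""
    if window.iloc[-1] > window.iloc[-2] > window.iloc[-3]:
        return "High"
    elif window.iloc[-1] > window.iloc[-2]:
        return "Medium"
    return "Low"

results = [func(window) for window in s.rolling(3)]
print(results)  # ['', '', 'Low', 'Medium', 'High', 'Low']
```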
Another way is to use the window indexer behind the `rolling()` method: have `func` return strings, apply it to each window, and store the results in a list:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(
    np.random.randint(low=1, high=100, size=100),
    index=pd.date_range(start='2000-01-01', freq='W', periods=100),
)
number_series = number_series.astype(float)

def func(s):
    # The length checks handle the first, shorter windows
    if (len(s) >= 3) and (s.iloc[-1] > s.iloc[-2] > s.iloc[-3]):
        return 'High'
    elif (len(s) >= 2) and (s.iloc[-1] > s.iloc[-2]):
        return 'Medium'
    else:
        return 'Low'

# Step 1: Get the window indexer
# (_get_window_indexer is a private pandas method and may change between versions)
window_indexer = number_series.rolling(5)._get_window_indexer()
start, end = window_indexer.get_window_bounds(num_values=len(number_series))

# Step 2: Apply func to each window
results = [func(number_series.iloc[slice(s, e)]) for s, e in zip(start, end)]

# Step 3: Get the results back into a pandas Series
new_series = pd.Series(results, index=number_series.index)
new_series
```
Output:

```
2000-01-02 Low
2000-01-09 Low
2000-01-16 Medium
2000-01-23 Low
2000-01-30 Medium
...
2001-10-28 Low
2001-11-04 Medium
2001-11-11 High
2001-11-18 High
2001-11-25 Low
Length: 100, dtype: object
```
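The same result can be obtained without the private `_get_window_indexer()` method by computing the trailing-window bounds with NumPy. This is a sketch under stated assumptions: a fixed integer window with the default `center=False` and `closed` settings. `fixed_window_bounds` is a hypothetical helper, not a pandas function, and the toy data is made up.

```python
import numpy as np
import pandas as pd

def fixed_window_bounds(num_values, window):
    # Hypothetical helper: trailing window ending at each row, so row i
    # covers positions [max(0, i + 1 - window), i + 1)
    end = np.arange(1, num_values + 1)
    start = np.clip(end - window, 0, None)
    return start, end

def func(s):
    if len(s) >= 3 and s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
        return "High"
    elif len(s) >= 2 and s.iloc[-1] > s.iloc[-2]:
        return "Medium"
    return "Low"

series = pd.Series([10.0, 20.0, 15.0, 30.0, 40.0, 35.0])
start, end = fixed_window_bounds(len(series), 5)
print(start.tolist(), end.tolist())  # [0, 0, 0, 0, 0, 1] [1, 2, 3, 4, 5, 6]

# Slice each window positionally and label it
results = [func(series.iloc[lo:hi]) for lo, hi in zip(start, end)]
new_series = pd.Series(results, index=series.index)
print(new_series.tolist())  # ['Low', 'Medium', 'Low', 'Medium', 'High', 'Low']
```

Because these bounds mirror what iterating over `series.rolling(5)` produces, the two approaches should label the windows identically.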
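Since `func` in the last approach only looks at the last three values of each window, a swapped-in vectorized technique (not one of the answers above) can avoid `rolling` entirely by comparing the series with shifted copies of itself. A minimal sketch on made-up data, assuming the window is at least three values long; it reproduces the length-guarded `func` of the window-indexer approach, not the question's `rolling(5).apply` version, which labels the first four rows 'Low'.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration)
series = pd.Series([10.0, 20.0, 15.0, 30.0, 40.0, 35.0])

prev1 = series.shift(1)  # plays the role of s[-2] for each row
prev2 = series.shift(2)  # plays the role of s[-3] for each row

# Comparisons against the NaN introduced by shift() evaluate to False,
# which stands in for the len(s) >= 2 / len(s) >= 3 guards in func
rising = series > prev1
rising_twice = rising & (prev1 > prev2)

# The first matching condition wins, so 'High' takes precedence over 'Medium'
labels = np.select([rising_twice, rising], ["High", "Medium"], default="Low")
new_series = pd.Series(labels, index=series.index)
print(new_series.tolist())  # ['Low', 'Medium', 'Low', 'Medium', 'High', 'Low']
```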