How to vectorize cumulative operations in Pandas

Ral*_*ber 6 python vectorization accumulate pandas

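Following on from "How to vectorize an operation that uses previous values?", I have not been able to answer the following question:

Is there a way to vectorize the calculation of the value at end of period (VEoP) column?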

import pandas as pd

terms = pd.date_range(start = '2022-01-01', periods=12, freq='YS', normalize=True)
df = pd.DataFrame({
    'Return':   [1.063, 1.053, 1.008, 0.98, 1.04, 1.057, 1.073, 1.027, 1.025, 1.068, 1.001, 0.983],
    'Cashflow': [6, 0, 0, 8, -1, -1, -1, -1, -1, -1, -1, -1]
    },index=terms.strftime('%Y'))
df.index.name = 'Date'

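df['VEoP'] = 0.0  # float column, so computed values keep their decimals
veop_col = df.columns.get_loc('VEoP')
for y in range(df.index.size):
    prev = 0 if y == 0 else df['VEoP'].iloc[y - 1]
    # assign through df.iloc[...] instead of chained df['VEoP'].iloc[y] = ...
    df.iloc[y, veop_col] = (prev + df['Cashflow'].iloc[y]) * df['Return'].iloc[y]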

df

    Return  Cashflow    VEoP
Date                          
2022  1.0630         6  6.3780
2023  1.0530         0  6.7160
2024  1.0080         0  6.7698
2025  0.9800         8 14.4744
2026  1.0400        -1 14.0133
2027  1.0570        -1 13.7551
2028  1.0730        -1 13.6862
2029  1.0270        -1 13.0288
2030  1.0250        -1 12.3295
2031  1.0680        -1 12.0999
2032  1.0010        -1 11.1110
2033  0.9830        -1  9.9391
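For reference, the loop above implements the recurrence VEoP[t] = (VEoP[t-1] + Cashflow[t]) * Return[t], starting from a previous value of 0. A minimal plain-Python sketch of the same calculation on the first four rows is shown here; the function name `veop_loop` is made up for illustration and is not part of the question.

```python
def veop_loop(returns, cashflows):
    """Value at end of period: V_t = (V_{t-1} + C_t) * R_t, with V_0 = 0."""
    values = []
    prev = 0.0
    for r, c in zip(returns, cashflows):
        prev = (prev + c) * r  # each step needs the previous result
        values.append(prev)
    return values


# First four rows of the example data (2022 to 2025)
returns = [1.063, 1.053, 1.008, 0.98]
cashflows = [6, 0, 0, 8]
first_rows = veop_loop(returns, cashflows)
print([round(v, 4) for v in first_rows])
```

The rounded values should agree with the first four VEoP entries in the table above.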

Eli*_*adL 3

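Vectorization is limited when each value depends on the one before it, because the computation cannot be parallelized.

So, here is a non-vectorized solution using itertools.accumulate: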

from itertools import accumulate

df['VEoP'] = list(accumulate(
    df.to_records(),
    lambda prev_veop, new: (prev_veop + new.Cashflow) * new.Return,
    initial=0,
))[1:]
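The trailing `[1:]` in the snippet above is there because `accumulate` with `initial=...` emits the initial value as its first element. A small sketch with made-up numbers (not the question's data) shows the effect:

```python
from itertools import accumulate

# With initial=0 the output has one more element than the input:
# the initial value first, then one running result per input item.
running = list(accumulate([1, 2, 3], lambda total, x: total + x, initial=0))

# Dropping the first element leaves exactly one result per input item.
trimmed = running[1:]
print(running, trimmed)
```

Note that the `initial` keyword requires Python 3.8 or later.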

It performs just as well as the NumPy "vectorized" version:

import numpy as np

df['VEoP'] = np.frompyfunc(
    lambda prev_veop, new: (prev_veop + new.Cashflow) * new.Return,
    2, 1,  # nin, nout
).accumulate(
    [0, *df.to_records()],
    dtype=object,  # temporary conversion
).astype(float)[1:]
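To see the mechanics in isolation, here is a minimal sketch of the same pattern on plain integers. The function `append_digit` is a hypothetical example, not taken from the answer. A ufunc built with `np.frompyfunc` works on Python objects, which is why the input is given as an object array and the result is converted back afterwards.

```python
import numpy as np


def append_digit(acc, digit):
    """Hypothetical binary function: shift the accumulator and add a digit."""
    return acc * 10 + digit


# Wrap the Python function as a ufunc with 2 inputs and 1 output.
append_ufunc = np.frompyfunc(append_digit, 2, 1)

digits = np.array([1, 2, 3], dtype=object)

# accumulate keeps the first element and then applies
# out[i] = append_digit(out[i - 1], digits[i]) for the rest.
result = append_ufunc.accumulate(digits, dtype=object).astype(int)
print(result)
```

The loop still runs one Python call per element, so this is a change of notation rather than true vectorization.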

It can be broken down into smaller logical pieces:

def get_ufunc(func, nin, nout):  return np.frompyfunc(func, nin, nout)
def get_binary_ufunc(func):      return get_ufunc(func, nin=2, nout=1)
def accum(func):                 return get_binary_ufunc(func).accumulate
def accum_float(func, x):        return accum(func)(x, dtype=object).astype(float)
def accum_float_from_0(func, x): return accum_float(func, [0, *x])[1:]

def calc_veop(prev_veop, new):   return (prev_veop + new.Cashflow) * new.Return
def accum_veop(records):         return accum_float_from_0(calc_veop, records)

df['VEoP'] = accum_veop(df.to_records())

You can read more about np.frompyfunc and np.ufunc.accumulate.
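As a side note that is not part of the answer above: this particular recurrence is linear in the previous value, so it can be unrolled into cumulative products and sums. Writing P[t] for the cumulative product of the returns (with P[0] = 1), VEoP[t] = P[t] * sum over k <= t of Cashflow[k] / P[k-1]. The sketch below assumes all returns are nonzero and that the cumulative product stays within a range where the division is numerically safe; it compares the result against a plain loop rather than claiming it is the better approach.

```python
import numpy as np

returns = np.array([1.063, 1.053, 1.008, 0.98, 1.04, 1.057,
                    1.073, 1.027, 1.025, 1.068, 1.001, 0.983])
cashflows = np.array([6, 0, 0, 8, -1, -1, -1, -1, -1, -1, -1, -1], dtype=float)

growth = np.cumprod(returns)                          # P[t] = R[1] * ... * R[t]
growth_before = np.concatenate(([1.0], growth[:-1]))  # P[t-1], with P[0] = 1
veop_closed = growth * np.cumsum(cashflows / growth_before)

# Reference: the straightforward loop over the recurrence.
veop_reference = []
prev = 0.0
for r, c in zip(returns, cashflows):
    prev = (prev + c) * r
    veop_reference.append(prev)

print(np.allclose(veop_closed, veop_reference))
```

For long series or returns far from 1, the cumulative product can overflow or underflow, in which case the accumulate-based versions are the safer choice.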