Improving the performance of the np.irr function through vectorization

Question

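Is it possible to improve the performance of the np.irr function so that it can be applied to a 2-D array of cash flows without a for loop, either by vectorizing np.irr or by using an alternative algorithm?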

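The irr function in the numpy library computes the periodically compounded rate of return that gives a net present value of 0 for an array of cash flows. The function can only be applied to a 1-D array: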

import numpy as np

x = np.array([-100,50,50,50])
r = np.irr(x)
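
As a quick illustration of the definition above, the sketch below checks that the rate found for this cash flow drives the net present value to zero. `npv` is a hypothetical helper written for this example, and the rate is obtained from np.roots directly (the same approach as the np.irr source quoted further down), because np.irr has since been removed from NumPy and now lives in the separate numpy-financial package.

```python
import numpy as np

def npv(rate, cashflows):
    # Hypothetical helper: sum of cash flows discounted at `rate`,
    # with the first cash flow at period 0.
    cashflows = np.asarray(cashflows, dtype=float)
    periods = np.arange(cashflows.size)
    return float(np.sum(cashflows / (1.0 + rate) ** periods))

x = np.array([-100, 50, 50, 50])

# Roots of the polynomial in the discount factor d = 1 / (1 + rate).
roots = np.roots(x[::-1])
real_pos = roots[(roots.imag == 0) & (roots.real > 0)].real
rate = 1.0 / real_pos - 1.0
r = rate[np.argmin(np.abs(rate))]

print(r, npv(r, x))
```

The printed NPV should be zero up to floating-point error.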

np.irr does not work on a 2-D array of cash flows, such as:

cfs = np.zeros((10000,4))
cfs[:,0] = -100
cfs[:,1:] = 50

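where each row represents a series of cash flows and the columns represent time periods. A slow implementation is therefore to loop over the rows and apply np.irr to each one: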

out = []
for x in cfs:
    out.append(np.irr(x))

For large arrays this is an optimization bottleneck. Looking at the source code of np.irr, I believe the main obstacle is vectorizing the np.roots function:

def irr(values):
    res = np.roots(values[::-1])
    mask = (res.imag == 0) & (res.real > 0)
    if res.size == 0:
        return np.nan
    res = res[mask].real
    # NPV(rate) = 0 can have more than one solution so we return
    # only the solution closest to zero.
    rate = 1.0/res - 1
    rate = rate.item(np.argmin(np.abs(rate)))
    return rate

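I found a similar implementation in R: Fast Loan Rate Calculation for a big number of Loans, but I don't know how to port it to Python. Also, I don't consider np.apply_along_axis or np.vectorize a solution to this problem, because my main concern is performance and I understand that both are wrappers around a for loop.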

Thanks!

Answer 1

小智 (score: 4)

Looking at the source of np.roots,

import inspect
print(inspect.getsource(np.roots))

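we see that it works by finding the eigenvalues of the "companion matrix". It also does some special handling of zero coefficients. I don't really understand the mathematical background, but I do know that np.linalg.eigvals can compute the eigenvalues of multiple matrices in a vectorized way.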
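
To make the companion-matrix idea concrete, here is a minimal sketch. `companion` is a hypothetical helper (not a NumPy function) that builds the same kind of matrix np.roots uses for a polynomial with a nonzero leading coefficient; the two example polynomials are chosen only for illustration. Stacking the matrices along a leading axis lets np.linalg.eigvals solve all of them in one call.

```python
import numpy as np

def companion(p):
    # Hypothetical helper: companion matrix of the polynomial with
    # coefficients p (highest degree first, p[0] != 0).
    p = np.asarray(p, dtype=float)
    n = p.size - 1
    A = np.diag(np.ones(n - 1), -1)  # ones on the subdiagonal
    A[0, :] = -p[1:] / p[0]          # first row from the coefficients
    return A

p1 = np.array([50.0, 50.0, 50.0, -100.0])  # reversed cash flows from the question
p2 = np.array([1.0, -6.0, 11.0, -6.0])     # (z - 1)(z - 2)(z - 3)

# Shape (2, 3, 3): one companion matrix per polynomial.
stacked = np.stack([companion(p1), companion(p2)])

# eigvals broadcasts over the leading axis and returns shape (2, 3).
eig = np.linalg.eigvals(stacked)
print(eig)
```

Each row of `eig` holds the roots of the corresponding polynomial, which is what a vectorized np.irr needs.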

Merging this with the source code of np.irr produced the following "Frankencode":

import numpy as np

def irr_vec(cfs):
    # Create companion matrix for every row in `cfs`
    M, N = cfs.shape
    A = np.zeros((M, (N-1)**2))
    A[:,N-1::N] = 1
    A = A.reshape((M,N-1,N-1))
    A[:,0,:] = cfs[:,-2::-1] / -cfs[:,-1:]  # slice [-1:] to keep dims

    # Calculate roots; `eigvals` is a gufunc
    res = np.linalg.eigvals(A)

    # Find the solution that makes the most sense...
    mask = (res.imag == 0) & (res.real > 0)
    res = np.ma.array(res.real, mask=~mask, fill_value=np.nan)
    rate = 1.0/res - 1
    idx = np.argmin(np.abs(rate), axis=1)
    irr = rate[np.arange(M), idx].filled()
    return irr

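This does not handle zero coefficients, and it will definitely fail when any(cfs[:,-1] == 0). Some input argument checking wouldn't hurt either. Maybe there are other problems as well? But for the example data provided it achieves what we want (at the cost of increased memory use):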

In [487]: cfs = np.zeros((10000,4))
     ...: cfs[:,0] = -100
     ...: cfs[:,1:] = 50

In [488]: %timeit [np.irr(x) for x in cfs]
1 loops, best of 3: 2.96 s per loop

In [489]: %timeit irr_vec(cfs)
10 loops, best of 3: 77.8 ms per loop
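
Since the eigenvalue approach breaks when the last cash flow is zero, one possible alternative algorithm is a Newton iteration on the NPV function, run on all rows at once. This is a different technique from the companion-matrix code above, and only a rough sketch: `irr_newton`, its starting guess, tolerance and iteration limit are all assumptions made for this example, and Newton's method finds at most one root per row with no guarantee that it is the one closest to zero.

```python
import numpy as np

def irr_newton(cfs, guess=0.1, tol=1e-10, max_iter=100):
    # Hypothetical sketch: Newton's method on NPV(rate) = 0 for every row.
    cfs = np.asarray(cfs, dtype=float)
    t = np.arange(cfs.shape[1])            # period index 0..N-1
    rate = np.full(cfs.shape[0], guess)
    step = np.full_like(rate, np.inf)
    for _ in range(max_iter):
        disc = (1.0 + rate)[:, None] ** -t  # discount factors, shape (M, N)
        f = (cfs * disc).sum(axis=1)        # NPV per row
        df = (-t * cfs * disc).sum(axis=1) / (1.0 + rate)  # dNPV/drate
        step = f / df
        rate = rate - step
        if np.all(np.abs(step) < tol):
            break
    # Rows whose last step was not small are reported as NaN.
    return np.where(np.abs(step) < tol, rate, np.nan)

cfs = np.zeros((10000, 4))
cfs[:, 0] = -100
cfs[:, 1:] = 50
rates = irr_newton(cfs)

# A row whose final cash flow is zero, which irr_vec cannot handle.
r0 = irr_newton(np.array([[-100.0, 60.0, 60.0, 0.0]]))
print(rates[:3], r0)
```

Memory use grows with the (M, N) array of discount factors, and rows that fail to converge come back as NaN, so the starting guess may need tuning for other data.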

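If you have the special case of loans with fixed repayment amounts (as in the question you linked), you can probably do it even faster using interpolation...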
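
The answer leaves the interpolation idea open. One way it might look, purely as an assumption-laden sketch: for a loan with equal payments in periods 1 to n, principal / payment equals the annuity factor (1 - (1 + r)^-n) / r, which decreases monotonically in r, so the rate can be read off a precomputed table with np.interp. The function names, the rate grid from 1e-6 to 1.0 and the single shared `n_periods` are all hypothetical choices.

```python
import numpy as np

def annuity_factor(rate, n_periods):
    # Present value of n payments of 1 at periods 1..n (rate != 0).
    return (1.0 - (1.0 + rate) ** -n_periods) / rate

def irr_level_payments(principal, payment, n_periods, grid=None):
    # Hypothetical sketch: table lookup instead of root finding.
    if grid is None:
        grid = np.linspace(1e-6, 1.0, 100_001)  # assumed rate range
    factors = annuity_factor(grid, n_periods)   # decreasing in rate
    target = np.asarray(principal, dtype=float) / np.asarray(payment, dtype=float)
    # np.interp needs increasing x values, so reverse both arrays.
    return np.interp(target, factors[::-1], grid[::-1])

principal = np.full(10000, 100.0)
payment = np.full(10000, 50.0)
loan_rates = irr_level_payments(principal, payment, 3)
print(loan_rates[:3])
```

Rates outside the grid are clamped to its end points by np.interp, and the accuracy depends on the grid spacing.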
