Conditional cumulative sum in Pandas

Rus*_*s W 9 python pandas pandas-groupby

I'm a former Excel power user repenting for my sins. I need help recreating a common calculation.

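I'm trying to calculate the performance of a loan portfolio. The numerator is the cumulative total of losses. The denominator needs to be the original balance of the loans that are included in that cumulative total.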

I can't figure out how to do a conditional groupby in Pandas. In Excel this is very simple, so I hope I'm overthinking it.

I couldn't find much on StackOverflow, but this is the closest: python pandas conditional cumulative sum

What I can't figure out is how to handle conditions that are based on values in the index and contained in columns.

Here is my data:

| Loan    | Origination | Balance | NCO Date  | NCO | As of Date | Age (Months) | NCO Age (Months) |
|---------|-------------|---------|-----------|-----|------------|--------------|------------------|
| Loan 1  | 1/31/2011   | 1000    | 1/31/2018 | 25  | 5/31/2019  | 100          | 84               |
| Loan 2  | 3/31/2011   | 2500    |           | 0   | 5/31/2019  | 98           |                  |
| Loan 3  | 5/31/2011   | 3000    | 1/31/2019 | 15  | 5/31/2019  | 96           | 92               |
| Loan 4  | 7/31/2011   | 2500    |           | 0   | 5/31/2019  | 94           |                  |
| Loan 5  | 9/30/2011   | 1500    | 3/31/2019 | 35  | 5/31/2019  | 92           | 90               |
| Loan 6  | 11/30/2011  | 2500    |           | 0   | 5/31/2019  | 90           |                  |
| Loan 7  | 1/31/2012   | 1000    | 5/31/2019 | 5   | 5/31/2019  | 88           | 88               |
| Loan 8  | 3/31/2012   | 2500    |           | 0   | 5/31/2019  | 86           |                  |
| Loan 9  | 5/31/2012   | 1000    |           | 0   | 5/31/2019  | 84           |                  |
| Loan 10 | 7/31/2012   | 1250    |           | 0   | 5/31/2019  | 82           |                  |
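To run the snippets in this thread, the table above has to be in a DataFrame first. The following is a minimal sketch that assumes the column names are exactly the table headers and that a blank NCO cell means the loan has not been charged off (stored as NaN/NaT):

```python
import numpy as np
import pandas as pd

# Rebuild the loan table; blank NCO Date / NCO Age cells become NaT / NaN.
df = pd.DataFrame({
    'Loan': [f'Loan {i}' for i in range(1, 11)],
    'Origination': pd.to_datetime(['1/31/2011', '3/31/2011', '5/31/2011', '7/31/2011',
                                   '9/30/2011', '11/30/2011', '1/31/2012', '3/31/2012',
                                   '5/31/2012', '7/31/2012']),
    'Balance': [1000, 2500, 3000, 2500, 1500, 2500, 1000, 2500, 1000, 1250],
    'NCO Date': pd.to_datetime(['1/31/2018', None, '1/31/2019', None, '3/31/2019',
                                None, '5/31/2019', None, None, None]),
    'NCO': [25, 0, 15, 0, 35, 0, 5, 0, 0, 0],
    'As of Date': pd.to_datetime(['5/31/2019'] * 10),
    'Age (Months)': [100, 98, 96, 94, 92, 90, 88, 86, 84, 82],
    'NCO Age (Months)': [84, np.nan, 92, np.nan, 90, np.nan, 88, np.nan, np.nan, np.nan],
})
```

With `Loan` kept as a regular column, `Age (Months)` and `NCO Age (Months)` sit at positions 6 and 7, which is what positional lookups such as `df.iloc[:, 6]` and `df.iloc[:, 7]` pick up.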

In Excel, I would compute the totals with these formulas:

Outstanding Balance: =SUMIFS(Balance, Age (Months), ">="&Reference Age)

Cumulative NCO: =SUMIFS(NCO, Age (Months), ">="&Reference Age, NCO Age (Months), "<="&Reference Age)
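In pandas, each SUMIFS criterion maps to a boolean mask, and multiple criteria are combined with `&`. A minimal sketch for a single Reference Age, assuming the column names match the table headers; `sumifs_for` is a hypothetical helper name, not a pandas function:

```python
import numpy as np
import pandas as pd

# Only the columns the two SUMIFS formulas touch (values copied from the loan table).
df = pd.DataFrame({
    'Balance': [1000, 2500, 3000, 2500, 1500, 2500, 1000, 2500, 1000, 1250],
    'NCO': [25, 0, 15, 0, 35, 0, 5, 0, 0, 0],
    'Age (Months)': [100, 98, 96, 94, 92, 90, 88, 86, 84, 82],
    'NCO Age (Months)': [84, np.nan, 92, np.nan, 90, np.nan, 88, np.nan, np.nan, np.nan],
})


def sumifs_for(ref_age):
    """Hypothetical helper: pandas analogue of both SUMIFS formulas for one Reference Age."""
    # Criterion shared by both formulas: Age (Months) >= Reference Age
    seasoned = df['Age (Months)'] >= ref_age
    # Extra NCO criterion: NCO Age (Months) <= Reference Age (NaN compares as False)
    charged_off_by_ref = df['NCO Age (Months)'] <= ref_age
    outstanding = df.loc[seasoned, 'Balance'].sum()
    cumulative_nco = df.loc[seasoned & charged_off_by_ref, 'NCO'].sum()
    return int(outstanding), int(cumulative_nco)


print(sumifs_for(90))  # (13000, 60)
```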

Expected results:

| Reference Age       | 85    | 90    | 95   | 100  |
|---------------------|-------|-------|------|------|
| Outstanding Balance | 16500 | 13000 | 6500 | 1000 |
| Cumulative NCO      | 25    | 60    | 40   | 25   |

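The goal here is to include in Outstanding Balance only the loans that are old enough for an NCO to have been observed. Cumulative NCO is the total amount charged off from those outstanding loans up to that point.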
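Assuming the portfolio "performance" described at the top is the ratio of these two rows (cumulative losses over the balance of the loans counted), it can be computed directly from the expected results. A minimal sketch; the `Cumulative Loss Rate` column name is hypothetical:

```python
import pandas as pd

# Expected results from the table above, indexed by Reference Age.
summary = pd.DataFrame(
    {'Outstanding Balance': [16500, 13000, 6500, 1000],
     'Cumulative NCO': [25, 60, 40, 25]},
    index=pd.Index([85, 90, 95, 100], name='Reference Age'),
)

# Numerator: cumulative losses; denominator: balance of the loans old enough to be counted.
# 'Cumulative Loss Rate' is a hypothetical name for the assumed ratio.
summary['Cumulative Loss Rate'] = summary['Cumulative NCO'] / summary['Outstanding Balance']
print(summary)
```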

Edit:

I've calculated it this way. But is it the most efficient approach?


Ser*_*sta 2

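You can build complex conditions from your variables. Finding a vectorized approach for a simple cumulative sum is easy, but I can't think of a good one for the cumulative NCO.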
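One way to vectorize the simple cumulative sum (the Outstanding Balance row) is to sort the loans by age, take a reverse cumulative sum, and use `np.searchsorted` to find, for each Reference Age, the first loan that is at least that old. This is a swapped-in NumPy technique, not the answerer's code; the arrays are copied by hand from the question's table:

```python
import numpy as np

ages = np.array([100, 98, 96, 94, 92, 90, 88, 86, 84, 82])
balances = np.array([1000, 2500, 3000, 2500, 1500, 2500, 1000, 2500, 1000, 1250])
refs = np.array([85, 90, 95, 100])

# Sort loans by age (ascending); suffix[i] = total balance of sorted loans i, i+1, ...
order = np.argsort(ages)
sorted_ages = ages[order]
suffix = np.concatenate([np.cumsum(balances[order][::-1])[::-1], [0]])

# For each reference age, the first sorted position whose age is >= ref
start = np.searchsorted(sorted_ages, refs, side='left')
outstanding = suffix[start]
print(outstanding)  # [16500 13000  6500  1000]
```

Sorting costs O(n log n) once, and each Reference Age is then a binary search. The trick does not carry over directly to Cumulative NCO, because its second condition filters on a different column (NCO Age).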

So I would fall back to a Python comprehension:

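import pandas as pd

# df is the loan table above; column 6 is Age (Months), column 7 is NCO Age (Months)
data = [
    { 'Reference Age': ref,
      # balance of every loan at least `ref` months old
      'Outstanding Balance': df.loc[df.iloc[:,6]>=ref,'Balance'].sum(),
      # NCOs of those loans that were charged off at or before age `ref`
      'Cumulative NCO': df.loc[(df.iloc[:,6]>=ref)&(df.iloc[:,7]<=ref),
                   'NCO'].sum() }
    for ref in [85, 90, 95, 100]]

# one row per Reference Age, then transpose so the ages become columns
result = pd.DataFrame(data).set_index('Reference Age').T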

It produces:

Reference Age           85     90    95   100
Cumulative NCO          25     60    40    25
Outstanding Balance  16500  13000  6500  1000
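On the efficiency question from the edit: the comprehension rescans the whole frame once per Reference Age. An alternative sketch (NumPy broadcasting, not the answerer's method) builds a loans × Reference Age boolean mask for each condition and reduces it with a matrix product. Values are hard-coded from the question's table, and using `np.inf` for a blank NCO Age is an assumption, chosen so those loans never meet the `<= ref` condition:

```python
import numpy as np
import pandas as pd

ages = np.array([100, 98, 96, 94, 92, 90, 88, 86, 84, 82])
# Assumption: np.inf marks "never charged off", so it never satisfies <= ref
nco_ages = np.array([84, np.inf, 92, np.inf, 90, np.inf, 88, np.inf, np.inf, np.inf])
balances = np.array([1000, 2500, 3000, 2500, 1500, 2500, 1000, 2500, 1000, 1250])
ncos = np.array([25, 0, 15, 0, 35, 0, 5, 0, 0, 0])
refs = np.array([85, 90, 95, 100])

# Boolean masks of shape (n_loans, n_refs): one row per loan, one column per reference age
seasoned = ages[:, None] >= refs         # Age (Months) >= ref
charged_off = nco_ages[:, None] <= refs  # NCO Age (Months) <= ref

# Matrix product sums each value over the loans whose mask entry is True
result = pd.DataFrame(
    {'Outstanding Balance': balances @ seasoned,
     'Cumulative NCO': ncos @ (seasoned & charged_off)},
    index=pd.Index(refs, name='Reference Age'),
).T
print(result)
```

Memory grows with (number of loans) × (number of reference ages), so for very large portfolios with many reference ages the comprehension, or processing `refs` in chunks, may be preferable.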