Rus*_*s W 9 python pandas pandas-groupby
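I am a former Excel power user repenting for my sins. I need help recreating a calculation that was common for me.
I am trying to calculate the performance of a loan portfolio. In the numerator, I calculate the cumulative total of losses. In the denominator, I need the original balances of the loans that are included in that cumulative total.
I cannot figure out how to do a conditional groupby in Pandas to accomplish this. It is very simple in Excel, so I am hoping I am just overthinking it.
I could not find much on StackOverflow, but this was the closest question: python pandas conditional cumulative sum
What I cannot figure out is that my conditions are based on values that are in the index and are also contained in columns.
Below is my data: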
| Loan | Origination | Balance | NCO Date | NCO | As of Date | Age (Months) | NCO Age (Months) |
|---------|-------------|---------|-----------|-----|------------|--------------|------------------|
| Loan 1 | 1/31/2011 | 1000 | 1/31/2018 | 25 | 5/31/2019 | 100 | 84 |
| Loan 2 | 3/31/2011 | 2500 | | 0 | 5/31/2019 | 98 | |
| Loan 3 | 5/31/2011 | 3000 | 1/31/2019 | 15 | 5/31/2019 | 96 | 92 |
| Loan 4 | 7/31/2011 | 2500 | | 0 | 5/31/2019 | 94 | |
| Loan 5 | 9/30/2011 | 1500 | 3/31/2019 | 35 | 5/31/2019 | 92 | 90 |
| Loan 6 | 11/30/2011 | 2500 | | 0 | 5/31/2019 | 90 | |
| Loan 7 | 1/31/2012 | 1000 | 5/31/2019 | 5 | 5/31/2019 | 88 | 88 |
| Loan 8 | 3/31/2012 | 2500 | | 0 | 5/31/2019 | 86 | |
| Loan 9 | 5/31/2012 | 1000 | | 0 | 5/31/2019 | 84 | |
| Loan 10 | 7/31/2012 | 1250 | | 0 | 5/31/2019 | 82 | |
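For anyone who wants to reproduce the problem, the table above can be loaded into a DataFrame roughly as follows. This is a minimal sketch: the variable name `df`, the `%m/%d/%Y` date format, and the use of NaN/NaT for the blank cells are assumptions, not something stated in the post.

```python
import numpy as np
import pandas as pd

# Sample portfolio from the table; blank NCO cells become NaT / NaN (assumption).
df = pd.DataFrame({
    "Loan": [f"Loan {i}" for i in range(1, 11)],
    "Origination": pd.to_datetime(
        ["1/31/2011", "3/31/2011", "5/31/2011", "7/31/2011", "9/30/2011",
         "11/30/2011", "1/31/2012", "3/31/2012", "5/31/2012", "7/31/2012"],
        format="%m/%d/%Y"),
    "Balance": [1000, 2500, 3000, 2500, 1500, 2500, 1000, 2500, 1000, 1250],
    "NCO Date": pd.to_datetime(
        ["1/31/2018", None, "1/31/2019", None, "3/31/2019",
         None, "5/31/2019", None, None, None],
        format="%m/%d/%Y"),
    "NCO": [25, 0, 15, 0, 35, 0, 5, 0, 0, 0],
    "As of Date": pd.Timestamp("2019-05-31"),  # same date for every loan
    "Age (Months)": [100, 98, 96, 94, 92, 90, 88, 86, 84, 82],
    "NCO Age (Months)": [84, np.nan, 92, np.nan, 90,
                         np.nan, 88, np.nan, np.nan, np.nan],
})

print(df[["Loan", "Balance", "NCO", "Age (Months)", "NCO Age (Months)"]])
```

With this column order, `Age (Months)` sits at position 6 and `NCO Age (Months)` at position 7.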
In Excel, I would calculate the sums with the following formulas:
Outstanding Balance: =SUMIFS(Balance, Age (Months), ">="&Reference Age)
Cumulative NCO: =SUMIFS(NCO, Age (Months), ">="&Reference Age, NCO Age (Months), "<="&Reference Age)
Result:
| Reference Age       | 85    | 90    | 95   | 100  |
|---------------------|-------|-------|------|------|
| Outstanding Balance | 16500 | 13000 | 6500 | 1000 |
| Cumulative NCO      | 25    | 60    | 40   | 25   |
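The goal here is to include in the Outstanding Balance only the loans that have aged enough for an NCO to be observed at the reference age. The Cumulative NCO is the total NCO recorded up to that age on those same loans.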
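To make the definition concrete, here is a small hand check in plain Python. It is only an illustrative sketch: the tuple layout and the helper name `totals` are hypothetical, and the comparison directions (`Age >= ref`, `NCO Age <= ref`) are inferred from the figures in the result table.

```python
# (age_months, nco_age_months or None, balance, nco) for Loans 1-10
loans = [
    (100, 84, 1000, 25), (98, None, 2500, 0), (96, 92, 3000, 15),
    (94, None, 2500, 0), (92, 90, 1500, 35), (90, None, 2500, 0),
    (88, 88, 1000, 5), (86, None, 2500, 0), (84, None, 1000, 0),
    (82, None, 1250, 0),
]

def totals(ref):
    """Return (outstanding balance, cumulative NCO) at reference age `ref`."""
    # Only loans at least `ref` months old are counted at all.
    seasoned = [loan for loan in loans if loan[0] >= ref]
    outstanding = sum(balance for _, _, balance, _ in seasoned)
    # Of those, count only the NCOs that happened at or before `ref` months.
    cumulative_nco = sum(
        nco for _, nco_age, _, nco in seasoned
        if nco_age is not None and nco_age <= ref
    )
    return outstanding, cumulative_nco

print(totals(90))  # (13000, 60)
```

At reference age 90, Loans 1 through 6 qualify (balances sum to 13000), and only Loan 1 (NCO age 84) and Loan 5 (NCO age 90) contribute losses, 25 + 35 = 60.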
Edit:
This is how I have calculated it so far. But is it the most efficient way?
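You are using a complex condition that depends on a variable. It is easy to find a vectorized way to compute a simple cumulative sum, but I cannot think of a nice way to do it for the Cumulative NCO.
So I would fall back on a Python comprehension: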
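import pandas as pd

# df holds the loan table above; NaN in 'NCO Age (Months)' fails the <= test,
# so loans without an NCO add nothing to the cumulative NCO.
age, nco_age = df['Age (Months)'], df['NCO Age (Months)']
data = [
    {'Reference Age': ref,
     'Outstanding Balance': df.loc[age >= ref, 'Balance'].sum(),
     'Cumulative NCO': df.loc[(age >= ref) & (nco_age <= ref), 'NCO'].sum()}
    for ref in [85, 90, 95, 100]]
# One row per measure, one column per reference age
result = pd.DataFrame(data).set_index('Reference Age').T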
It produces:
Reference Age 85 90 95 100
Cumulative NCO 25 60 40 25
Outstanding Balance 16500 13000 6500 1000
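If the list of reference ages grows large, one possible alternative is to compare every loan against every reference age at once with NumPy broadcasting. This is not part of the answer above, only a sketch under assumptions: the trimmed-down `df` is rebuilt here so the snippet runs on its own, and `refs`, `seasoned`, and `charged_off` are illustrative names.

```python
import numpy as np
import pandas as pd

# Only the four columns the calculation needs, copied from the sample table.
df = pd.DataFrame({
    "Balance": [1000, 2500, 3000, 2500, 1500, 2500, 1000, 2500, 1000, 1250],
    "NCO": [25, 0, 15, 0, 35, 0, 5, 0, 0, 0],
    "Age (Months)": [100, 98, 96, 94, 92, 90, 88, 86, 84, 82],
    "NCO Age (Months)": [84, np.nan, 92, np.nan, 90,
                         np.nan, 88, np.nan, np.nan, np.nan],
})
refs = np.array([85, 90, 95, 100])

# Column vectors of shape (n_loans, 1) broadcast against refs of shape (n_refs,)
age = df["Age (Months)"].to_numpy()[:, None]
nco_age = df["NCO Age (Months)"].to_numpy()[:, None]

seasoned = age >= refs                       # (n_loans, n_refs) boolean mask
charged_off = seasoned & (nco_age <= refs)   # NaN <= ref evaluates to False

balance = df["Balance"].to_numpy()[:, None]
nco = df["NCO"].to_numpy()[:, None]

result = pd.DataFrame(
    {
        "Outstanding Balance": (balance * seasoned).sum(axis=0),
        "Cumulative NCO": (nco * charged_off).sum(axis=0),
    },
    index=pd.Index(refs, name="Reference Age"),
).T
print(result)
```

The trade-off is memory: the masks hold one boolean per loan per reference age, so for very large portfolios the comprehension may still be the more practical choice.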