Python-Pandas: subtract a value from column values in ascending order of a column

hac*_*aho 3 python numpy dataframe python-3.x pandas

I have a DataFrame mortgage_data with columns named mortgage_amount and month (sorted in ascending order).

mortgage_amount_paid = 1000

mortgage_data:

name   mortgage_amount  month 
mark     400              1
mark     500              2
mark     200              3
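To reproduce the examples, the table above can be built roughly as follows. This is only a sketch: the question calls the frame mortgage_data while the later snippets use df, so binding both names to the same object is an assumption made here for convenience.

```python
import pandas as pd

# Sample data copied from the table in the question
mortgage_data = pd.DataFrame({
    "name": ["mark", "mark", "mark"],
    "mortgage_amount": [400, 500, 200],
    "month": [1, 2, 3],
})

# Make sure rows are processed in ascending month order
mortgage_data = mortgage_data.sort_values("month", ignore_index=True)

# Assumption: the snippets in this thread refer to the same frame as df
df = mortgage_data
mortgage_amount_paid = 1000
```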

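How can I deduct mortgage_amount_paid from mortgage_amount row by row, in ascending order of month, update the DataFrame, and add a paid_status column that records whether each row's mortgage amount was fully paid off by that amount (yes) or not (no)? The expected outputs below use full, partial and zero for this column.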

If mortgage_amount_paid = 1000, mortgage_data:

name   mortgage_amount  month  mortgage_amount_updated  paid_status 
mark     400              1         0                     full
mark     500              2         0                     full
mark     200              3       100                     partial

Example:

If mortgage_amount_paid = 600

mortgage_data:

name   mortgage_amount  month  mortgage_amount_updated  paid_status 
mark     400              1         0                     full
mark     500              2       300                     partial
mark     200              3       200                     zero

I tried this:

mortgage_amount_paid = 600

# amount saved - debt
m1 = df['mortgage_amount'].cumsum().sub(mortgage_amount_paid)
# is it positive?
m2 = m1>0
# is the previous month also positive?
m3 = m2.shift(fill_value=False)

df['mortgage_amount_updated'] = (m1.clip(0, mortgage_amount_paid)
                                   .mask(m3, df['mortgage_amount'])
                                 )
df['paid_status'] = np.select([m3, m2], ['zero', 'partial'], 'full')


Error: with mortgage_amount_paid = 400, paid_status should be full, zero, zero. I get full, partial, zero.

mortgage_amount_paid = 600

m = df['mortgage_amount'].cumsum()

df['paid_status'] = np.select(
    [m <= mortgage_amount_paid,
     (m > mortgage_amount_paid) & (m.shift() < mortgage_amount_paid)
     ],
    ['full', 'partial'],
    default='zero'
)
df['mortgage_amount_updated'] = np.select(
    [df['paid_status'].eq('full'),
     df['paid_status'].eq('partial')],
    [0, m-mortgage_amount_paid],
    default=df['mortgage_amount']
)

Error: with mortgage_amount_paid = 1, paid_status should be partial, zero, zero. I get zero, zero, zero.
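A likely cause of this last error is that m.shift() puts NaN in the first row, and any comparison with NaN is False, so the first row can never be classified as partial. A minimal sketch of one possible fix, assuming the sample data from the question, is to shift with fill_value=0 so the first row sees a previous running total of 0. The variable name prev is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["mark", "mark", "mark"],
    "mortgage_amount": [400, 500, 200],
    "month": [1, 2, 3],
})
mortgage_amount_paid = 1

# Running total owed up to and including each month
m = df["mortgage_amount"].cumsum()
# Running total owed before each month; 0 instead of NaN for the first row
prev = m.shift(fill_value=0)

df["paid_status"] = np.select(
    [m <= mortgage_amount_paid,
     (m > mortgage_amount_paid) & (prev < mortgage_amount_paid)],
    ["full", "partial"],
    default="zero",
)
# paid_status is now partial, zero, zero
```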

Ony*_*mbu 6

You can write a function:

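import numpy as np

def new(mortgage_amount_paid, df):
    # running total owed up to and including each month
    m = df.mortgage_amount.cumsum()
    n = mortgage_amount_paid
    # 'full' when the payment covers the running total (<= so an exact payoff counts),
    # 'partial' when it covers only part of this row, otherwise 'zero'
    df['paid_status'] = np.where(m <= n, 'full',
             np.where(m - n < df.mortgage_amount, 'partial', 'zero'))
    return df  # optional: df is modified in place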


new(600, df)
   name  mortgage_amount  month paid_status
0  mark              400      1        full
1  mark              500      2     partial
2  mark              200      3        zero

new(1000, df)
   name  mortgage_amount  month paid_status
0  mark              400      1        full
1  mark              500      2        full
2  mark              200      3     partial

new(100, df)
   name  mortgage_amount  month paid_status
0  mark              400      1     partial
1  mark              500      2        zero
2  mark              200      3        zero

new(2000, df)
   name  mortgage_amount  month paid_status
0  mark              400      1        full
1  mark              500      2        full
2  mark              200      3        full
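The function above fills in only paid_status. Below is a sketch of a variant that also produces the mortgage_amount_updated column from the question. It assumes that the remaining balance of each row is the cumulative shortfall clipped between 0 and that row's own amount, and that an exact payoff should count as full. The name apply_payment is made up for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

def apply_payment(df, mortgage_amount_paid):
    """Return a sorted copy of df with mortgage_amount_updated and paid_status."""
    out = df.sort_values("month", ignore_index=True)
    # Running total owed up to and including each month
    owed = out["mortgage_amount"].cumsum()
    # How much of the running total the payment does not cover
    shortfall = owed - mortgage_amount_paid
    # Remaining balance per row: never below 0, never above the row's amount
    remaining = shortfall.clip(lower=0, upper=out["mortgage_amount"])
    out["mortgage_amount_updated"] = remaining
    out["paid_status"] = np.select(
        [remaining.eq(0), remaining.lt(out["mortgage_amount"])],
        ["full", "partial"],
        default="zero",
    )
    return out

# Sample data from the question
df = pd.DataFrame({
    "name": ["mark", "mark", "mark"],
    "mortgage_amount": [400, 500, 200],
    "month": [1, 2, 3],
})

result = apply_payment(df, 600)
# mortgage_amount_updated: 0, 300, 200
# paid_status: full, partial, zero
```

Because the status is derived from the remaining balance rather than from a shifted running total, the first row needs no special handling. The two cases reported as errors in the question (400 and 1) come out as full, zero, zero and partial, zero, zero respectively.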