python - Use the previous row's value to update the new row's value

qqq*_*www 6 python dataframe pandas

This is the current DataFrame:

> ID        Date    current
> 2001980   10/30/2017  1   
> 2001980   10/29/2017  0   
> 2001980   10/28/2017  0   
> 2001980   10/27/2017  40  
> 2001980   10/26/2017  39  
> 2001980   10/25/2017  0   
> 2001980   10/24/2017  0   
> 2001980   10/23/2017  60  
> 2001980   10/22/2017  0   
> 2001980   10/21/2017  0   
> 2002222   10/21/2017  0   
> 2002222   10/20/2017  0   
> 2002222   10/19/2017  16  
> 2002222   10/18/2017  0   
> 2002222   10/17/2017  0   
> 2002222   10/16/2017  20  
> 2002222   10/15/2017  19  
> 2002222   10/14/2017  18  

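Below is the final DataFrame. The expected column is what I want.

  1. One ID can have multiple dates/records/rows. (ID + Date) is unique.
  2. Expected value of this row = expected value of the previous row - 1
  3. The minimum value is 0.
  4. If the expected value given by the formula in rule 2 is less than this row's current value, use this row's current value instead. For example, take ID 2001980 on 10/23/2017: by rule 2 the value would be 36, but by rule 4, 36 < 60, so we use 60.

Thanks a lot.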

> ID        Date    current expected 
> 2001980   10/30/2017  1   1 
> 2001980   10/29/2017  0   0
> 2001980   10/28/2017  0   0 
> 2001980   10/27/2017  40  40
> 2001980   10/26/2017  39  39 
> 2001980   10/25/2017  0   38
> 2001980   10/24/2017  0   37 
> 2001980   10/23/2017  60  60
> 2001980   10/22/2017  0   59 
> 2001980   10/21/2017  0   58
> 2002222   10/21/2017  0   0
> 2002222   10/20/2017  0   0 
> 2002222   10/19/2017  16  16
> 2002222   10/18/2017  0   15 
> 2002222   10/17/2017  0   14
> 2002222   10/16/2017  20  20
> 2002222   10/15/2017  19  19
> 2002222   10/14/2017  18  18

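In Excel I am using the following formula:

= if(ID of this row = ID of the previous row, max(expected value of the previous row - 1, current value of this row), current value of this row)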
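For reference, the Excel formula above can be read as a simple recurrence over the rows. The following is only a minimal sketch in plain Python, assuming the rows are already in the order shown and that current is never negative (so rule 3 holds automatically). The function name expected_values and its list arguments are made up for illustration.

```python
def expected_values(ids, current):
    """Apply the row-by-row rule to two parallel lists (hypothetical helper)."""
    result = []
    for i, (row_id, cur) in enumerate(zip(ids, current)):
        if i > 0 and row_id == ids[i - 1]:
            # Same ID as the previous row: decay by 1, but never go below current
            result.append(max(result[-1] - 1, cur))
        else:
            # First row overall, or first row of a new ID: start from current
            result.append(cur)
    return result


ids = [2001980] * 10 + [2002222] * 8
current = [1, 0, 0, 40, 39, 0, 0, 60, 0, 0, 0, 0, 16, 0, 0, 20, 19, 18]
print(expected_values(ids, current))
```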

Tar*_*ani 3

You can do this using apply with nested functions

import pandas as pd
ID = [2001980,2001980,2001980,2001980,2001980,2001980,2001980,2001980,2001980,2001980,2002222,2002222,2002222,2002222,2002222,2002222,2002222,2002222,]
Date = ["10/30/2017","10/29/2017","10/28/2017","10/27/2017","10/26/2017","10/25/2017","10/24/2017","10/23/2017","10/22/2017","10/21/2017","10/21/2017","10/20/2017","10/19/2017","10/18/2017","10/17/2017","10/16/2017","10/15/2017","10/14/2017",]
current = [1 ,0 ,0 ,40,39,0 ,0 ,60,0 ,0 ,0 ,0 ,16,0 ,0 ,20,19,18,]

df = pd.DataFrame({"ID": ID, "Date": Date, "current": current})

Then create the function that updates the frame

Python 3.X

def update_frame(df):
    # Running "expected" value, shared across successive calls to apply_logic
    last_expected = None
    def apply_logic(row):
        nonlocal last_expected
        # Assumes the default 0..n-1 index, so row.name is the row's position
        if row.name == 0:
            last_expected = row["current"]
            return last_expected
        last_row = df.iloc[row.name - 1]
        # Same ID: decay by 1 but never drop below current; new ID: restart from current
        last_expected = max(last_expected - 1, row["current"]) if last_row["ID"] == row["ID"] else row["current"]
        return last_expected
    return apply_logic

Python 2.X

def update_frame(df):
    sd = {"last_expected": None}
    def apply_logic(row):
        last_row_id = row.name - 1
        if row.name == 0:
            sd['last_expected'] = row["current"]
            return sd['last_expected']
        last_row = df.iloc[[last_row_id]].iloc[0].to_dict()
        sd['last_expected'] = max(sd['last_expected'] - 1,row['current']) if last_row['ID'] == row['ID'] else row['current']
        return sd['last_expected']
    return apply_logic

And run the function as shown below

df['expected'] = df.apply(update_frame(df), axis=1)

The output is as expected
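As a possible alternative to the row-by-row apply, the same recurrence can be written without Python-level state. Unrolling expected = max(previous expected - 1, current) shows that each expected value is the largest of (earlier current value minus the number of rows since it) within the same ID, which becomes a running maximum once the row position is added. This is only a sketch, not part of the approach above: the name expected_vectorized is made up, and it assumes each ID's rows are contiguous and already in the desired order, and that current is non-negative.

```python
import pandas as pd


def expected_vectorized(df):
    """Compute the expected column with groupby operations (hypothetical helper)."""
    # Position of each row within its ID group: 0, 1, 2, ...
    pos = df.groupby("ID").cumcount()
    # Adding the position turns "decay by 1 per row" into a plain running maximum
    running_max = (df["current"] + pos).groupby(df["ID"]).cummax()
    # Subtracting the position again restores the per-row decay
    return running_max - pos


# Small demo using a few rows per ID taken from the question's data
demo = pd.DataFrame({
    "ID": [2001980] * 5 + [2002222] * 3,
    "current": [40, 39, 0, 0, 60, 16, 0, 0],
})
demo["expected"] = expected_vectorized(demo)
print(demo)
```

On the full frame this would be used as df['expected'] = expected_vectorized(df). Unlike the apply version, it does not depend on the index being 0..n-1.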
