Removing null values between 2 specific columns in pandas

Iya*_*qel 3 python pandas

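I have the following time-series DataFrame. I want to fill the missing values with the previous value, but only the missing values that lie between first_valid_index and last_valid_index. So the columns I want to fill are different for each row. How can I do this?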

So, given this DataFrame:

import numpy as np
import pandas as pd
df = pd.DataFrame([[1, 2, 3, np.nan, 5], [1, 3, np.nan, 4, np.nan], [4, np.nan, 7, np.nan, np.nan]], columns=[2007, 2008, 2009, 2010, 2011])

Input DataFrame:

    2007    2008    2009    2010    2011
     1       2       3      NaN     5
     1       3       NaN    4       NaN
     4       NaN     7      NaN     NaN

Output DataFrame:

2007    2008    2009    2010    2011
 1       2       3        3      5
 1       3       3        4      NaN
 4       4       7        NaN    NaN

I was thinking of creating new columns for first_valid_index and last_valid_index and then using .apply(), but how do I fill different columns in each row?

def fillMissing(x):
    first_valid = int(x["first_valid"])
    last_valid = int(x["last_valid"])
    # Collect the columns between the first and last valid entries
    missing = []
    for i in range(first_valid, last_valid + 1):
        missing.append(i)
    # What should I do here, since the following is not valid?
    # x[missing] = x[missing].fillna(method='ffill', axis=1)


df.apply(fillMissing, axis=1)
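One way the .apply() idea might be made to work, as a minimal sketch: instead of storing first_valid and last_valid in extra columns, each row can ask for them directly with first_valid_index() and last_valid_index(). The function name fill_between_valid is hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, np.nan, 5],
     [1, 3, np.nan, 4, np.nan],
     [4, np.nan, 7, np.nan, np.nan]],
    columns=[2007, 2008, 2009, 2010, 2011],
)


def fill_between_valid(row):
    """Forward-fill one row, but only between its first and last valid labels.

    Hypothetical helper, not part of pandas.
    """
    first = row.first_valid_index()
    last = row.last_valid_index()
    if first is None:
        # The row is entirely NaN, so there is nothing to fill.
        return row
    filled = row.copy()
    # .loc slices by label and includes both endpoints.
    filled.loc[first:last] = row.loc[first:last].ffill()
    return filled


result = df.apply(fill_between_valid, axis=1)
print(result)
```

This is simple to read, but it calls a Python function once per row, so it is likely to be slower than a vectorized approach on large frames.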

piR*_*red 5

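You can do this with iloc, but I prefer to do it with NumPy. Essentially, use ffill to forward-fill the values, then mask the NaNs that run all the way to the end of each row.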

v = df.values

mask = np.logical_and.accumulate(
    np.isnan(v)[:, ::-1], axis=1)[:, ::-1]

df.ffill(axis=1).mask(mask)

   2007  2008  2009  2010  2011
0   1.0   2.0   3.0   3.0   5.0
1   1.0   3.0   3.0   4.0   NaN
2   4.0   4.0   7.0   NaN   NaN
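To see why the mask works, it may help to look at the intermediate arrays on the sample data. The sketch below rebuilds the same mask step by step and also shows a pandas-only variant that should give the same result here; the variable names (is_nan, trailing, numpy_result, pandas_result) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, np.nan, 5],
     [1, 3, np.nan, 4, np.nan],
     [4, np.nan, 7, np.nan, np.nan]],
    columns=[2007, 2008, 2009, 2010, 2011],
)

# True wherever the original frame has a missing value.
is_nan = np.isnan(df.to_numpy(dtype=float))

# Reverse each row, keep True only while every value seen so far is NaN,
# then reverse back. True now marks the trailing run of NaNs in each row.
trailing = np.logical_and.accumulate(is_nan[:, ::-1], axis=1)[:, ::-1]
print(trailing)

# Forward-fill everything, then put NaN back over the trailing run.
numpy_result = df.ffill(axis=1).mask(trailing)

# pandas-only variant: after a backward fill, a cell is non-NaN exactly when
# it has a valid value at or to the right of it, so keep only those cells.
pandas_result = df.ffill(axis=1).where(df.bfill(axis=1).notna())
print(pandas_result)
```

Leading NaNs need no special handling in either version, because a forward fill has nothing to copy into them.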