Pandas:使用先前的值和插值填充 nan

Eda*_*ame 5 dataframe python-3.x pandas

我有以下数据框df

    time            col_A
0   1520582580.000  79.000
1   1520582880.000  22.500
2   1520583180.000  29.361
3   1520583480.000  116.095
4   1520583780.000  19.972
5   1520584080.000  36.857
6   1520584380.000  15.167
7   1520584680.000  nan
8   1520584980.000  nan
9   1520585280.000  nan
10  1520585580.000  34.500
11  1520585880.000  17.583
12  1520586180.000  nan
13  1520586480.000  48.833
14  1520586780.000  18.806
15  1520587080.000  18.583
Run Code Online (Sandbox Code Playgroud)

col_A有一些缺失的数据。我想创建一个col_B,它采用每个丢失记录的先前值。IE

6   1520584380.000  15.167
7   1520584680.000  15.167
8   1520584980.000  15.167
9   1520585280.000  15.167
10  1520585580.000  34.500
11  1520585880.000  17.583
12  1520586180.000  17.583
13  1520586480.000  48.833
Run Code Online (Sandbox Code Playgroud)

和 a col_C,它使用前后最接近的非缺失点进行插值。IE

6   1520584380.000  15.167
7   1520584680.000  20.001
8   1520584980.000  24.834
9   1520585280.000  29.667
10  1520585580.000  34.500
11  1520585880.000  17.583
12  1520586180.000  33.208
13  1520586480.000  48.833
Run Code Online (Sandbox Code Playgroud)

除了循环数据框以按记录进行计算之外,是否有一个内置函数可以用来以优雅的方式实现此目的?谢谢!

jez*_*ael 5

我认为ffill需要interpolate

df['colB'] = df['col_A'].ffill()
df['colc'] = df['col_A'].interpolate()
print (df)
            time    col_A     colB       colc
0   1.520583e+09   79.000   79.000   79.00000
1   1.520583e+09   22.500   22.500   22.50000
2   1.520583e+09   29.361   29.361   29.36100
3   1.520583e+09  116.095  116.095  116.09500
4   1.520584e+09   19.972   19.972   19.97200
5   1.520584e+09   36.857   36.857   36.85700
6   1.520584e+09   15.167   15.167   15.16700
7   1.520585e+09      NaN   15.167   20.00025
8   1.520585e+09      NaN   15.167   24.83350
9   1.520585e+09      NaN   15.167   29.66675
10  1.520586e+09   34.500   34.500   34.50000
11  1.520586e+09   17.583   17.583   17.58300
12  1.520586e+09      NaN   17.583   33.20800
13  1.520586e+09   48.833   48.833   48.83300
14  1.520587e+09   18.806   18.806   18.80600
15  1.520587e+09   18.583   18.583   18.58300
Run Code Online (Sandbox Code Playgroud)

如果想使用time插值方法:

df['time'] = pd.to_datetime(df['time'], unit='s')
df = df.set_index('time')
df['colB'] = df['col_A'].ffill()
df['colc'] = df['col_A'].interpolate('time')
print (df)
                       col_A     colB       colc
time                                            
2018-03-09 08:03:00   79.000   79.000   79.00000
2018-03-09 08:08:00   22.500   22.500   22.50000
2018-03-09 08:13:00   29.361   29.361   29.36100
2018-03-09 08:18:00  116.095  116.095  116.09500
2018-03-09 08:23:00   19.972   19.972   19.97200
2018-03-09 08:28:00   36.857   36.857   36.85700
2018-03-09 08:33:00   15.167   15.167   15.16700
2018-03-09 08:38:00      NaN   15.167   20.00025
2018-03-09 08:43:00      NaN   15.167   24.83350
2018-03-09 08:48:00      NaN   15.167   29.66675
2018-03-09 08:53:00   34.500   34.500   34.50000
2018-03-09 08:58:00   17.583   17.583   17.58300
2018-03-09 09:03:00      NaN   17.583   33.20800
2018-03-09 09:08:00   48.833   48.833   48.83300
2018-03-09 09:13:00   18.806   18.806   18.80600
2018-03-09 09:18:00   18.583   18.583   18.58300
Run Code Online (Sandbox Code Playgroud)