Conditionally filling a pandas DataFrame column

Ger*_*rry 5 python dataframe pandas

I have a dataframe df with float values in column A. I want to add another column B such that:

  1. B[0] = A[0]

    For i > 0:

  2. B[i] = if(np.isnan(A[i])) then A[i] else Step3
  3. B[i] = if(abs((B[i-1] - A[i]) / B[i-1]) < 0.3) then B[i-1] else A[i]
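
Read literally, steps 1-3 amount to a plain Python loop. This is a minimal, unoptimised sketch. The helper name `fill_b`, the `tol` parameter and the five-row test frame are hypothetical and only serve to illustrate the rules.

```python
import numpy as np
import pandas as pd


def fill_b(a: pd.Series, tol: float = 0.3) -> pd.Series:
    """Literal translation of steps 1-3 (hypothetical helper, not from the question)."""
    values = a.to_numpy(dtype=float)
    b = np.empty_like(values)
    b[0] = values[0]                                        # step 1: B[0] = A[0]
    for i in range(1, len(values)):
        if np.isnan(values[i]):                             # step 2: NaN in A stays NaN in B
            b[i] = values[i]
        elif abs((b[i - 1] - values[i]) / b[i - 1]) < tol:  # step 3: relative change test
            b[i] = b[i - 1]                                 # close to previous B: carry it forward
        else:
            b[i] = values[i]                                # otherwise take the new A value
    return pd.Series(b, index=a.index, name="B")


df = pd.DataFrame({"A": [100.0, 120.0, np.nan, 90.0, 200.0]})
df["B"] = fill_b(df["A"])
```

Here B comes out as 100, 100, NaN, 90, 200. Row 1 is within 30% of 100, so it is carried forward. Row 3 follows a NaN, and any comparison involving NaN is False, so it takes A's value.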

A sample dataframe df can be generated as follows:

import numpy as np
import pandas as pd
df = pd.DataFrame(1000*(2+np.random.randn(500, 1)), columns=list('A'))
df.loc[1, 'A'] = np.nan
df.loc[15, 'A'] = np.nan
df.loc[240, 'A'] = np.nan
df.loc[241, 'A'] = np.nan

jpp*_*jpp 3

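This can be done fairly efficiently with Numba. If Numba isn't available to you, just omit @njit (and the numba import), and your logic will run as a Python-level loop.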
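You can also keep a single code path that works with or without Numba by falling back to a no-op decorator when the import fails. This is a common pattern, but it is my assumption and not part of the answer. The fallback `njit` below is a hypothetical stand-in. It only handles bare `@njit` and keyword-only forms such as `@njit(cache=True)`, and does nothing except return the function unchanged.

```python
import numpy as np

try:
    from numba import njit  # compile with Numba when it is installed
except ImportError:
    def njit(func=None, **kwargs):
        # Hypothetical fallback: supports @njit and @njit(key=value), returns func unchanged
        if func is None:
            return lambda f: f
        return func


@njit
def recurse_nb(x):
    out = x.copy()
    for i in range(1, x.shape[0]):
        # carry the previous B forward when A[i] is within 30% of it
        if not np.isnan(x[i]) and abs(1 - x[i] / out[i - 1]) < 0.3:
            out[i] = out[i - 1]
    return out


result = recurse_nb(np.array([100.0, 120.0, np.nan, 90.0, 200.0]))
```

Either way, `result` should hold 100, 100, NaN, 90, 200. Without Numba the loop simply runs at Python speed.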

import numpy as np
import pandas as pd
from numba import njit

np.random.seed(0)
df = pd.DataFrame(1000*(2+np.random.randn(500, 1)), columns=['A'])
df.loc[1, 'A'] = np.nan
df.loc[15, 'A'] = np.nan
df.loc[240, 'A'] = np.nan

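@njit
def recurse_nb(x):
    # out starts as a copy of A: B[0] = A[0] and NaNs in A stay NaN in B
    out = x.copy()
    for i in range(1, x.shape[0]):
        # same test as abs((B[i-1] - A[i]) / B[i-1]) < 0.3 in the question
        if not np.isnan(x[i]) and (abs(1 - x[i] / out[i-1]) < 0.3):
            out[i] = out[i-1]  # close to the previous B: carry it forward
    return out

# Numba operates on NumPy arrays, so pass the underlying array, not the Series
df['B'] = recurse_nb(df['A'].values)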

print(df.head(10))

             A            B
0  3764.052346  3764.052346
1          NaN          NaN
2  2978.737984  2978.737984
3  4240.893199  4240.893199
4  3867.557990  4240.893199
5  1022.722120  1022.722120
6  2950.088418  2950.088418
7  1848.642792  1848.642792
8  1896.781148  1848.642792
9  2410.598502  2410.598502
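The kernel tests abs(1 - x[i] / out[i-1]) rather than the question's abs((B[i-1] - A[i]) / B[i-1]). For a nonzero previous value, these are the same quantity:

```latex
\left|\frac{B_{i-1} - A_i}{B_{i-1}}\right|
  = \left|\frac{B_{i-1}}{B_{i-1}} - \frac{A_i}{B_{i-1}}\right|
  = \left|1 - \frac{A_i}{B_{i-1}}\right| < 0.3,
  \qquad B_{i-1} \neq 0
```

When B[i-1] is NaN (row 2 in the output above), the left-hand side is NaN and the comparison is False, so B[i] takes A[i], exactly as step 3 prescribes.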