How to implement SQL COALESCE in pandas

Ano*_*oop 8 python pandas

I have a DataFrame:

df = pd.DataFrame({"A":[1,2,np.nan],"B":[np.nan,10,np.nan], "C":[5,10,7]})
     A     B   C
0  1.0   NaN   5
1  2.0  10.0  10
2  NaN   NaN   7 

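I want to add a new column 'D' that holds, for each row, the first non-null value among A, B, and C (like SQL's COALESCE). The expected output is: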

     A     B   C    D
0  1.0   NaN   5    1.0
1  2.0  10.0  10    2.0
2  NaN   NaN   7    7.0

Thanks in advance!

yar*_*le8 13

Another way is to use the combine_first method of pd.Series. Using your example df:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"A":[1,2,np.nan],"B":[np.nan,10,np.nan], "C":[5,10,7]})
>>> df
     A     B   C
0  1.0   NaN   5
1  2.0  10.0  10
2  NaN   NaN   7

we have

>>> df.A.combine_first(df.B).combine_first(df.C)
0    1.0
1    2.0
2    7.0

We can use reduce to abstract this pattern so that it handles an arbitrary number of columns.

>>> from functools import reduce
>>> cols = [df[c] for c in df.columns]
>>> reduce(lambda acc, col: acc.combine_first(col), cols)
0    1.0
1    2.0
2    7.0
Name: A, dtype: float64

Let's put this all together in a function.

>>> def coalesce(*args):
...     return reduce(lambda acc, col: acc.combine_first(col), args)
...
>>> coalesce(*cols)
0    1.0
1    2.0
2    7.0
Name: A, dtype: float64
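To answer the original question directly, the result can be assigned to the new column. A minimal sketch, assuming the same `coalesce` helper and example `df` as above (both are repeated here only so the snippet runs on its own):

```python
from functools import reduce

import numpy as np
import pandas as pd


def coalesce(*series):
    """Return, per row, the first non-null value across the given Series."""
    return reduce(lambda acc, col: acc.combine_first(col), series)


df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Priority follows argument order: A first, then B, then C.
df["D"] = coalesce(df["A"], df["B"], df["C"])
print(df)
```

Note that combine_first aligns on the index, so the Series passed in should share the same index.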


phi*_*hem 10

Another approach is to explicitly fill column D from A, B, and C, in that order.

df['D'] = np.nan
df['D'] = df.D.fillna(df.A).fillna(df.B).fillna(df.C)
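The same idea can be written without first creating an all-NaN column, by starting from the highest-priority column and filling its gaps from the others. A small sketch, assuming the example `df` from the question; the `priority` list is a hypothetical name introduced for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Columns in order of preference; `priority` is an illustrative name.
priority = ["A", "B", "C"]

# Start from the first column and fill remaining gaps from each later one.
result = df[priority[0]]
for name in priority[1:]:
    result = result.fillna(df[name])

df["D"] = result
print(df)
```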


jez*_*ael 6

I think you need bfill along the columns, then select the first column with iloc:

df['D'] = df.bfill(axis=1).iloc[:,0]
print (df)
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0

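This is the same as the following (note that the method argument of fillna is deprecated since pandas 2.1, so bfill is preferred):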

df['D'] = df.fillna(method='bfill',axis=1).iloc[:,0]
print (df)
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
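Because bfill works across whatever columns the frame contains, in their existing order, it can help to select the columns explicitly. A hedged sketch, assuming the example `df` from the question; the column name `D_b_first` is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Selecting the columns fixes both which columns take part and their priority.
df["D"] = df[["A", "B", "C"]].bfill(axis=1).iloc[:, 0]

# A different selection order gives a different priority (B before A here).
df["D_b_first"] = df[["B", "A", "C"]].bfill(axis=1).iloc[:, 0]
print(df)
```

Selecting the columns also keeps the result stable if the line is re-run after D already exists in the frame.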


piR*_*red 5

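Option 1
pandas (note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this only runs on older versions)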

df.assign(D=df.lookup(df.index, df.isnull().idxmin(1)))

     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
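Since `DataFrame.lookup` is no longer available in pandas 2.0 and later, one possible substitute is to translate the column labels into positions and index the underlying array. This is a sketch of that substitution, not the original author's code; it assumes the example `df` from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Label of the first non-null column in each row (False sorts before True).
labels = df.isnull().idxmin(axis=1)

# Convert labels to integer positions, then pick one value per row.
col_pos = df.columns.get_indexer(labels)
row_pos = np.arange(len(df))
result = df.assign(D=df.to_numpy()[row_pos, col_pos])
print(result)
```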

Option 2
numpy

v = df.values
j = np.isnan(v).argmin(1)
df.assign(D=v[np.arange(len(v)), j])

     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
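One caveat with Option 2: `np.isnan` only accepts numeric arrays, so it raises on object or string columns. A possible variation, assuming non-numeric data (the sample frame below is invented for illustration), is to swap in `pd.isna`:

```python
import numpy as np
import pandas as pd

# Hypothetical string data; None marks the missing entries.
df = pd.DataFrame({
    "A": ["x", None, None],
    "B": [None, "y", None],
    "C": ["p", "q", "r"],
})

v = df.to_numpy()
# pd.isna handles object arrays, unlike np.isnan.
j = pd.isna(v).argmin(axis=1)
result = df.assign(D=v[np.arange(len(v)), j])
print(result)
```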

Naive timing test on the given data

(timing image not preserved)

On larger data

(timing image not preserved)
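The timing comparison can be re-run locally. Below is a rough sketch, assuming a randomly generated frame; the frame size, NaN fraction, and function names are illustrative choices and do not come from the original answer. No timings are quoted here because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Illustrative test data: 10,000 rows, roughly half of the cells missing.
rng = np.random.default_rng(0)
values = rng.random((10_000, 3))
values[rng.random(values.shape) < 0.5] = np.nan
big = pd.DataFrame(values, columns=list("ABC"))


def via_bfill(frame):
    return frame.bfill(axis=1).iloc[:, 0]


def via_combine_first(frame):
    return frame["A"].combine_first(frame["B"]).combine_first(frame["C"])


def via_numpy(frame):
    v = frame.to_numpy()
    j = np.isnan(v).argmin(axis=1)
    return pd.Series(v[np.arange(len(v)), j], index=frame.index)


candidates = {
    "bfill": via_bfill,
    "combine_first": via_combine_first,
    "numpy": via_numpy,
}

for name, func in candidates.items():
    seconds = timeit.timeit(lambda: func(big), number=10)
    print(f"{name}: {seconds / 10:.6f} s per call")
```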