Element-wise maximum of two DataFrames, ignoring NaN

DrT*_*TRD 10 math dataframe python-3.x pandas

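I have two DataFrames (df1 and df2) with identical rows and columns. I want to take the element-wise maximum of the two. In addition, the element-wise maximum of a number and NaN should be the number. The approach I have implemented so far seems inefficient: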

import pandas as pd

def element_max(df1, df2):
    # x == x is False only for NaN, so these masks mark the non-NaN cells.
    valid1 = df1 == df1
    valid2 = df2 == df2
    cond = df1 >= df2  # False wherever either operand is NaN
    res = pd.DataFrame(index=df1.index, columns=df1.columns)
    # Both sides are numbers: take the larger one.
    res[valid1 & valid2 & cond] = df1[valid1 & valid2 & cond]
    res[valid1 & valid2 & ~cond] = df2[valid1 & valid2 & ~cond]
    # Only one side is a number: take that side.
    res[valid1 & ~valid2] = df1[valid1 & ~valid2]
    res[~valid1 & valid2] = df2[~valid1 & valid2]
    return res

Any other ideas? Thanks for your time.

EdC*_*ica 12

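You can use where to test df against another df: where the condition is True, the values from df are kept; where it is False, the values from df1 are used. In addition, in cells where df1 is NaN, an extra call to fillna(df) fills those NaNs with the values from df and returns the desired result: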

In [178]:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5,3))
df.iloc[1,2] = np.nan
print(df)
df1 = pd.DataFrame(np.random.randn(5,3))
df1.iloc[0,0] = np.nan
print(df1)

          0         1         2
0  2.671118  1.412880  1.666041
1 -0.281660  1.187589       NaN
2 -0.067425  0.850808  1.461418
3 -0.447670  0.307405  1.038676
4 -0.130232 -0.171420  1.192321
          0         1         2
0       NaN -0.244273 -1.963712
1 -0.043011 -1.588891  0.784695
2  1.094911  0.894044 -0.320710
3 -1.537153  0.558547 -0.317115
4 -1.713988 -0.736463 -1.030797

In [179]:
df.where(df > df1, df1).fillna(df)

Out[179]:
          0         1         2
0  2.671118  1.412880  1.666041
1 -0.043011  1.187589  0.784695
2  1.094911  0.894044  1.461418
3 -0.447670  0.558547  1.038676
4 -0.130232 -0.171420  1.192321
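The fillna step is needed because any comparison involving NaN evaluates to False. So df > df1 is False wherever either frame holds NaN, and where then takes the value from df1. That is already correct when df is the NaN side, but it leaves a NaN behind when df1 is the NaN side. A minimal sketch with small fixed frames shows both cases (the names left, right, picked and result, and the values, are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: a NaN on each side, plus one cell where both are NaN.
left = pd.DataFrame([[1.0, np.nan, 5.0], [np.nan, 2.0, np.nan]])
right = pd.DataFrame([[np.nan, 4.0, 3.0], [np.nan, 7.0, 6.0]])

# Comparisons involving NaN are False, so those cells fall through to `right`.
picked = left.where(left > right, right)

# Cells that are still NaN came from NaN in `right`; refill them from `left`.
result = picked.fillna(left)

print(picked)
print(result)
```

Here picked still holds NaN at row 0, column 0 (right is NaN there), and result holds 1.0 in that cell. The cell where both inputs are NaN stays NaN, since there is no number to fall back on.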


And*_*nes 10

A more readable way to do this in recent versions of pandas is concat-and-max:

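import numpy as np
import pandas as pd

A = pd.DataFrame([[1., 2., 3.]])
B = pd.DataFrame([[3., np.nan, 1.]])

# Stack both frames, then take the max within each original row label.
# Older pandas also accepted .max(level=0); that keyword was removed in pandas 2.0.
pd.concat([A, B]).groupby(level=0).max()
#
#           0    1    2
#      0  3.0  2.0  3.0
#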
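Why this works: concat stacks the frames on top of each other and keeps the row labels, so each label then appears once per input frame. Grouping on index level 0 collects those duplicate labels, and max skips NaN by default. The sketch below assumes all frames share the same index and columns. It is written with groupby(level=0).max(), which should be accepted by current pandas, and it uses three frames to show that the idea is not limited to two. The names a, b, c, stacked and elementwise_max are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frames sharing the same index and columns.
a = pd.DataFrame([[1.0, np.nan], [np.nan, 8.0]], index=["x", "y"], columns=["p", "q"])
b = pd.DataFrame([[3.0, 2.0], [np.nan, 5.0]], index=["x", "y"], columns=["p", "q"])
c = pd.DataFrame([[0.0, 9.0], [np.nan, np.nan]], index=["x", "y"], columns=["p", "q"])

# Stacking keeps the row labels, so each label now occurs once per frame.
stacked = pd.concat([a, b, c])

# Group rows by label and take the column-wise max; NaN is skipped by default.
elementwise_max = stacked.groupby(level=0).max()

print(elementwise_max)
```

A cell that is NaN in every frame (row "y", column "p" here) stays NaN. Note that groupby sorts the group labels by default, so if the original index is not sorted, something like elementwise_max.reindex(a.index) may be needed to restore the original row order.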