Find the difference between two columns containing nulls using pandas

0ni*_*nir 2 python nan subtraction dataframe pandas

I want to find the difference between two int columns in a pandas DataFrame. I am using Python 2.7. The columns are as follows:

>>> df
   INVOICED_QUANTITY  QUANTITY_SHIPPED
0                 15               NaN
1                 20               NaN
2                  7               NaN
3                  7               NaN
4                  7               NaN

Now I want to subtract QUANTITY_SHIPPED from INVOICED_QUANTITY, so I do the following:

>>> df['Diff'] = df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED']
>>> df
   INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff
0                 15               NaN   NaN
1                 20               NaN   NaN
2                  7               NaN   NaN
3                  7               NaN   NaN
4                  7               NaN   NaN

How do I take care of the NaN values? I expect the following result, because I want NaN to be treated as 0 (zero):

>>> df
   INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff
0                 15               NaN    15
1                 20               NaN    20
2                  7               NaN     7
3                  7               NaN     7
4                  7               NaN     7

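I don't want to call df.fillna(0) on the whole DataFrame. For a sum I would try something like the following, and it works, but it does not help with the difference: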

>>> df['Sum'] = df[['INVOICED_QUANTITY', 'QUANTITY_SHIPPED']].sum(axis=1)
>>> df
   INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff  Sum
0                 15               NaN   NaN   15
1                 20               NaN   NaN   20
2                  7               NaN   NaN    7
3                  7               NaN   NaN    7
4                  7               NaN   NaN    7

Ale*_*ley 6

You can use the sub method to perform the subtraction. Its fill_value argument lets you treat NaN values as a value you specify:

df['Diff'] = df['INVOICED_QUANTITY'].sub(df['QUANTITY_SHIPPED'], fill_value=0)

which produces:

   INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff
0                 15               NaN    15
1                 20               NaN    20
2                  7               NaN     7
3                  7               NaN     7
4                  7               NaN     7
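For anyone who wants to try this outside a session that already has df, here is a minimal self-contained sketch. The frame is rebuilt by hand from the question, and the two non-missing QUANTITY_SHIPPED values are my own illustrative assumption so that both cases are visible. It also shows one thing worth knowing about fill_value: a position that is NaN on both sides stays NaN.

```python
import numpy as np
import pandas as pd

# Frame modelled on the question. The shipped values 5 and 2 are
# illustrative assumptions; the original data had NaN in every row.
df = pd.DataFrame({
    "INVOICED_QUANTITY": [15, 20, 7, 7, 7],
    "QUANTITY_SHIPPED": [np.nan, 5, np.nan, np.nan, 2],
})

# NaN in QUANTITY_SHIPPED is treated as 0 for the subtraction only;
# the column itself is not modified.
df["Diff"] = df["INVOICED_QUANTITY"].sub(df["QUANTITY_SHIPPED"], fill_value=0)
print(df)

# If both operands are missing at the same position, the result is
# still missing: fill_value only applies when exactly one side is NaN.
a = pd.Series([1.0, np.nan])
b = pd.Series([np.nan, np.nan])
both_missing = a.sub(b, fill_value=0)
print(both_missing)
```

Note that Diff comes out as a float column, because QUANTITY_SHIPPED is float once it holds NaN.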

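Another neat approach is the one @JianxunLi suggests: fill the missing values in the column (which creates a copy of the column) and then subtract as usual.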

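The two approaches are nearly identical, but sub is slightly more efficient because it does not need to build a copy of the column up front; it fills the missing values "on the fly":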

In [46]: %timeit df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED'].fillna(0)
10000 loops, best of 3: 144 µs per loop

In [47]: %timeit df['INVOICED_QUANTITY'].sub(df['QUANTITY_SHIPPED'], fill_value=0)
10000 loops, best of 3: 81.7 µs per loop
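The %timeit magic only exists in IPython. As a rough sketch, the same comparison can be repeated in a plain script with the standard timeit module; the helper names with_fillna and with_sub are mine, not part of pandas. The numbers depend on the machine and the pandas version, and the gap measured above may not hold on newer releases, so no expected timings are given here.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape of data as in the question.
df = pd.DataFrame({
    "INVOICED_QUANTITY": [15, 20, 7, 7, 7],
    "QUANTITY_SHIPPED": [np.nan] * 5,
})


def with_fillna():
    # Builds a filled copy of the column first, then subtracts.
    return df["INVOICED_QUANTITY"] - df["QUANTITY_SHIPPED"].fillna(0)


def with_sub():
    # Fills missing values as part of the subtraction itself.
    return df["INVOICED_QUANTITY"].sub(df["QUANTITY_SHIPPED"], fill_value=0)


# Sanity check: both approaches should return the same values.
same = with_fillna().equals(with_sub())

# Best of 3 runs, 200 calls each, in seconds.
t_fillna = min(timeit.repeat(with_fillna, number=200, repeat=3))
t_sub = min(timeit.repeat(with_sub, number=200, repeat=3))

print("same result:", same)
print("fillna: %.6f s, sub: %.6f s" % (t_fillna, t_sub))
```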


Jia*_* Li 5

I think simply filling the NaN values with 0 will do what you want.

df['Diff'] = df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED'].fillna(0)

Out[153]: 
   INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff
0                 15               NaN    15
1                 20               NaN    20
2                  7               NaN     7
3                  7               NaN     7
4                  7               NaN     7
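Since the question says calling df.fillna(0) on the whole frame is not wanted, it may help to see that filling a single column this way returns a new Series and leaves the DataFrame's own column untouched. A minimal sketch, with the frame rebuilt by hand from the question and shipped_filled as a name chosen just for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "INVOICED_QUANTITY": [15, 20, 7, 7, 7],
    "QUANTITY_SHIPPED": [np.nan] * 5,
})

# fillna returns a new Series by default; df["QUANTITY_SHIPPED"] is not changed.
shipped_filled = df["QUANTITY_SHIPPED"].fillna(0)
df["Diff"] = df["INVOICED_QUANTITY"] - shipped_filled

# The original column still holds its NaN values.
still_missing = df["QUANTITY_SHIPPED"].isna().all()
print(df)
print("QUANTITY_SHIPPED still all NaN:", still_missing)
```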