0ni*_*nir 2 python nan subtraction dataframe pandas
I want to find the difference between two int columns in a pandas DataFrame. I am using Python 2.7. The columns are as follows:
>>> df
   INVOICED_QUANTITY  QUANTITY_SHIPPED
0                 15               NaN
1                 20               NaN
2                  7               NaN
3                  7               NaN
4                  7               NaN
Now I want to subtract QUANTITY_SHIPPED from INVOICED_QUANTITY, so I do the following:
>>> df['Diff'] = df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED']
>>> df
   INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff
0                 15               NaN   NaN
1                 20               NaN   NaN
2                  7               NaN   NaN
3                  7               NaN   NaN
4                  7               NaN   NaN
How do I take care of the NaN values? I would like to get the following result, since I want NaN to be treated as 0 (zero):
>>> df
   INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff
0                 15               NaN    15
1                 20               NaN    20
2                  7               NaN     7
3                  7               NaN     7
4                  7               NaN     7
I do not want to call df.fillna(0). For a sum I can do something like the following and it works, but not for a difference:
>>> df['Sum'] = df[['INVOICED_QUANTITY', 'QUANTITY_SHIPPED']].sum(axis=1)
>>> df
   INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff  Sum
0                 15               NaN   NaN   15
1                 20               NaN   NaN   20
2                  7               NaN   NaN    7
3                  7               NaN   NaN    7
4                  7               NaN   NaN    7
You can use the sub method to perform the subtraction; it lets NaN values be treated as a value you specify:
df['Diff'] = df['INVOICED_QUANTITY'].sub(df['QUANTITY_SHIPPED'], fill_value=0)
Which produces:
   INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff
0                 15               NaN    15
1                 20               NaN    20
2                  7               NaN     7
3                  7               NaN     7
4                  7               NaN     7
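
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the values in the question, and the `both_missing` Series is an illustrative addition of mine, not something from the original post. It shows one caveat worth knowing: `fill_value` only substitutes when exactly one side is missing, so a row where both operands are NaN still comes out as NaN.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question; QUANTITY_SHIPPED is entirely NaN.
df = pd.DataFrame({
    "INVOICED_QUANTITY": [15, 20, 7, 7, 7],
    "QUANTITY_SHIPPED": [np.nan] * 5,
})

# Treat a value that is missing on one side as 0 during the subtraction.
df["Diff"] = df["INVOICED_QUANTITY"].sub(df["QUANTITY_SHIPPED"], fill_value=0)
print(df)

# Caveat (illustrative data): if BOTH operands are missing, the result stays NaN.
left = pd.Series([np.nan, 5.0])
right = pd.Series([np.nan, np.nan])
both_missing = left.sub(right, fill_value=0)
print(both_missing)
```

Note that the Diff column comes back as float, because the subtraction involves a float column.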
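Another neat approach is the one @JianxunLi suggested: fill the missing values in the column (which creates a copy of the column) and subtract as usual.
The two approaches are nearly identical, but sub is slightly more efficient because it does not need to build a copy of the column up front; it simply fills in the missing values "on the fly":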
In [46]: %timeit df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED'].fillna(0)
10000 loops, best of 3: 144 µs per loop
In [47]: %timeit df['INVOICED_QUANTITY'].sub(df['QUANTITY_SHIPPED'], fill_value=0)
10000 loops, best of 3: 81.7 µs per loop
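
Timings like these depend on the machine, the pandas version, and the size of the data, so it may be worth measuring on your own data. Below is a rough sketch using the standard `timeit` module outside IPython. The Series names, the size `n`, and the 30% missing rate are assumptions chosen for illustration only. It first checks that both approaches give the same result, then times each one.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: n rows, roughly 30% of the shipped values missing.
rs = np.random.RandomState(0)
n = 100000
invoiced = pd.Series(rs.randint(1, 100, size=n))
shipped_values = rs.randint(1, 100, size=n).astype(float)
shipped_values[rs.rand(n) < 0.3] = np.nan
shipped = pd.Series(shipped_values)

# Both approaches should agree when only one side can be missing.
via_fillna = invoiced - shipped.fillna(0)
via_sub = invoiced.sub(shipped, fill_value=0)
same_result = via_fillna.equals(via_sub)
print("same result:", same_result)

# Time each approach; absolute numbers will differ between environments.
t_fillna = timeit.timeit(lambda: invoiced - shipped.fillna(0), number=20)
t_sub = timeit.timeit(lambda: invoiced.sub(shipped, fill_value=0), number=20)
print("fillna then subtract: %.4f s" % t_fillna)
print("sub with fill_value:  %.4f s" % t_sub)
```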
I think simply filling the NaN values with 0 would help you.
df['Diff'] = df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED'].fillna(0)
Out[153]:
   INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff
0                 15               NaN    15
1                 20               NaN    20
2                  7               NaN     7
3                  7               NaN     7
4                  7               NaN     7
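
If the worry about fillna is that it would overwrite the missing values in the DataFrame, the following sketch may help. It uses a hand-built copy of the question's data and shows that calling fillna on a single column returns a new Series, leaving the original column untouched.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the frame from the question.
df = pd.DataFrame({
    "INVOICED_QUANTITY": [15, 20, 7, 7, 7],
    "QUANTITY_SHIPPED": [np.nan] * 5,
})

# fillna(0) here returns a temporary filled Series; df itself is not modified.
df["Diff"] = df["INVOICED_QUANTITY"] - df["QUANTITY_SHIPPED"].fillna(0)

# The source column still holds its NaN values after the subtraction.
still_missing = df["QUANTITY_SHIPPED"].isnull().all()
print(still_missing)
```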