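I have several data frames that I need to sum, but some of them are missing columns. Unfortunately, the result contains NaN in every column that is missing from one of the input data frames.
How can I keep the original values for those columns?
Here is a small example: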
#!/usr/bin/env ipython
# ---------------------
import pandas as pd
import numpy as np
# ----------------------------------------
N = 10
years = list(range(2010, 2010 + N))
# generate data: both frames have column A; B is only in dfa, C only in dfb
data_a = {'years': years, 'A': np.random.random(N), 'B': np.random.random(N)}
data_b = {'years': years, 'A': np.random.random(N), 'C': np.random.random(N)}
# ----------------------------------------
dfa = pd.DataFrame.from_dict(data_a).set_index('years')
dfb = pd.DataFrame.from_dict(data_b).set_index('years')
# plain addition yields NaN for columns present in only one frame
dfc = dfa + dfb
# ----------------------------------------
Instead of getting dfc as:
              A   B   C
years
2010   0.830207 NaN NaN
2011   1.237387 NaN NaN
2012   1.386908 NaN NaN
2013   0.949136 NaN NaN
2014   0.897436 NaN NaN
2015   0.375644 NaN NaN
2016   1.134836 NaN NaN
2017   1.125501 NaN NaN
2018   1.140183 NaN NaN
2019   0.522178 NaN NaN
I would like to keep the original values of column B from dfa and of column C from dfb.
Since the real tables are large, an automatic solution is preferred.
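Use DataFrame.add with fill_value=0, which treats a value that is missing from one of the two frames as 0 before adding:
dfc = dfa.add(dfb, fill_value=0)
print(dfc)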
              A         B         C
years
2010   0.986393  0.020584  0.607545
2011   1.090208  0.969910  0.170524
2012   1.024139  0.832443  0.065052
2013   0.965020  0.212339  0.948886
2014   0.612089  0.181825  0.965632
2015   0.941170  0.183405  0.808397
2016   0.257757  0.304242  0.304614
2017   1.380411  0.524756  0.097672
2018   1.193530  0.431945  0.684233
2019   0.754523  0.291229  0.440152
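Since the question mentions more than two data frames, the same call can be folded over a list. The following is only a sketch under some assumptions: the helper name `sum_frames` and the third frame `dfd` are hypothetical and not part of the original post, and the small fixed numbers are used only so the result is easy to check by hand.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Small deterministic frames: column B is missing from dfb, column C from dfa.
dfa = pd.DataFrame({"A": [1.0, 2.0], "B": [10.0, 20.0]}, index=[2010, 2011])
dfb = pd.DataFrame({"A": [3.0, 4.0], "C": [100.0, 200.0]}, index=[2010, 2011])
# Hypothetical third frame with one NaN cell and no column C.
dfd = pd.DataFrame({"A": [5.0, 6.0], "B": [0.5, np.nan]}, index=[2010, 2011])


def sum_frames(frames):
    """Sum any number of frames, treating a value missing on one side as 0."""
    return reduce(lambda left, right: left.add(right, fill_value=0), frames)


total = sum_frames([dfa, dfb, dfd])
print(total)
```

Note that `fill_value` also replaces NaN cells that already exist in one of the frames (the NaN in `dfd` above is treated as 0). A cell stays NaN only when it is missing in both frames being added, so if existing NaNs must be preserved this approach may not be what you want.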