Tah*_*sha 11 python merge pandas
I have two DataFrames (actually Series) produced by groupby operations:
bw
l1
Consumer Discretionary 0.118718
Consumer Staples 0.089850
Energy 0.109988
Financials 0.159418
Health Care 0.115060
Industrials 0.109078
Information Technology 0.200392
Materials 0.035509
Telecommunications Services 0.030796
Utilities 0.031190
dtype: float64
and pw
l1
Consumer Discretionary 0.148655
Consumer Staples 0.067873
Energy 0.063899
Financials 0.095689
Health Care 0.116015
Industrials 0.181346
Information Technology 0.117715
Materials 0.043155
Telecommunications Services 0.009550
Utilities 0.156103
dtype: float64
When I try to merge them using
pd.merge(bw,pw,left_index=True,right_index=True)
I get an error:
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2883, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-174-739bb362e06d>", line 1, in <module>
pd.merge(pw,attr,left_index=True,right_index=True)
File "/usr/lib/python2.7/dist-packages/pandas/tools/merge.py", line 39, in merge
return op.get_result()
File "/usr/lib/python2.7/dist-packages/pandas/tools/merge.py", line 185, in get_result
join_index, left_indexer, right_indexer = self._get_join_info()
File "/usr/lib/python2.7/dist-packages/pandas/tools/merge.py", line 251, in _get_join_info
left_ax = self.left._data.axes[self.axis]
IndexError: list index out of range
But when I do this instead
bw = bw.reset_index()
pw = pw.reset_index()
mrg = pd.merge(pw,bw,on="l1")
it works. It makes my code less readable when repeated over several rounds of joins, so I would like to know what I am doing wrong and how to get the first version, merging on indexes, to work.
Thanks
Rob*_*Liu 15
Convert each Series to a DataFrame; then they can be merged:
merged = pd.merge(pd.DataFrame(bw),pd.DataFrame(pw),left_index=True,right_index=True)
print(merged)
Result:
                                  0_x       0_y
l1
Consumer Discretionary       0.118718  0.148655
Consumer Staples             0.089850  0.067873
Energy                       0.109988  0.063899
Financials                   0.159418  0.095689
Health Care                  0.115060  0.116015
Industrials                  0.109078  0.181346
Information Technology       0.200392  0.117715
Materials                    0.035509  0.043155
Telecommunications Services  0.030796  0.009550
Utilities                    0.031190  0.156103
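The `0_x` and `0_y` column names appear because both Series are unnamed, so each one becomes a column called `0`. As a minimal sketch (only three rows of the question's data are reproduced, and the column names `bw` and `pw` are chosen here for illustration), `Series.to_frame` can name the columns before merging on the index:

```python
import pandas as pd

# Three rows taken from the question's data; the index name "l1" matches the question.
idx = pd.Index(["Energy", "Financials", "Materials"], name="l1")
bw = pd.Series([0.109988, 0.159418, 0.035509], index=idx)
pw = pd.Series([0.063899, 0.095689, 0.043155], index=idx)

# to_frame(name) turns each Series into a one-column DataFrame
# whose column carries the given name.
merged = pd.merge(
    bw.to_frame("bw"),
    pw.to_frame("pw"),
    left_index=True,
    right_index=True,
)
print(merged)
```

With named columns there is no clash, so no `_x` / `_y` suffixes are added.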
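Or, if you want to combine them side by side (this assumes bw and pw have the same index and the same number of items, in the same order):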
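# Pair the values positionally; list() makes this work on Python 3 as well,
# where zip returns an iterator rather than a list.
c = list(zip(bw.tolist(), pw.tolist()))
merged = pd.DataFrame(c, index=bw.index)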
This should give the same values, although the columns will be named 0 and 1 rather than 0_x and 0_y.
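Pairing values with zip depends on both Series being in the same order. As an alternative sketch that is not part of the original answer, `pd.concat` with `axis=1` aligns the Series on their index labels instead; the `keys` names and the shuffled order of `pw` below are illustrative assumptions:

```python
import pandas as pd

bw = pd.Series(
    [0.109988, 0.159418, 0.035509],
    index=pd.Index(["Energy", "Financials", "Materials"], name="l1"),
)
# pw is deliberately listed in a different order to show label alignment.
pw = pd.Series(
    [0.043155, 0.063899, 0.095689],
    index=pd.Index(["Materials", "Energy", "Financials"], name="l1"),
)

# axis=1 places the Series side by side, matching rows by index label;
# keys supplies the resulting column names.
combined = pd.concat([bw, pw], axis=1, keys=["bw", "pw"])
print(combined)
```

Because rows are matched by label, each sector keeps its own `pw` value even though the two Series were built in different orders.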
When you call reset_index() on a Series, it becomes a DataFrame (the index is moved into a column). That is why the merge works after that.
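The type change can be checked directly. A small sketch, assuming two rows of the question's data:

```python
import pandas as pd

idx = pd.Index(["Energy", "Financials"], name="l1")
bw = pd.Series([0.109988, 0.159418], index=idx)

print(type(bw).__name__)        # the groupby result is a Series

# reset_index() moves the index into a regular column named "l1";
# the unnamed values end up in a column labelled 0.
as_frame = bw.reset_index()
print(type(as_frame).__name__)
print(list(as_frame.columns))
```

The `l1` column produced this way is what `on="l1"` refers to in the question's working version.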