cs9*_*s95 271 python merge join pandas
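How do I perform a (LEFT | RIGHT | FULL) (INNER | OUTER) join with pandas? merge? join? concat? update? Who? What? Why?!
... and more. I have seen these recurring questions asking about various facets of the pandas merge functionality. Most of the information about merge and its use cases is currently scattered across dozens of badly worded, unsearchable posts. The aim here is to collate some of the more important points for posterity.
This Q&A is meant to be the next installment in a series of helpful user guides on common pandas idioms (see this post on pivoting, and this post on concatenation, which I will touch on later).
Please note that this post is not a replacement for the documentation, so please read that as well! Some of the examples are taken from there.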
cs9*_*s95 375
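This post aims to give readers a primer on SQL-flavored merging with pandas: how to use it, and when not to use it.
In particular, here is what this post will go through:
The basics - types of joins (LEFT, RIGHT, OUTER, INNER)
merge and join
What this post will not go through:
Note
Unless otherwise specified, most examples default to INNER JOIN operations while demonstrating various features. In addition, all of the DataFrames here can be copied and reproduced so that you can play with them. Also, see this post on how to read DataFrames from your clipboard.
Lastly, all visual representations of JOIN operations are borrowed from the article https://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins.
merge
Setup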
import numpy as np
import pandas as pd

np.random.seed(0)
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': np.random.randn(4)})
left
key value
0 A 1.764052
1 B 0.400157
2 C 0.978738
3 D 2.240893
right
key value
0 B 1.867558
1 D -0.977278
2 E 0.950088
3 F -0.151357
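For simplicity, the key column has the same name in both frames (for now).
An INNER JOIN is represented by
Note
Here, A refers to the keys from the join column in the left DataFrame, B refers to the keys from the join column in the right DataFrame, and the intersection represents the keys common to both left and right. The shaded region represents the keys that are present in the JOIN result. This convention is followed throughout. Keep in mind that Venn diagrams are not a 100% accurate representation of JOIN operations, so take them with a pinch of salt.
To perform an INNER JOIN, call pd.merge, specifying the left DataFrame, the right DataFrame, and the join key.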
pd.merge(left, right, on='key')
key value_x value_y
0 B 0.400157 1.867558
1 D 2.240893 -0.977278
This returns only the rows from left and right that share a common key (in this example, "B" and "D").
In more recent versions of pandas (v0.21 or so), merge is also available as a DataFrame method, so you can call DataFrame.merge.
left.merge(right, on='key')
# Or, if you want to be explicit
# left.merge(right, on='key', how='inner')
key value_x value_y
0 B 0.400157 1.867558
1 D 2.240893 -0.977278
A LEFT OUTER JOIN, or LEFT JOIN, is represented by
This can be performed by specifying how='left'.
left.merge(right, on='key', how='left')
key value_x value_y
0 A 1.764052 NaN
1 B 0.400157 1.867558
2 C 0.978738 NaN
3 D 2.240893 -0.977278
Carefully note the placement of the NaNs here. If you specify how='left', then only keys from left are used, and missing data from right is replaced by NaN.
Similarly, for a RIGHT OUTER JOIN, or RIGHT JOIN...
...specify how='right':
left.merge(right, on='key', how='right')
key value_x value_y
0 B 0.400157 1.867558
1 D 2.240893 -0.977278
2 E NaN 0.950088
3 F NaN -0.151357
Here, keys from right are used, and missing data from left is replaced by NaN.
Finally, for the FULL OUTER JOIN, given by
specify how='outer'.
left.merge(right, on='key', how='outer')
key value_x value_y
0 A 1.764052 NaN
1 B 0.400157 1.867558
2 C 0.978738 NaN
3 D 2.240893 -0.977278
4 E NaN 0.950088
5 F NaN -0.151357
This uses the keys from both frames, and NaNs are inserted for missing rows in both.
The documentation summarises these various merges nicely:
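As a quick check of which keys each join type keeps, here is a minimal sketch that rebuilds the same left and right frames and loops over the four how values. The dictionary name keys_by_how is just an illustrative choice.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': np.random.randn(4)})

# Collect the keys that survive each join type.
keys_by_how = {
    how: sorted(left.merge(right, on='key', how=how)['key'])
    for how in ('inner', 'left', 'right', 'outer')
}
for how, keys in keys_by_how.items():
    print(f'{how:>5}: {keys}')
```

Only the key sets are compared here; the value_x/value_y columns behave as shown in the outputs above.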
If you need LEFT-Excluding JOINs or RIGHT-Excluding JOINs, you can do them in two steps.
For a LEFT-Excluding JOIN, represented as
start by performing a LEFT OUTER JOIN and then filter to the rows coming from left only (excluding everything that also appears in right):
(left.merge(right, on='key', how='left', indicator=True)
     .query('_merge == "left_only"')
     .drop(columns='_merge'))  # keyword form; the positional axis argument was removed in pandas 2.0
key value_x value_y
0 A 1.764052 NaN
2 C 0.978738 NaN
Where,
left.merge(right, on='key', how='left', indicator=True)
key value_x value_y _merge
0 A 1.764052 NaN left_only
1 B 0.400157 1.867558 both
2 C 0.978738 NaN left_only
3 D 2.240893 -0.977278 both
And similarly, for a RIGHT-Excluding JOIN,
(left.merge(right, on='key', how='right', indicator=True)
     .query('_merge == "right_only"')
     .drop(columns='_merge'))
key value_x value_y
2 E NaN 0.950088
3 F NaN -0.151357
Lastly, if you need a merge that retains only the keys found in the left or the right frame, but not both (in other words, an ANTI-JOIN), you can do this in a similar fashion:
(left.merge(right, on='key', how='outer', indicator=True)
     .query('_merge != "both"')
     .drop(columns='_merge'))
key value_x value_y
0 A 1.764052 NaN
2 C 0.978738 NaN
4 E NaN 0.950088
5 F NaN -0.151357
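The three excluding joins above share one pattern (merge with indicator=True, then filter on _merge), so they can be wrapped in a small function. This is only a sketch: excluding_join and its side argument are hypothetical names, not part of pandas, and the frames use made-up values.

```python
import pandas as pd

def excluding_join(left, right, on, side='left_only'):
    """Hypothetical helper: keep rows whose key appears on one side only.

    side is 'left_only', 'right_only', or 'either' (full anti-join).
    """
    how = {'left_only': 'left', 'right_only': 'right', 'either': 'outer'}[side]
    merged = left.merge(right, on=on, how=how, indicator=True)
    if side == 'either':
        mask = merged['_merge'] != 'both'
    else:
        mask = merged['_merge'] == side
    return merged[mask].drop(columns='_merge')

left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1.0, 2.0, 3.0, 4.0]})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5.0, 6.0, 7.0, 8.0]})

left_only = excluding_join(left, right, on='key')
either = excluding_join(left, right, on='key', side='either')
print(left_only)
print(either)
```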
If the key columns are named differently (for example, left has keyLeft and right has keyRight instead of key), then you will have to specify left_on and right_on as arguments instead of on:
left2 = left.rename({'key':'keyLeft'}, axis=1)
right2 = right.rename({'key':'keyRight'}, axis=1)
left2
keyLeft value
0 A 1.764052
1 B 0.400157
2 C 0.978738
3 D 2.240893
right2
keyRight value
0 B 1.867558
1 D -0.977278
2 E 0.950088
3 F -0.151357
left2.merge(right2, left_on='keyLeft', right_on='keyRight', how='inner')
keyLeft value_x keyRight value_y
0 B 0.400157 B 1.867558
1 D 2.240893 D -0.977278
When merging on keyLeft from left and keyRight from right, if you want only one of keyLeft or keyRight (but not both) in the output, you can start by setting the index as a preliminary step.
left3 = left2.set_index('keyLeft')
left3.merge(right2, left_index=True, right_on='keyRight')
value_x keyRight value_y
0 0.400157 B 1.867558
1 2.240893 D -0.977278
Contrast this with the output of the command just before (that is, the output of left2.merge(right2, left_on='keyLeft', right_on='keyRight', how='inner')); you'll notice keyLeft is missing. You can figure out which column to keep based on which frame's index is set as the key. This may matter when, say, performing some OUTER JOIN operation.
For example, consider
right3 = right.assign(newcol=np.arange(len(right)))
right3
key value newcol
0 B 1.867558 0
1 D -0.977278 1
2 E 0.950088 2
3 F -0.151357 3
If you need to merge only "newcol" (without any of the other columns), you can usually just subset the columns before merging:
left.merge(right3[['key', 'newcol']], on='key')
key value newcol
0 B 0.400157 0
1 D 2.240893 1
If you're doing a LEFT OUTER JOIN, a more performant solution would involve map:
# left['newcol'] = left['key'].map(right3.set_index('key')['newcol'])
left.assign(newcol=left['key'].map(right3.set_index('key')['newcol']))
key value newcol
0 A 1.764052 NaN
1 B 0.400157 0.0
2 C 0.978738 NaN
3 D 2.240893 1.0
As mentioned, this is similar to, but faster than
left.merge(right3[['key', 'newcol']], on='key', how='left')
key value newcol
0 A 1.764052 NaN
1 B 0.400157 0.0
2 C 0.978738 NaN
3 D 2.240893 1.0
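To see that the map route and the LEFT OUTER JOIN give the same frame, here is a minimal sketch. It rebuilds left and a cut-down right3 (only key and newcol), and the names lookup, via_map and via_merge are illustrative. It assumes the keys in right3 are unique, which map needs from its lookup Series.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})
right3 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'newcol': np.arange(4)})

# Build a key -> newcol lookup; this approach assumes the keys are unique.
lookup = right3.set_index('key')['newcol']
assert lookup.index.is_unique

via_map = left.assign(newcol=left['key'].map(lookup))
via_merge = left.merge(right3[['key', 'newcol']], on='key', how='left')

# Compare the two results.
print(via_map.equals(via_merge))
```

No timings are given here; how much faster map is will depend on the size and dtype of the data.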
To join on more than one column, specify a list for on (or left_on and right_on, as appropriate).
left.merge(right, on=['key1', 'key2'] ...)
Or, in the event the names are different,
left.merge(right, left_on=['lkey1', 'lkey2'], right_on=['rkey1', 'rkey2'])
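The two snippets above are schematic (key1, lkey1 and so on do not exist in the example frames), so here is a small runnable sketch with hypothetical sales and targets frames keyed on a pair of columns.

```python
import pandas as pd

# Hypothetical frames keyed on a (store, month) pair under different names.
sales = pd.DataFrame({
    'store': ['S1', 'S1', 'S2', 'S2'],
    'month': [1, 2, 1, 2],
    'sales': [100, 120, 90, 95],
})
targets = pd.DataFrame({
    'shop': ['S1', 'S2', 'S2'],
    'period': [2, 1, 3],
    'target': [110, 100, 105],
})

# Both columns must match for a row to survive the (default) INNER JOIN.
merged = sales.merge(targets, left_on=['store', 'month'],
                     right_on=['shop', 'period'])
print(merged)
```

The lists passed to left_on and right_on are matched pairwise, so they must have the same length and order.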
Other useful merge* operations and functions
Besides merge, DataFrame.update and DataFrame.combine_first are also used in certain cases to update one DataFrame with another.
pd.merge_ordered is a useful function for ordered JOINs.
pd.merge_asof (read: merge_asOf) is useful for approximate joins.
This section only covers the very basics, and is designed only to whet your appetite. For more examples and cases, see the documentation on merge, join, and concat, as well as the links to the function specs.
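As a small taste of an approximate join, here is a sketch of pd.merge_asof on hypothetical quotes and trades frames (the data and column names are invented for illustration). Each trade is matched with the most recent quote at or before its timestamp.

```python
import pandas as pd

# Hypothetical data: trades and the quotes that precede them.
quotes = pd.DataFrame({
    'time': pd.to_datetime(['2024-01-01 09:00:00', '2024-01-01 09:00:05',
                            '2024-01-01 09:00:10']),
    'bid': [100.0, 100.5, 101.0],
})
trades = pd.DataFrame({
    'time': pd.to_datetime(['2024-01-01 09:00:03', '2024-01-01 09:00:11']),
    'qty': [10, 20],
})

# Both frames must be sorted by the key. The default direction='backward'
# picks the last quote whose time is <= the trade time.
matched = pd.merge_asof(trades, quotes, on='time')
print(matched)
```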
Index-based joins (+ index-column merges)
Setup
np.random.seed([3, 14])
left = pd.DataFrame({'value': np.random.randn(4)}, index=['A', 'B', 'C', 'D'])
right = pd.DataFrame({'value': np.random.randn(4)}, index=['B', 'D', 'E', 'F'])
left.index.name = right.index.name = 'idxkey'
left
value
idxkey
A -0.602923
B -0.402655
C 0.302329
D -0.524349
right
value
idxkey
B 0.543843
D 0.013135
E -0.326498
F 1.385076
Typically, a merge on index would look like this:
left.merge(right, left_index=True, right_index=True)
value_x value_y
idxkey
B -0.402655 0.543843
D -0.524349 0.013135
If your index is named, then users of pandas v0.23 or later can also specify the level name as on (or left_on and right_on, as necessary).
left.merge(right, on='idxkey')
value_x value_y
idxkey
B -0.402655 0.543843
D -0.524349 0.013135
It is possible (and quite simple) to use the index of one, and the column of another, to perform a merge. For example,
left.merge(right, left_on='key1', right_index=True)
Or vice versa (right_on=... and left_index=True).
right2 = right.reset_index().rename({'idxkey' : 'colkey'}, axis=1)
right2
colkey value
0 B 0.543843
1 D 0.013135
2 E -0.326498
3 F 1.385076
left.merge(right2, left_index=True, right_on='colkey')
value_x colkey value_y
0 -0.402655 B 0.543843
1 -0.524349 D 0.013135
In this special case, the index for left is named, so you can also use the index name with left_on, like this:
left.merge(right2, left_on='idxkey', right_on='colkey')
value_x colkey value_y
0 -0.402655 B 0.543843
1 -0.524349 D 0.013135
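The left_on='key1', right_index=True form mentioned earlier is only shown schematically, since the example frames have no key1 column. Here is a minimal sketch with hypothetical orders and customers frames.

```python
import pandas as pd

# Hypothetical frames: `orders` carries the key as a column,
# `customers` carries it as the index.
orders = pd.DataFrame({'key1': ['B', 'D', 'D', 'Z'], 'amount': [10, 20, 30, 40]})
customers = pd.DataFrame({'name': ['Bea', 'Dan']}, index=['B', 'D'])

# Column of the left frame against the index of the right frame.
merged = orders.merge(customers, left_on='key1', right_index=True)
print(merged)
```

The row with key "Z" has no match in the index of customers, so the default INNER JOIN drops it.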
DataFrame.join
Besides these, there is another succinct option. You can use DataFrame.join, which defaults to joining on the index. DataFrame.join does a LEFT OUTER JOIN by default, so how='inner' is necessary here.
left.join(right, how='inner', lsuffix='_x', rsuffix='_y')
value_x value_y
idxkey
B -0.402655 0.543843
D -0.524349 0.013135
Note that I needed to specify the lsuffix and rsuffix arguments, since join would otherwise raise an error:
left.join(right)
ValueError: columns overlap but no suffix specified: Index(['value'], dtype='object')
This happens because the column names are the same. It would not be a problem if they were named differently.
left.rename(columns={'value':'leftvalue'}).join(right, how='inner')
leftvalue value
idxkey
B -0.402655 0.543843
D -0.524349 0.013135
pd.concat
Lastly, as an alternative for index-based joins, you can use pd.concat:
pd.concat([left, right], axis=1, sort=False, join='inner')
value value
idxkey
B -0.402655 0.543843
D -0.524349 0.013135
Omit join='inner' if you need a FULL OUTER JOIN (the default):
pd.concat([left, right], axis=1, sort=False)
value value
A -0.602923 NaN
B -0.402655 0.543843
C 0.302329 NaN
D -0.524349 0.013135
E NaN -0.326498
F NaN 1.385076
For more information, see this canonical post on pd.concat by @piRSquared.
Generalizing: merging multiple DataFrames
Setup
# Setup.
np.random.seed(0)
A = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'valueA': np.random.randn(4)})
B = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'valueB': np.random.randn(4)})
C = pd.DataFrame({'key': ['D', 'E', 'J', 'C'], 'valueC': np.ones(4)})
dfs = [A, B, C]
# Note, the "key" column values are unique, so the index is unique.
A2 = A.set_index('key')
B2 = B.set_index('key')
C2 = C.set_index('key')
dfs2 = [A2, B2, C2]
Oftentimes, the situation arises when multiple DataFrames are to be merged together. Naively, this can be done by chaining merge calls:
df1.merge(df2, ...).merge(df3, ...)
However, this quickly gets out of hand for many DataFrames. Furthermore, it may be necessary to generalise for an unknown number of DataFrames. To do this, one often-used simple trick is functools.reduce, which you can use to achieve an INNER JOIN by reducing the list dfs with pd.merge.
Note that every column besides the "key" column should be differently named for this to work out of the box. Otherwise, you may need to use a lambda.
For a FULL OUTER JOIN, you can curry pd.merge using functools.partial (fixing how='outer') and reduce with the resulting function.
As you may have noticed, this is quite powerful: you can also use it to control column names during the merge. Simply add more keyword arguments as needed.
Alternative: pd.concat
If your column values are unique, then it makes sense to use pd.concat, which is faster than a two-at-a-time multi-way merge.
# merge on `key` column, you'll need to set the index before concatenating
pd.concat([
    df.set_index('key') for df in dfs], axis=1, join='inner'
).reset_index()
key valueA valueB valueC
0 D 2.240893 -0.977278 1.0
As always, omit join='inner' for a FULL OUTER JOIN.
If you are merging multiple DataFrames on unique indexes, you should once again prefer pd.concat for better performance.
# merge on `key` index
pd.concat(dfs2, axis=1, sort=False, join='inner')
valueA valueB valueC
key
D 2.240893 -0.977278 1.0
concat is fast, but it has its shortcomings. It cannot handle duplicates.
A3 = pd.DataFrame({'key': ['A', 'B', 'C', 'D', 'D'], 'valueA': np.random.randn(5)})
pd.concat([df.set_index('key') for df in [A3, B, C]], axis=1, join='inner')
ValueError: Shape of passed values is (3, 4), indices imply (3, 2)
In this situation, join is the best option, since it can handle non-unique indexes (join calls merge under the hood).
# join on `key` column, set as the index first
# For inner join. For left join, omit the "how" argument.
A.set_index('key').join(
    [df.set_index('key') for df in (B, C)], how='inner').reset_index()
key valueA valueB valueC
0 D 2.240893 -0.977278 1.0
# join on `key` index
A3.set_index('key').join([B2, C2], how='inner')
valueA valueB valueC
key
D 1.454274 -0.977278 1.0
D 0.761038 -0.977278 1.0
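The multi-way merge section above describes folding a two-frame merge over a list of frames. Here is a minimal sketch of that idea using functools.reduce and functools.partial from the standard library. The names inner, outer and outer_merge are illustrative, and passing on='key' explicitly is a choice made here for clarity.

```python
from functools import partial, reduce

import numpy as np
import pandas as pd

np.random.seed(0)
A = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'valueA': np.random.randn(4)})
B = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'valueB': np.random.randn(4)})
C = pd.DataFrame({'key': ['D', 'E', 'J', 'C'], 'valueC': np.ones(4)})
dfs = [A, B, C]

# INNER JOIN: fold pd.merge over the list, two frames at a time.
# With no `on`, pd.merge joins on the shared column(s), here just 'key'.
inner = reduce(pd.merge, dfs)

# FULL OUTER JOIN: fix how='outer' (and the key) up front with partial.
outer_merge = partial(pd.merge, on='key', how='outer')
outer = reduce(outer_merge, dfs)

print(inner)
print(outer)
```

More keyword arguments (for example suffixes) can be fixed in the same partial call if the non-key column names collide.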
eli*_*liu 33
A supplementary visual view of pd.concat([df0, df1], kwargs). Note that the meaning of the axis=0 or axis=1 keyword argument is not as intuitive as it is for df.mean() or df.apply(func).
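To make the axis behaviour concrete, here is a minimal sketch with two tiny hypothetical frames, df0 and df1, that overlap on only one row label.

```python
import pandas as pd

df0 = pd.DataFrame({'a': [1, 2]}, index=['x', 'y'])
df1 = pd.DataFrame({'b': [3, 4]}, index=['y', 'z'])

# axis=0 stacks the frames vertically: row labels are appended,
# column labels are aligned (unioned by default).
stacked = pd.concat([df0, df1], axis=0)

# axis=1 places the frames side by side: column labels are appended,
# row labels are aligned (unioned by default).
side_by_side = pd.concat([df0, df1], axis=1)

print(stacked.shape)
print(side_by_side.shape)
```

In both cases the axis argument names the axis along which the frames are appended; the other axis is the one that gets aligned.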
Gon*_*ica 18
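In this answer, I will consider practical examples.
The first is pandas.concat.
The second is merging DataFrames using the index of one and a column of the other.
Consider the following DataFrames, which have the same column names:
Preco2018, with shape (8784, 5)
Preco2019, with shape (8760, 5)
You can combine them with pandas.concat, simply by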
import pandas as pd
frames = [Preco2018, Preco2019]
df_merged = pd.concat(frames)
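This results in a DataFrame with shape (17544, 5).
If you want to visualize it, it ends up working like this
(Source)
2. Merging by column and index
In this part, I will consider a specific case: merging on the index of one DataFrame and a column of another.
Suppose you have a DataFrame Geo with 54 columns, one of which is the date column Data, of type datetime64[ns].
And a DataFrame Price that has one column with the price, and whose index corresponds to the dates.
In this specific case, to merge them, you can use pd.merge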
merged = pd.merge(Price, Geo, left_index=True, right_on='Data')
This results in the following DataFrame
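Here is a small stand-in sketch of the same call. The contents of Price and Geo below are hypothetical (three rows each, and only one extra column, Temp, instead of 54), chosen so that the dates only partly overlap.

```python
import pandas as pd

# Hypothetical stand-ins for the Price and Geo frames described above.
dates = pd.to_datetime(['2019-01-01', '2019-01-02', '2019-01-03'])
Price = pd.DataFrame({'Price': [50.1, 48.7, 52.3]}, index=dates)
Geo = pd.DataFrame({
    'Data': pd.to_datetime(['2019-01-02', '2019-01-03', '2019-01-04']),
    'Temp': [11.0, 12.5, 9.8],
})

# Index of Price against the 'Data' column of Geo (INNER JOIN by default).
merged = pd.merge(Price, Geo, left_index=True, right_on='Data')
print(merged)
```

Only the dates present in both the index of Price and the Data column of Geo are kept. Pass how='left' or how='outer' if unmatched dates should be retained.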
cs9*_*s95 16
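This post will go through the following topics:
generalizing to multiple DataFrames (and why merge has shortcomings here)
Oftentimes, the situation arises when multiple DataFrames need to be merged together. Naively, this can be done by chaining merge calls: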
df1.merge(df2, ...).merge(df3, ...)
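However, this quickly gets out of hand for many DataFrames. Furthermore, it may be necessary to generalise to an unknown number of DataFrames.
Here I introduce pd.concat for multi-way joins on unique keys, and DataFrame.join for multi-way joins on non-unique keys. First, the setup.
# Setup.
import numpy as np
import pandas as pd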
np.random.seed(0)
A = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'valueA': np.random.randn(4)})
B = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'valueB': np.random.randn(4)})
C = pd.DataFrame({'key': ['D', 'E', 'J', 'C'], 'valueC': np.ones(4)})
dfs = [A, B, C]
# Note: the "key" column values are unique, so the index is unique.
A2 = A.set_index('key')
B2 = B.set_index('key')
C2 = C.set_index('key')
dfs2 = [A2, B2, C2]
If your keys (here, the key could be either a column or an index) are unique, then you can use pd.concat. Note that pd.concat joins DataFrames on the index.
# Merge on `key` column. You'll need to set the index before concatenating
pd.concat(
[df.set_index('key') for df in dfs], axis=1, join='inner'
).reset_index()
key valueA valueB valueC
0 D 2.240893 -0.977278 1.0
# Merge on `key` index.
pd.concat(dfs2, axis=1, sort=False, join='inner')
valueA valueB valueC
key
D 2.240893 -0.977278 1.0
Omit join='inner' for a FULL OUTER JOIN. Note that you cannot specify LEFT or RIGHT OUTER joins (if you need these, use join, described below).
concat is fast, but it has its shortcomings. It cannot handle duplicates.
A3 = pd.DataFrame({'key': ['A', 'B', 'C', 'D', 'D'], 'valueA': np.random.randn(5)})
pd.concat([df.set_index('key') for df in [A3, B, C]], axis=1, join='inner')
In this situation, we can use join, since it can handle non-unique keys (note that join joins DataFrames on their index; it calls merge under the hood and does a LEFT OUTER JOIN unless otherwise specified).
# Join on `key` column. Set as the index first.
# For inner join. For left join, omit the "how" argument.
A.set_index('key').join([B2, C2], how='inner').reset_index()
key valueA valueB valueC
0 D 2.240893 -0.977278 1.0
# Join on `key` index.
A3.set_index('key').join([B2, C2], how='inner')
valueA valueB valueC
key
D 1.454274 -0.977278 1.0
D 0.761038 -0.977278 1.0
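The choice between pd.concat (unique keys) and DataFrame.join (non-unique keys) described above can be wrapped in a small function. This is a sketch only: multiway_join is a hypothetical helper, the frames use made-up values, and it is written just for inner, outer and left joins on index-aligned frames.

```python
import pandas as pd

def multiway_join(frames, how='inner'):
    """Hypothetical helper: join index-aligned frames, picking the tool by key uniqueness."""
    if how in ('inner', 'outer') and all(f.index.is_unique for f in frames):
        # Unique keys: pd.concat aligns on the index in one pass.
        return pd.concat(frames, axis=1, join=how)
    # Non-unique keys (or a left join): fall back to DataFrame.join.
    first, *rest = frames
    return first.join(rest, how=how)

A2 = pd.DataFrame({'valueA': [1.0, 2.0, 3.0]}, index=pd.Index(['C', 'D', 'D'], name='key'))
B2 = pd.DataFrame({'valueB': [4.0, 5.0]}, index=pd.Index(['B', 'D'], name='key'))
C2 = pd.DataFrame({'valueC': [6.0, 7.0]}, index=pd.Index(['D', 'E'], name='key'))

dup_result = multiway_join([A2, B2, C2])               # duplicates in A2 -> DataFrame.join
unique_result = multiway_join([B2, C2], how='outer')   # unique keys -> pd.concat
print(dup_result)
print(unique_result)
```

As in the text, the non-key columns are assumed to have distinct names; overlapping names would need suffixes handled separately.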
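This post will go through the following topics:
options for index-based joins: merge, join, concat
There are a few options, some simpler than others depending on the use case.
DataFrame.merge with left_index and right_index (or left_on and right_on using named indexes)
- supports inner/left/right/full
- can only join two at a time
- supports column-column, index-column, index-index joins
DataFrame.join (joins on index)
- supports inner/left (default)/right/full
- can join multiple DataFrames at a time
- supports index-index joins
pd.concat (joins on index)
- supports inner/full (default)
- can join multiple DataFrames at a time
- supports index-index joins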
Setup & Basics
import pandas as pd
import numpy as np
np.random.seed([3, 14])
left = pd.DataFrame(data={'value': np.random.randn(4)},
index=['A', 'B', 'C', 'D'])
right = pd.DataFrame(data={'value': np.random.randn(4)},
index=['B', 'D', 'E', 'F'])
left.index.name = right.index.name = 'idxkey'
left
value
idxkey
A -0.602923
B -0.402655
C 0.302329
D -0.524349
right
value
idxkey
B 0.543843
D 0.013135
E -0.326498
F 1.385076
Typically, an inner join on the index would look like this:
left.merge(right, left_index=True, right_index=True)
value_x value_y
idxkey
B -0.402655 0.543843
D -0.524349 0.013135
Other joins follow similar syntax.
Notable alternatives
DataFrame.join defaults to joining on the index. DataFrame.join does a LEFT OUTER JOIN by default, so how='inner' is necessary here.
left.join(right, how='inner', lsuffix='_x', rsuffix='_y')
value_x value_y
idxkey
B -0.402655 0.543843
D -0.524349 0.013135
Note that I needed to specify the lsuffix and rsuffix arguments, since join would otherwise raise an error:
left.join(right)
ValueError: columns overlap but no suffix specified: Index(['value'], dtype='object')
This happens because the column names are the same. It would not be a problem if they were named differently.
left.rename(columns={'value':'leftvalue'}).join(right, how='inner')
leftvalue value
idxkey
B -0.402655 0.543843
D -0.524349 0.013135
pd.concat joins on the index and can join two or more DataFrames at once. It does a FULL OUTER JOIN by default, so join='inner' is required here.
pd.concat([left, right], axis=1, sort=False, join='inner')
value value
idxkey
B -0.402655 0.543843
D -0.524349 0.013135
For more information on concat, see this post.
To perform an inner join using the index of the left frame and a column of the right, use DataFrame.merge with a combination of left_index=True and right_on=....
right2 = right.reset_index().rename({'idxkey' : 'colkey'}, axis=1)
right2
colkey value
0 B 0.543843
1 D 0.013135
2 E -0.326498
3 F 1.385076
left.merge(right2, left_index=True, right_on='colkey')
value_x colkey value_y
0 -0.402655 B 0.543843
1 -0.524349 D 0.013135
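Other joins follow a similar structure. Note that merge is the most flexible option for index-to-column joins. You can join on multiple columns, provided the number of index levels on the left equals the number of columns on the right.
concat cannot do mixed merges, and join can only match a column of the calling frame against the index of the other frame (via its on argument). Otherwise, you will need to set the index first using DataFrame.set_index.
If your index is named, then from pandas >= 0.23, DataFrame.merge allows you to specify the index name as on (or left_on and right_on, as necessary).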
left.merge(right, on='idxkey')
value_x value_y
idxkey
B -0.402655 0.543843
D -0.524349 0.013135
For the earlier example of merging the index of left with a column of right, you can use left_on with the index name of left:
left.merge(right2, left_on='idxkey', right_on='colkey')
value_x colkey value_y
0 -0.402655 B 0.543843
1 -0.524349 D 0.013135
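For the mixed index/column case discussed above, the set_index route with DataFrame.join can be sketched as follows. The frames mirror the shape of left and right2 but use made-up values.

```python
import pandas as pd

left = pd.DataFrame({'value': [1.0, 2.0, 3.0, 4.0]},
                    index=pd.Index(list('ABCD'), name='idxkey'))
right2 = pd.DataFrame({'colkey': list('BDEF'), 'value': [5.0, 6.0, 7.0, 8.0]})

# Move the key column of right2 into the index, then join index-on-index.
joined = left.join(right2.set_index('colkey'), how='inner',
                   lsuffix='_x', rsuffix='_y')
print(joined)
```

Unlike the left_index=True, right_on='colkey' merge, the key ends up in the index of the result rather than in a colkey column.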