Asked by run*_*i74 · Tags: python, dataframe, pandas
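I have 2 DataFrames and want to compare them and return the rows of the first (df1) that are not in the second (df2). I found a way to compare them and return the differences, but I can't figure out how to return only the rows from df1 that are missing from df2.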
import pandas as pd

df1 = pd.DataFrame({
    "City": ["Chicago", "San Franciso", "Boston"],
    "State": ["Illinois", "California", "Massachusett"]})
df2 = pd.DataFrame({
    "City": ["Chicago", "Mmmmiami", "Dallas", "Omaha"],
    "State": ["Illinois", "Florida", "Texas", "Nebraska"]})

# Stack both frames and give the result a fresh 0..n-1 index.
df = pd.concat([df1, df2])
df = df.reset_index(drop=True)
# Group identical rows; a group of size 1 is a row that occurs in only one frame.
df_gpby = df.groupby(list(df.columns))
idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]
# This keeps the unique rows from BOTH frames (the symmetric difference), not only df1's.
blah = df.reindex(idx)
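The grouping approach above keeps every row that occurs exactly once in the combined frame, so it returns rows unique to df1 plus rows unique to df2. One common variation that keeps only the df1 side is sketched below. It assumes df1 contains no duplicate rows of its own (those would be dropped as well): df2 is appended twice so none of its rows can ever be unique, and drop_duplicates(keep=False) then discards every row that appears more than once.

```python
import pandas as pd

df1 = pd.DataFrame({
    "City": ["Chicago", "San Franciso", "Boston"],
    "State": ["Illinois", "California", "Massachusett"]})
df2 = pd.DataFrame({
    "City": ["Chicago", "Mmmmiami", "Dallas", "Omaha"],
    "State": ["Illinois", "Florida", "Texas", "Nebraska"]})

# Each df2 row now occurs at least twice, so keep=False removes all of them,
# together with any df1 row that also occurs in df2.
# Assumption: df1 itself has no duplicate rows (they would be removed too).
only_in_df1 = pd.concat([df1, df2, df2]).drop_duplicates(keep=False)
print(only_in_df1)
```

Because concat keeps the original row labels, the surviving rows still carry their df1 index values.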
Building on @EdChum's suggestion:
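# Outer-merge on all shared columns; indicator=True adds a _merge column.
df = pd.merge(df1, df2, how='outer', suffixes=('','_y'), indicator=True)
# Keep the rows found only in df1, restricted to df1's original columns.
rows_in_df1_not_in_df2 = df[df['_merge']=='left_only'][df1.columns]
rows_in_df1_not_in_df2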
|Index |City |State |
|------|------------|------------|
|1 |San Franciso|California |
|2 |Boston |Massachusett|
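Since only rows from df1 are wanted, a left merge is enough; the right_only rows produced by the outer merge are never used. A minimal sketch of that anti-join, wrapped in a helper whose name, rows_only_in_left, is hypothetical (it is not a pandas function):

```python
import pandas as pd


def rows_only_in_left(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `left` with no exact match in `right`.

    Hypothetical helper (not a pandas API): it joins on every column the two
    frames share, so `right` should contain all of `left`'s columns.
    """
    # A left merge keeps every row of `left`; indicator=True records whether
    # each one found a partner ("both") or not ("left_only").
    merged = left.merge(right, how="left", indicator=True)
    # Keep the unmatched rows and drop the helper _merge column.
    return merged.loc[merged["_merge"] == "left_only", list(left.columns)]


df1 = pd.DataFrame({
    "City": ["Chicago", "San Franciso", "Boston"],
    "State": ["Illinois", "California", "Massachusett"]})
df2 = pd.DataFrame({
    "City": ["Chicago", "Mmmmiami", "Dallas", "Omaha"],
    "State": ["Illinois", "Florida", "Texas", "Nebraska"]})

print(rows_only_in_left(df1, df2))
```

As with the outer merge, the result is indexed by the merged frame's new integer index, not by df1's original row labels.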
IIUC, if you're using pandas 0.17.0 or later, you can use merge and set indicator=True:
In [80]:
df1 = pd.DataFrame( {
"City" : ["Chicago", "San Franciso", "Boston"] ,
"State" : ["Illinois", "California", "Massachusett"] } )
df2 = pd.DataFrame( {
"City" : ["Chicago", "Mmmmiami", "Dallas" , "Omaha"] ,
"State" : ["Illinois", "Florida", "Texas", "Nebraska"] } )
pd.merge(df1,df2, how='outer', indicator=True)
Out[80]:
City State _merge
0 Chicago Illinois both
1 San Franciso California left_only
2 Boston Massachusett left_only
3 Mmmmiami Florida right_only
4 Dallas Texas right_only
5 Omaha Nebraska right_only
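This adds a _merge column indicating whether each row appears only in the lhs (left_only), only in the rhs (right_only), or in both. Filtering for left_only then leaves the rows of df1 that are not in df2.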
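The merge result carries a fresh integer index rather than df1's own row labels. If those labels matter, one alternative (a different technique from the merge above, and one that needs pandas 0.24 or later for MultiIndex.from_frame) is to turn each row into a tuple-like key and test membership directly. The sketch below gives df1 non-default labels purely to show that they survive:

```python
import pandas as pd

# Non-default row labels, used here only to show that they are preserved.
df1 = pd.DataFrame({
    "City": ["Chicago", "San Franciso", "Boston"],
    "State": ["Illinois", "California", "Massachusett"]},
    index=["a", "b", "c"])
df2 = pd.DataFrame({
    "City": ["Chicago", "Mmmmiami", "Dallas", "Omaha"],
    "State": ["Illinois", "Florida", "Texas", "Nebraska"]})

# One key per row; df2 is reordered to df1's column order so the keys
# line up field by field.
left_keys = pd.MultiIndex.from_frame(df1)
right_keys = pd.MultiIndex.from_frame(df2[df1.columns])

# Boolean array: True where the whole df1 row also occurs in df2.
in_df2 = left_keys.isin(right_keys)
only_in_df1 = df1[~in_df2]
print(only_in_df1)
```

This filters df1 in place of a join, so no _merge column has to be dropped afterwards and the original index is kept.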