Rob*_*tti 10 python merge python-2.7 python-3.x pandas
我目前正在使用外连接合并2个数据帧,但在合并之后,我看到所有行都是重复的,即使我合并的列包含相同的值.详细地:
list_1 = pd.read_csv('list_1.csv')
list_2 = pd.read_csv('list_2.csv')
merged_list = pd.merge(list_1 , list_2 , on=['email_address'], how='inner')
Run Code Online (Sandbox Code Playgroud)
具有以下输入和结果:
LIST_1:
email_address, name, surname
john.smith@email.com, john, smith
john.smith@email.com, john, smith
elvis@email.com, elvis, presley
Run Code Online (Sandbox Code Playgroud)
list_2:
email_address, street, city
john.smith@email.com, street1, NY
john.smith@email.com, street1, NY
elvis@email.com, street2, LA
Run Code Online (Sandbox Code Playgroud)
merged_list:
email_address, name, surname, street, city
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
elvis@email.com, elvis, presley, street2, LA
elvis@email.com, elvis, presley, street2, LA
Run Code Online (Sandbox Code Playgroud)
我的问题是,不应该这样吗?
merged_list(我希望如何:D):
email_address, name, surname, street, city
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
elvis@email.com, elvis, presley, street2, LA
Run Code Online (Sandbox Code Playgroud)
我怎样才能让它变成这样?非常感谢你的帮助!
piR*_*red 20
list_2_nodups = list_2.drop_duplicates()
pd.merge(list_1 , list_2_nodups , on=['email_address'])
Run Code Online (Sandbox Code Playgroud)

期望重复的行.每个约翰史密斯都list_1与每个约翰史密斯相匹配list_2.我不得不将重复项删除到其中一个列表中.我选择了list_2.
Raf*_*ral 11
不要在合并之前删除重复项,而是在合并之后!
最好的解决方案是进行合并,然后删除重复项。
在你的情况下:
merged_df = pd.merge(df1, df2, on=['email_address'], how='inner')
merged_df.drop_duplicates(subset=['email_address'], keep='first', inplace=True, ignore_index=True)
Run Code Online (Sandbox Code Playgroud)