I have 2 dataframes, as shown below.
df_1
Index Fruit
1 Apple
2 Banana
3 Peach
df_2
Fruit Taste
Apple Tasty
Banana Tasty
Banana Rotten
Peach Rotten
Peach Tasty
Peach Tasty
I want to merge the two dataframes on Fruit, but keep only the first occurrence of Apple, Banana, and Peach from the second dataframe. The final result should be:
df_output
Index Fruit Taste
1 Apple Tasty
2 Banana Tasty
3 Peach Rotten
Here Fruit, Index, and Taste are the column headers. I tried something like df_1.merge(df_2, how='left', on='Fruit'), but the result follows df_2 and keeps every matching row.
Thanks.
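For reference, the behaviour described in the question can be reproduced with a short sketch. The dataframes are rebuilt from the tables above; the variable name `plain` is only illustrative and does not appear in the original post.

```python
import pandas as pd

# Rebuild the two dataframes from the tables in the question
df_1 = pd.DataFrame({"Index": [1, 2, 3],
                     "Fruit": ["Apple", "Banana", "Peach"]})
df_2 = pd.DataFrame({
    "Fruit": ["Apple", "Banana", "Banana", "Peach", "Peach", "Peach"],
    "Taste": ["Tasty", "Tasty", "Rotten", "Rotten", "Tasty", "Tasty"],
})

# A plain left merge keeps every matching row of df_2,
# so fruits that are duplicated in df_2 are duplicated in the result
plain = df_1.merge(df_2, how="left", on="Fruit")
print(len(plain))  # 6 rows instead of the 3 that are wanted
```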
Use drop_duplicates to keep only the first row for each Fruit:
df = df_1.merge(df_2.drop_duplicates('Fruit'),how='left',on='Fruit')
print (df)
Index Fruit Taste
0 1 Apple Tasty
1 2 Banana Tasty
2 3 Peach Rotten
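A self-contained version of the same approach may be easier to try out. This is only a sketch: the dataframes are rebuilt from the question, the names `first_taste` and `df_output` are illustrative, and the `validate="many_to_one"` argument is an optional extra that is not part of the answer above. It asks pandas to raise an error if the right-hand side still contains duplicate keys.

```python
import pandas as pd

df_1 = pd.DataFrame({"Index": [1, 2, 3],
                     "Fruit": ["Apple", "Banana", "Peach"]})
df_2 = pd.DataFrame({
    "Fruit": ["Apple", "Banana", "Banana", "Peach", "Peach", "Peach"],
    "Taste": ["Tasty", "Tasty", "Rotten", "Rotten", "Tasty", "Tasty"],
})

# Keep only the first row for each Fruit (keep="first" is the default,
# spelled out here for clarity)
first_taste = df_2.drop_duplicates(subset="Fruit", keep="first")

# Left merge against the de-duplicated frame; validate is an optional
# safety check that the right-hand keys are unique
df_output = df_1.merge(first_taste, how="left", on="Fruit",
                       validate="many_to_one")
print(df_output)
```

Note that "first" here means first in the row order of df_2, so if a different row should win, df_2 would need to be sorted before calling drop_duplicates.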
If you only need to add one column, map is faster:
s = df_2.drop_duplicates('Fruit').set_index('Fruit')['Taste']
df_1['Taste'] = df_1['Fruit'].map(s)
print (df_1)
Index Fruit Taste
0 1 Apple Tasty
1 2 Banana Tasty
2 3 Peach Rotten
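The map variant can also be written so that df_1 itself is left unchanged. The following is a sketch under the same assumptions as before (dataframes rebuilt from the question); the names `taste_by_fruit` and `result` are illustrative. De-duplicating first also gives the lookup Series a unique index, which map relies on.

```python
import pandas as pd

df_1 = pd.DataFrame({"Index": [1, 2, 3],
                     "Fruit": ["Apple", "Banana", "Peach"]})
df_2 = pd.DataFrame({
    "Fruit": ["Apple", "Banana", "Banana", "Peach", "Peach", "Peach"],
    "Taste": ["Tasty", "Tasty", "Rotten", "Rotten", "Tasty", "Tasty"],
})

# Lookup Series: index is Fruit, values are the first Taste seen for it
taste_by_fruit = (df_2.drop_duplicates("Fruit")
                      .set_index("Fruit")["Taste"])

# assign returns a new dataframe, so df_1 keeps its original columns
result = df_1.assign(Taste=df_1["Fruit"].map(taste_by_fruit))
print(result)
```

Fruits in df_1 that have no entry in the lookup Series would get NaN in the Taste column, which matches what a left merge does.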