Fra*_*ane 5 python merge dataframe pandas
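Using pandas,
I am trying to update a simple DataFrame with another DataFrame, and I am having trouble. I have a master DataFrame that I want to update:
Master_df: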
          color     tastey
name
Apples      Red     Always
Avocados  Black  Sometimes
Anise     Brown        NaN
I have some new data that I want to use to update this DataFrame. It may append new columns, add new rows, or update old values:
New_df:
          color   tastey   price
name
Bananas  Yellow      NaN  Medium
Apples      Red  Usually     Low
Berries     Red      NaN    High
I want to merge these two DataFrames so that the updated DataFrame looks like this:
Desired_df:
           color     tastey   price
name
Apples       Red     Always     Low
Avocados   Black  Sometimes     NaN
Anise      Brown        NaN     NaN
Bananas   Yellow        NaN  Medium
Berries      Red        NaN    High
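I have played around with many different commands, but I am still struggling.
Finally (although it is not shown in this example), I need to join on multiple columns, i.e. I need to use 3 columns to form my unique key. (I am confident that a solution to the example above will extend to that case.)
I sincerely appreciate any help or pointers! I hope the example above is clear.
Cheers,
Pandas Pinhead.
Edit 1: I believe this question is different from the previously asked one, because when I use combine_first I get this: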
>>> Master_df.combine_first(New_df)
          color     tastey
name
Apples      Red     Always
Avocados  Black  Sometimes
Anise     Brown        NaN
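Edit 2: OK, I am getting closer, but I am not there yet! I do not want the _x and _y columns to be generated. I want them to be a single column that takes its data from New_df when there is a conflict.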
>>> updated = pd.merge(Master_df, New_df, how="outer", on=["name"])
       name color_x   tastey_x color_y tastey_y   price
0    Apples     Red     Always     Red  Usually     Low
1  Avocados   Black  Sometimes     NaN      NaN     NaN
2     Anise   Brown        NaN     NaN      NaN     NaN
3   Bananas     NaN        NaN  Yellow      NaN  Medium
4   Berries     NaN        NaN     Red      NaN    High
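Edit 3: Here is an image of what I am trying to do. Importantly, I must not have to hard-code the column names other than the key ("A", "B", etc.).
P.S. The code is below.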
import pandas as pd
import numpy as np

# Master table to be updated, keyed by 'name'.
# np.nan is used instead of np.NaN, which was removed in NumPy 2.0.
Master_data = {
    'name' : ['Apples', 'Avocados', 'Anise'],
    'color' : ['Red', 'Black', 'Brown'],
    'tastey' : ['Always', 'Sometimes', np.nan]
}
Master_df = pd.DataFrame(Master_data, columns = ['name', 'color', 'tastey'])
Master_df = Master_df.set_index('name')
print(Master_df)

# New data: adds rows, adds the 'price' column, and overlaps on 'Apples'.
newData = {
    'name' : ['Bananas', 'Apples', 'Berries'],
    'color' : ['Yellow', 'Red', 'Red'],
    'tastey' : [np.nan, 'Usually', np.nan],
    'price' : ['Medium', 'Low', 'High']
}
New_df = pd.DataFrame(newData, columns = ['name', 'color', 'tastey', 'price'])
New_df = New_df.set_index('name')
print(New_df)

# Expected result of the update.
Desired_data = {
    'name' : ['Apples', 'Avocados', 'Anise', 'Bananas', 'Berries'],
    'color' : ['Red', 'Black', 'Brown', 'Yellow', 'Red'],
    'tastey' : ['Always', 'Sometimes', np.nan, np.nan, np.nan],
    'price' : ['Low', np.nan, np.nan, 'Medium', 'High']
}
Desired_df = pd.DataFrame(Desired_data, columns = ['name', 'color', 'tastey', 'price'])
Desired_df = Desired_df.set_index('name')
print(Desired_df)
You can use pd.DataFrame.update (an in-place operation) before pd.DataFrame.combine_first:
New_df.update(Master_df)
res = New_df.combine_first(Master_df)
#            color   price     tastey
# name
# Anise      Brown     NaN        NaN
# Apples       Red     Low     Always
# Avocados   Black     NaN  Sometimes
# Bananas   Yellow  Medium        NaN
# Berries      Red    High        NaN
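The question also mentions a unique key made of 3 columns. The same update-then-combine_first approach appears to carry over if all key columns are moved into the index. The following is only a minimal sketch: the extra key columns `origin` and `grade` and the sample rows are hypothetical and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical three-column key; 'origin' and 'grade' are made up for illustration.
key = ["name", "origin", "grade"]

master = pd.DataFrame({
    "name": ["Apples", "Avocados"],
    "origin": ["US", "MX"],
    "grade": ["A", "B"],
    "color": ["Red", "Black"],
    "tastey": ["Always", "Sometimes"],
}).set_index(key)

new = pd.DataFrame({
    "name": ["Apples", "Bananas"],
    "origin": ["US", "EC"],
    "grade": ["A", "A"],
    "color": ["Red", "Yellow"],
    "tastey": ["Usually", np.nan],
    "price": ["Low", "Medium"],
}).set_index(key)

# Work on a copy so the in-place update does not modify `new` itself.
patched = new.copy()
patched.update(master)               # non-null master values win where keys and columns overlap
res = patched.combine_first(master)  # union of rows and columns from both frames
print(res)
```

No column other than the key is named here, so the non-key columns do not need to be hard-coded.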
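Edit 2 of the question asks for values from New_df to win when there is a conflict, whereas Desired_df keeps the master value ("Always" for Apples). If the New_df-wins behaviour is what is wanted, one possible variant is to skip the update step and call combine_first on the new frame directly. This is a sketch that rebuilds the question's data under the shorter names `master` and `new`, which are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

master = pd.DataFrame(
    {"color": ["Red", "Black", "Brown"],
     "tastey": ["Always", "Sometimes", np.nan]},
    index=pd.Index(["Apples", "Avocados", "Anise"], name="name"),
)
new = pd.DataFrame(
    {"color": ["Yellow", "Red", "Red"],
     "tastey": [np.nan, "Usually", np.nan],
     "price": ["Medium", "Low", "High"]},
    index=pd.Index(["Bananas", "Apples", "Berries"], name="name"),
)

# No update() step: non-null values in `new` take priority, and `master`
# only fills the gaps (missing rows, missing columns, and NaN cells).
new_wins = new.combine_first(master)
print(new_wins)
```

With this variant the Apples row ends up with tastey "Usually", so the result differs from Desired_df in that one cell.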