I have a pandas DataFrame with three columns:
a          b          c
Donaldson  Minnesota  2020
Ozuna      Atlanta    2020
Betts      Boston     2019
Donaldson  Atlanta    2019
Ozuna      St. Louis  2019
Torres     New York   2019
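I want to identify every name in column a that appears with more than one column c value, and then replace all of that name's column b entries with the first value that appears for it in the DataFrame, like so: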
a          b          c
Donaldson  Minnesota  2020
Ozuna      Atlanta    2020
Betts      Boston     2019
Donaldson  Minnesota  2019
Ozuna      Atlanta    2019
Torres     New York   2019
This is definitely inefficient, but here is what I have tried so far:
# get a df of just names and cities and deduplicate
df_names = df[['a','b']].drop_duplicates()
# find names in column a that have more than one column b value and put them in a list
a_matches …
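
For comparison, here is a minimal sketch of one way the desired output above might be produced without building intermediate lists. It assumes "first value" means the first column b value in row order for each name in column a, and the names `c_counts`, `multi_c_names`, `first_b`, and `mask` are made up for illustration; none of them come from the attempt above.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "a": ["Donaldson", "Ozuna", "Betts", "Donaldson", "Ozuna", "Torres"],
    "b": ["Minnesota", "Atlanta", "Boston", "Atlanta", "St. Louis", "New York"],
    "c": [2020, 2020, 2019, 2019, 2019, 2019],
})

# Names in column a that appear with more than one distinct c value.
c_counts = df.groupby("a")["c"].nunique()
multi_c_names = c_counts[c_counts > 1].index

# First b value (in row order) for each name, aligned to the original rows.
first_b = df.groupby("a")["b"].transform("first")

# Overwrite b only for the names flagged above (assumption: other rows stay untouched).
mask = df["a"].isin(multi_c_names)
df.loc[mask, "b"] = first_b[mask]

print(df)
```

If every name should simply take its first column b value regardless of how many column c values it has, the mask could be dropped and `first_b` assigned to `df["b"]` directly; on this sample data both variants give the same result.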