Eda*_*ame 10 python apache-spark pyspark spark-dataframe
I have the following two DataFrames:
DF1:
Id | field_A | field_B | field_C | field_D
1 | cat | 12 | black | 11
2 | dog | 128 | white | 19
3 | dog | 35 | yellow | 20
4 | dog | 21 | brown | 4
5 | bird | 10 | blue | 7
6 | cow | 99 | brown | 34
and
DF2:
Id | field_B | field_C | field_D | field_E
3 | 35 | yellow | 20 | 123
5 | 10 | blue | 7 | 454
6 | 99 | brown | 34 | 398
I would like to get new_DF:
Id | field_A | field_B | field_C | field_D | field_E
1 | cat | 12 | black | 11 |
2 | dog | 128 | white | 19 |
3 | dog | 35 | yellow | 20 | 123
4 | dog | 21 | brown | 4 |
5 | bird | 10 | blue | 7 | 454
6 | cow | 99 | brown | 34 | 398
Can this be achieved with DataFrame operations? Thanks!
Max*_*axU 16
Try this:
new_df = df1.join(df2, on=['field_B', 'field_C', 'field_D'], how='left_outer')
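One thing to watch for: both DataFrames also have an `Id` column, so joining on only the three fields leaves two `Id` columns in the result. A minimal sketch of one way around that, assuming every column the two frames share should be a join key, is below. The helper names (`shared_columns`, `output_columns`, `left_join_on_shared`, `demo`) are made up for illustration and are not part of PySpark; only `join`, `select`, `orderBy`, `createDataFrame` and `show` are PySpark calls.

```python
def shared_columns(left_cols, right_cols):
    """Columns present in both frames, in the left frame's order."""
    right = set(right_cols)
    return [c for c in left_cols if c in right]


def output_columns(left_cols, right_cols):
    """All left columns, followed by the right-only columns (here field_E)."""
    left = set(left_cols)
    return list(left_cols) + [c for c in right_cols if c not in left]


def left_join_on_shared(df1, df2):
    """Left outer join of two PySpark DataFrames on every shared column.

    Passing a list of column names to `on` makes each key column appear
    only once in the result, so `Id` is not duplicated.
    """
    keys = shared_columns(df1.columns, df2.columns)
    joined = df1.join(df2, on=keys, how="left_outer")
    # Restore the column order shown in the question.
    return joined.select(*output_columns(df1.columns, df2.columns))


def demo():
    # Imported lazily so the helpers above work without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]").appName("join-demo").getOrCreate()
    )
    df1 = spark.createDataFrame(
        [
            (1, "cat", 12, "black", 11),
            (2, "dog", 128, "white", 19),
            (3, "dog", 35, "yellow", 20),
            (4, "dog", 21, "brown", 4),
            (5, "bird", 10, "blue", 7),
            (6, "cow", 99, "brown", 34),
        ],
        ["Id", "field_A", "field_B", "field_C", "field_D"],
    )
    df2 = spark.createDataFrame(
        [
            (3, 35, "yellow", 20, 123),
            (5, 10, "blue", 7, 454),
            (6, 99, "brown", 34, 398),
        ],
        ["Id", "field_B", "field_C", "field_D", "field_E"],
    )
    # A join does not guarantee row order, so sort by Id before printing.
    left_join_on_shared(df1, df2).orderBy("Id").show()
    spark.stop()
```

Calling `demo()` in an environment with PySpark installed prints the joined frame; rows of DF1 with no match in DF2 get a null `field_E`. If `Id` should not take part in the match, dropping it from DF2 before the join (`df2.drop('Id')`) is another option.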