python apache-spark apache-spark-sql pyspark
df1
     uid1  var1
0    John     3
1    Paul     4
2  George     5
df2
     uid1  var2
0    John    23
1    Paul    44
2  George    52
df3
     uid1  var3
0    John    31
1    Paul    45
2  George    53
df_lst = [df1, df2, df3]
How can I merge/join the three dataframes in the list on the common key uid1?
Edit: expected output

df1
     uid1  var1  var2  var3
0    John     3    23    31
1    Paul     4    44    45
2  George     5    52    53
You can join a list of dataframes. Here is a simple example:
import spark.implicits._

val df1 = spark.sparkContext.parallelize(Seq(
  (0, "John", 3),
  (1, "Paul", 4),
  (2, "George", 5)
)).toDF("id", "uid1", "var1")

val df2 = spark.sparkContext.parallelize(Seq(
  (0, "John", 23),
  (1, "Paul", 44),
  (2, "George", 52)
)).toDF("id", "uid1", "var2")

val df3 = spark.sparkContext.parallelize(Seq(
  (0, "John", 31),
  (1, "Paul", 45),
  (2, "George", 53)
)).toDF("id", "uid1", "var3")

val dfs = List(df1, df2, df3)
dfs.reduce((a, b) => a.join(b, Seq("id", "uid1")))
Output:
+---+------+----+----+----+
| id| uid1|var1|var2|var3|
+---+------+----+----+----+
| 1| Paul| 4| 44| 45|
| 2|George| 5| 52| 53|
| 0| John| 3| 23| 31|
+---+------+----+----+----+
Hope this helps!
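Since the question is tagged pyspark, the same fold works there too: `from functools import reduce` followed by `reduce(lambda a, b: a.join(b, ["uid1"]), df_lst)` (a hedged sketch, assuming each frame shares only the uid1 column with the others). To see the reduce idea in isolation without a Spark session, here is a stdlib-only Python stand-in, where each "dataframe" is a dict keyed on uid1:

```python
from functools import reduce

# Stand-ins for the three dataframes, keyed on uid1 (illustrative data from the question).
df1 = {"John": {"var1": 3}, "Paul": {"var1": 4}, "George": {"var1": 5}}
df2 = {"John": {"var2": 23}, "Paul": {"var2": 44}, "George": {"var2": 52}}
df3 = {"John": {"var3": 31}, "Paul": {"var3": 45}, "George": {"var3": 53}}

def join(a, b):
    # Inner join on the shared key: keep only uids present on both sides,
    # merging their column dicts.
    return {k: {**a[k], **b[k]} for k in a if k in b}

# reduce folds the pairwise join over the whole list, exactly like
# dfs.reduce((a, b) => a.join(b, ...)) in the Scala answer.
merged = reduce(join, [df1, df2, df3])
print(merged["Paul"])  # {'var1': 4, 'var2': 44, 'var3': 45}
```

The point is that `reduce` turns a two-dataframe join into an n-dataframe join with no extra code, regardless of how long the list is.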