I am trying to concatenate two PySpark dataframes, each of which has some columns that exist only in that dataframe:
from pyspark.sql.functions import randn, rand
df_1 = sqlContext.range(0, 10)
+--+
|id|
+--+
| 0|
| 1|
| 2|
| 3|
| 4|
| 5|
| 6|
| 7|
| 8|
| 9|
+--+
df_2 = sqlContext.range(10, 20)
+---+
| id|
+---+
| 10|
| 11|
| 12|
| 13|
| 14|
| 15|
| 16|
| 17|
| 18|
| 19|
+---+
df_1 = df_1.select("id", rand(seed=10).alias("uniform"), randn(seed=27).alias("normal"))
df_2 = df_2.select("id", rand(seed=10).alias("uniform"), randn(seed=27).alias("normal_2"))
Now I want to generate a third dataframe. I would like something like pandas concat:
df_1.show()
+---+--------------------+--------------------+
| id| uniform| …
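To make the behavior I am after concrete, here is a minimal pandas sketch of the kind of result I mean. The names pdf_1, pdf_2 and pdf_concat and all the numeric values are placeholders I made up for illustration; they are not the actual rand()/randn() output from Spark.

```python
import pandas as pd

# Placeholder values only: these stand in for the Spark rand()/randn() columns.
pdf_1 = pd.DataFrame({"id": [0, 1], "uniform": [0.1, 0.2], "normal": [-0.5, 0.3]})
pdf_2 = pd.DataFrame({"id": [10, 11], "uniform": [0.4, 0.6], "normal_2": [1.2, -0.7]})

# pd.concat stacks the rows and aligns columns by name.
# A column that is missing from one frame is filled with NaN for that frame's rows.
pdf_concat = pd.concat([pdf_1, pdf_2], ignore_index=True)

print(pdf_concat)
```

The shared columns (id, uniform) are filled for every row, while normal is NaN for the rows coming from pdf_2 and normal_2 is NaN for the rows coming from pdf_1. That is the shape I would like the third Spark dataframe to have, with null in place of NaN.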