Flu*_*uxy 3 python pyspark pyspark-dataframes
I have the following PySpark DataFrame:
id col1 col2
A 2 3
A 2 4
A 4 6
B 1 2
I want to stack col1 and col2 into a single column, like this:
id col3
A 2
A 3
A 4
A 6
B 1
B 2
How can I do this?
# Assumes an active SparkContext is bound to `sc`
df = (
    sc.parallelize([
        ("A", 2, 3), ("A", 2, 4), ("A", 4, 6),
        ("B", 1, 2),
    ]).toDF(["id", "col1", "col2"])
)
The simplest approach is to merge col1 and col2 into an array column and then explode it:
df.show()
+---+----+----+
| id|col1|col2|
+---+----+----+
|  A|   2|   3|
|  A|   2|   4|
|  A|   4|   6|
|  B|   1|   2|
+---+----+----+
df.selectExpr('id', 'explode(array(col1, col2))').show()
+---+---+
| id|col|
+---+---+
|  A|  2|
|  A|  3|
|  A|  2|
|  A|  4|
|  A|  4|
|  A|  6|
|  B|  1|
|  B|  2|
+---+---+
If you don't need the duplicate rows (the expected output in the question has none), you can drop them with dropDuplicates().