Add a new column at the first ordinal position in a pyspark dataframe

Question

PRA*_*PTA 5 python apache-spark apache-spark-sql pyspark

I have a pyspark dataframe like:

+--------+-------+-------+
| col1   | col2  | col3  |
+--------+-------+-------+
|  25    |  01   |     2 |
|  23    |  12   |     5 | 
|  11    |  22   |     8 |
+--------+-------+-------+

I want to create a new dataframe by adding a new column, like this:

+--------------+-------+-------+-------+
| new_column   | col1  | col2  | col3  |
+--------------+-------+-------+-------+
|  0           |  25   |  01   |     2 |
|  0           |  23   |  12   |     5 |
|  0           |  11   |  22   |     8 |
+--------------+-------+-------+-------+

I know I can add a column with:

from pyspark.sql.functions import lit
df.withColumn("new_column", lit(0))

But that appends the column at the end, like this:

+--------+-------+-------+-------------+
| col1   | col2  | col3  | new_column  |
+--------+-------+-------+-------------+
|  25    |  01   |     2 |  0          |
|  23    |  12   |     5 |  0          |
|  11    |  22   |     8 |  0          |
+--------+-------+-------+-------------+

Answer 1

Ter*_*rry 5

You can use select to reorder the columns.

# assumes new_column has already been added, e.g. with withColumn
df = df.select('new_column', 'col1', 'col2', 'col3')
df.show()

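If the column names are not known ahead of time, the same reordering can be written without hard-coding them. The following is a minimal sketch, assuming new_column has already been appended with withColumn. The helper names columns_with_first and move_column_first are hypothetical, not part of PySpark; only df.columns and df.select are used from the DataFrame API.

```python
def columns_with_first(columns, first):
    """Return the column names with `first` moved to the front.

    The remaining columns keep their original relative order.
    """
    if first not in columns:
        raise ValueError(f"column {first!r} not found in {list(columns)}")
    return [first] + [c for c in columns if c != first]


def move_column_first(df, name):
    """Return a new DataFrame with column `name` in the first position.

    Hypothetical helper: relies only on `df.columns` and `df.select`.
    """
    return df.select(*columns_with_first(df.columns, name))
```

With the dataframe from the question, move_column_first(df.withColumn("new_column", lit(0)), "new_column") should give the column order new_column, col1, col2, col3.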

Answer 2

pau*_*ult 5

You can always reorder the columns of a Spark DataFrame using select, as shown in this post.

In this case, you can also achieve the desired output in one step using select and alias, as follows:

from pyspark.sql.functions import lit
df = df.select(lit(0).alias("new_column"), "*")

This is logically equivalent to the following SQL code:

SELECT 0 AS new_column, * FROM df

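To run the SQL form from PySpark, the dataframe first has to be registered under a name the query can refer to. The following is a minimal sketch, assuming an existing SparkSession is passed in as spark. The helper name prepend_zero_column_sql and the default view name df are illustrative choices, not taken from the answer.

```python
def prepend_zero_column_sql(spark, df, view_name="df"):
    """Return `df` with a constant column new_column = 0 in first position.

    Registers `df` as a temporary view so SQL can reference it by name,
    then runs the SELECT from the answer against that view.
    """
    df.createOrReplaceTempView(view_name)
    return spark.sql(f"SELECT 0 AS new_column, * FROM {view_name}")
```

The temporary view only lives for the current Spark session, so this does not persist anything. The select/alias version avoids the extra registration step and is probably the simpler choice when no other SQL is involved.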
