Posts by zia*_*ida

Using PySpark, how can I expand a column containing a variable map into new DataFrame columns while keeping the other columns?

I have the following DataFrame:

+-----+--------------------------------------------------+---+
|asset|signals                                           |ts |
+-----+--------------------------------------------------+---+
|2    |[D -> 1100, F -> 3000]                            |6  |
|1    |[D -> 500, System.Date -> 340]                    |5  |
|1    |[B -> 100, E -> 900, System.Date -> 310]          |4  |
|1    |[B -> 110, C -> 200, System.Date -> 320]          |3  |
|1    |[A -> 330, B -> 120, C -> 210, D -> 410, E -> 100]|2  |
+-----+--------------------------------------------------+---+

I need to project the key-value column 'signals' into multiple columns, one per key, as shown below:

+-----+---+-----------+----+----+----+----+----+----+
|asset|ts |System.Date|F   |E   |B   |D   |C   |A   |
+-----+---+-----------+----+----+----+----+----+----+
|2 …

apache-spark apache-spark-sql pyspark

5 votes · 1 answer · 195 views
