Posts by kee*_*007

Exploding array values with PySpark

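I am new to PySpark and need to explode my array of values so that each value is assigned to its own new column. I tried using explode, but I could not get the desired output. Below is my current output: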

+---------------+----------+------------------+----------+---------+------------+--------------------+
|account_balance|account_id|credit_Card_Number|first_name|last_name|phone_number|        transactions|
+---------------+----------+------------------+----------+---------+------------+--------------------+
|         100000|     12345|             12345|       abc|      xyz|  1234567890|[1000, 01/06/2020...|
|         100000|     12345|             12345|       abc|      xyz|  1234567890|[1100, 02/06/2020...|
|         100000|     12345|             12345|       abc|      xyz|  1234567890|[6146, 02/06/2020...|
|         100000|     12345|             12345|       abc|      xyz|  1234567890|[253, 03/06/2020,...|
|         100000|     12345|             12345|       abc|      xyz|  1234567890|[4521, 04/06/2020...|
|         100000|     12345|             12345|       abc|      xyz|  1234567890|[955, 05/06/2020,...|
+---------------+----------+------------------+----------+---------+------------+--------------------+

Below is the schema of the DataFrame:

root
 |-- account_balance: long (nullable = true)
 |-- account_id: long (nullable = true)
 |-- credit_Card_Number: long (nullable = true)
 |-- first_name: string (nullable …
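Since the schema above is truncated, the type of `transactions` is not visible here. The following is only a minimal sketch, assuming that after `explode` each row's `transactions` value is an array whose elements can be addressed by position. The new column names `amount` and `txn_date` and the helper names are hypothetical, chosen for illustration. If `transactions` turns out to be a struct instead, `df.select("*", "transactions.*")` expands its fields into columns without any helper.

```python
def element_exprs(array_col, new_names):
    """Build SQL expressions that pull array elements out by position.

    The names in new_names are hypothetical; pick whatever suits the data.
    """
    return [f"{array_col}[{i}] AS {name}" for i, name in enumerate(new_names)]


def split_into_columns(df, array_col, new_names):
    """Return df with one new column per array element, without the original column."""
    return df.selectExpr("*", *element_exprs(array_col, new_names)).drop(array_col)


def demo():
    """Small local run; requires PySpark. The sample row is illustrative only."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    # Assumption: transactions is an array of strings after explode.
    df = spark.createDataFrame(
        [(12345, ["1000", "01/06/2020"])],
        ["account_id", "transactions"],
    )
    split_into_columns(df, "transactions", ["amount", "txn_date"]).show()
    spark.stop()
```

Calling `demo()` in an environment with PySpark installed shows the result on one sample row. Indexing by position keeps the sketch independent of the element count, but it does assume every array has the same layout.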

hadoop apache-spark apache-spark-sql pyspark

Score: 4
Answers: 1
Views: 2274
