Posts by AJD*_*JDF

Lateral view / exploding multiple columns in Spark produces duplicates

I have the following DataFrame, in which some columns contain arrays. (We are using Spark 1.6.)

+--------------------+--------------+------------------+--------------+--------------------+-------------+
|            UserName|     col1     |    col2          |col3          |col4                |col5         |
+--------------------+--------------+------------------+--------------+--------------------+-------------+
|foo                 |[Main, Indi...|[1777203, 1777203]|    [GBP, GBP]|            [CR, CR]|   [143, 143]|
+--------------------+--------------+------------------+--------------+--------------------+-------------+

I expect the following result:

+--------------------+--------------+------------------+--------------+--------------------+-------------+
|            UserName|        explod|           explod2|       explod3|             explod4|      explod5|
+--------------------+--------------+------------------+--------------+--------------------+-------------+
|NNNNNNNNNNNNNNNNN...|          Main|           1777203|           GBP|                  CR|          143|
|NNNNNNNNNNNNNNNNN...|    Individual|           1777203|           GBP|                  CR|          143|
+--------------------+--------------+------------------+--------------+--------------------+-------------+

I tried a lateral view:

sqlContext.sql("SELECT `UserName`, explod, explod2, explod3, explod4, explod5 FROM sourceDF
LATERAL VIEW explode(`col1`) sourceDF AS explod 
LATERAL VIEW explode(`col2`) explod AS explod2 
LATERAL VIEW …
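
For context on where the duplicates come from: each additional `LATERAL VIEW explode(...)` pairs every row produced so far with every element of the next array, so five two-element arrays yield 2^5 = 32 rows rather than 2. The sketch below models both behaviours in plain Scala, with no Spark dependency. The names `ExplodeSketch`, `Record`, `chainedExplode` and `zippedExplode` are hypothetical and are not part of any Spark API, and the sample values are taken from the tables above with every element treated as a string for simplicity. It illustrates the row arithmetic only and is not a drop-in fix for the query.

```scala
object ExplodeSketch {
  // One input record: a user name plus its parallel arrays (col1..col5 above).
  final case class Record(userName: String, cols: Seq[Seq[String]])

  // Models chained LATERAL VIEW explode: every element of each array is
  // combined with every element of the other arrays (a Cartesian product).
  def chainedExplode(rec: Record): Seq[Seq[String]] =
    rec.cols.foldLeft(Seq(Seq(rec.userName))) { (acc, col) =>
      for (prefix <- acc; value <- col) yield prefix :+ value
    }

  // Models a single explode over the element index: the i-th elements of all
  // arrays stay together, so there is one output row per index.
  def zippedExplode(rec: Record): Seq[Seq[String]] = {
    // Assumption: arrays are meant to be parallel; truncate to the shortest.
    val n = if (rec.cols.isEmpty) 0 else rec.cols.map(_.length).min
    (0 until n).map(i => rec.userName +: rec.cols.map(_(i)))
  }

  def main(args: Array[String]): Unit = {
    val rec = Record("foo", Seq(
      Seq("Main", "Individual"),
      Seq("1777203", "1777203"),
      Seq("GBP", "GBP"),
      Seq("CR", "CR"),
      Seq("143", "143")))

    // Row count produced by the chained approach.
    println(chainedExplode(rec).size)
    // Rows produced by the index-aligned approach.
    zippedExplode(rec).foreach(r => println(r.mkString(" | ")))
  }
}
```

In Spark terms, the second function corresponds to combining the arrays element by element first (for instance with a user-defined function, which is an assumption about how one might do it on 1.6) and then exploding the combined column once, instead of exploding each column in its own lateral view.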

xml hadoop hive scala apache-spark

2 votes, 1 answer, 8808 views
