My problem is how to split a multi-valued column, i.e. a List[String], into separate rows.
The initial dataset has the following type: Dataset[(Integer, String, Double, scala.List[String])]
+---+--------------------+-------+--------------------+
| id|                text|  value|          properties|
+---+--------------------+-------+--------------------+
|  0|Lorem ipsum dolor...|    1.0|[prp1, prp2, prp3..]|
|  1|Lorem ipsum dolor...|    2.0|[prp4, prp5, prp6..]|
|  2|Lorem ipsum dolor...|    3.0|[prp7, prp8, prp9..]|
+---+--------------------+-------+--------------------+
The resulting dataset should have the following type:
Dataset[(Integer, String, Double, String)]
and properties should be split like this:
+---+--------------------+-------+--------------------+
| id|                text|  value|            property|
+---+--------------------+-------+--------------------+
|  0|Lorem ipsum dolor...|    1.0|                prp1|
|  0|Lorem ipsum dolor...|    1.0|                prp2|
|  0|Lorem ipsum dolor...|    1.0|                prp3|
|  1|Lorem …
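
For illustration, the transformation described above (one output row per element of properties) could be sketched with a typed `flatMap`. This is only a sketch under assumptions: the object name `ExplodeProperties` and the helper `splitProperties` are hypothetical, and it assumes a Spark version whose implicit encoders can handle `List[String]` inside a tuple (Spark 2.2 or later, as far as I know).

```scala
import org.apache.spark.sql.Dataset

// Hypothetical helper, not part of the original question.
object ExplodeProperties {

  // Emits one (id, text, value, property) row for every element of the
  // properties list; rows whose list is empty produce no output rows.
  def splitProperties(
      ds: Dataset[(Integer, String, Double, List[String])]
  ): Dataset[(Integer, String, Double, String)] = {
    // Bring the tuple encoders into scope from the dataset's own session.
    val spark = ds.sparkSession
    import spark.implicits._

    ds.flatMap { case (id, text, value, properties) =>
      properties.map(property => (id, text, value, property))
    }
  }
}
```

If the data is handled as an untyped DataFrame with named columns instead, `explode` from `org.apache.spark.sql.functions` applied to the properties column should give the same row-per-element result.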