Lec*_*ico 13 apache-spark apache-spark-sql
I have a DataFrame with the following content:
movieId / movieName / genre
1 example1 action|thriller|romance
2 example2 fantastic|action
I would like to derive a second DataFrame from the first one, with the following content:
movieId / movieName / genre
1 example1 action
1 example1 thriller
1 example1 romance
2 example2 fantastic
2 example2 action
How can I do this?
Jac*_*ski 25
I would use the split standard function, combined with explode to turn each array element into its own row.
scala> movies.show(truncate = false)
+-------+---------+-----------------------+
|movieId|movieName|genre                  |
+-------+---------+-----------------------+
|1      |example1 |action|thriller|romance|
|2      |example2 |fantastic|action       |
+-------+---------+-----------------------+
scala> movies.withColumn("genre", explode(split($"genre", "[|]"))).show
+-------+---------+---------+
|movieId|movieName|    genre|
+-------+---------+---------+
|      1| example1|   action|
|      1| example1| thriller|
|      1| example1|  romance|
|      2| example2|fantastic|
|      2| example2|   action|
+-------+---------+---------+
// You can use \\| for split instead
scala> movies.withColumn("genre", explode(split($"genre", "\\|"))).show
+-------+---------+---------+
|movieId|movieName|    genre|
+-------+---------+---------+
|      1| example1|   action|
|      1| example1| thriller|
|      1| example1|  romance|
|      2| example2|fantastic|
|      2| example2|   action|
+-------+---------+---------+
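Both patterns above work because split treats its second argument as a Java regular expression, where a bare | means alternation rather than a literal pipe. The following is a minimal plain-Scala sketch of the difference (no Spark required; it assumes String.split follows the same regex rules, and the object name PipeSplitDemo is made up for illustration):

```scala
object PipeSplitDemo {
  val genres = "action|thriller|romance"

  // An unescaped "|" is an alternation of two empty patterns, so it matches
  // the empty string and the input gets cut up between individual characters.
  val unescaped: Array[String] = genres.split("|")

  // A character class or a backslash escape turns the pipe into a literal.
  val viaClass: Array[String] = genres.split("[|]")
  val viaEscape: Array[String] = genres.split("\\|")

  def main(args: Array[String]): Unit = {
    println(unescaped.mkString(","))
    println(viaClass.mkString(","))
    println(viaEscape.mkString(","))
  }
}
```

The last two lines print the three genres; the first one prints far more than three pieces, which is why the escape matters.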
P.S. You can use Dataset.flatMap to achieve the same result, which is something Scala developers may prefer.
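A rough sketch of what the flatMap variant could look like, not necessarily what the answer had in mind. The Movie case class, the perGenre helper, and the local SparkSession setup are assumptions added for illustration; in spark-shell the session already exists as spark.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class mirroring the three columns from the question.
case class Movie(movieId: Int, movieName: String, genre: String)

object FlatMapGenres {
  // One Movie per genre; the pipe is escaped because String.split takes a regex.
  def perGenre(m: Movie): Seq[Movie] =
    m.genre.split("\\|").toSeq.map(g => m.copy(genre = g))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; spark-shell provides `spark` already.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatMap-genres")
      .getOrCreate()
    import spark.implicits._

    val movies = Seq(
      Movie(1, "example1", "action|thriller|romance"),
      Movie(2, "example2", "fantastic|action")
    ).toDS()

    // Typed alternative to withColumn + explode: each input row expands
    // into as many rows as it has genres.
    movies.flatMap(m => perGenre(m)).show()

    spark.stop()
  }
}
```

Compared with explode(split(...)), this works on a typed Dataset[Movie] and keeps the splitting logic in an ordinary Scala function, at the cost of going through object serialization instead of staying in built-in column expressions.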