R. *_*iel 1 scala apache-spark
I'm new to this and need help solving this problem.
I have a CSV file like this:
ANI,2974483123 29744423747 293744450542,Twitter,@ani
I need to split the second column, "2974483123 29744423747 293744450542", and create 3 rows, like this:
ANI,2974483123,Twitter,@ani
ANI,29744423747,Twitter,@ani
ANI,293744450542,Twitter,@ani
Can anyone help me? Please!
flatMap is exactly what you're looking for:
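import org.apache.spark.rdd.RDD
// sc is the SparkContext (already defined in spark-shell)
val input: RDD[String] = sc.parallelize(Seq("ANI,2974483123 29744423747 293744450542,Twitter,@ani"))
val csv: RDD[Array[String]] = input.map(_.split(','))
// Emit one (s1, part, s3, s4) tuple per space-separated value in the second column
val result = csv.flatMap { case Array(s1, s2, s3, s4) => s2.split(" ").map(part => (s1, part, s3, s4)) }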
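The snippet above yields tuples rather than CSV lines, and its pattern match would throw a MatchError on any row that does not have exactly four fields. Here is a minimal sketch of the same flatMap idea on plain Scala collections, so it can be tried without a cluster. The names `SplitColumnSketch` and `explodeSecondColumn` are made up for illustration, and passing malformed rows through unchanged is an assumption, not something the question specifies.

```scala
object SplitColumnSketch {
  // Expand one CSV line into one line per space-separated value in the second column.
  // Assumption: lines without exactly four fields are returned unchanged.
  def explodeSecondColumn(line: String): Seq[String] =
    line.split(",", -1) match {
      case Array(s1, s2, s3, s4) =>
        // Split on any run of whitespace and rebuild a CSV line for each part
        s2.trim.split("\\s+").toSeq.map(part => Seq(s1, part, s3, s4).mkString(","))
      case _ => Seq(line)
    }

  def main(args: Array[String]): Unit = {
    val input = Seq("ANI,2974483123 29744423747 293744450542,Twitter,@ani")
    // flatMap flattens the per-line sequences into a single sequence of lines
    val result = input.flatMap(explodeSecondColumn)
    result.foreach(println)
  }
}
```

Running `main` prints the three rows from the question. The same function should also work on an `RDD[String]` via `input.flatMap(SplitColumnSketch.explodeSecondColumn)`, since `flatMap` on an RDD accepts any function that returns a collection.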