There is an existing question about this problem:
Suppose we have additional columns, as shown below:
**userId someString varA varB varC varD**
1 "example1" [0,2,5] [1,2,9] [a,b,c] [red,green,yellow]
2 "example2" [1,20,5] [9,null,6] [d,e,f] [white,black,cyan]
The desired output is as follows:
userId someString varA varB varC varD
1 "example1" 0 1 a red
1 "example1" 2 2 b green
1 "example1" 5 9 c yellow
2 "example2" 1 9 d white
2 "example2" 20 null e black
2 "example2" 5 6 f cyan
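The transformation from the first table to the second can be sketched in plain Scala, without Spark, to make the intended semantics concrete. This is only an illustration under assumptions: the case classes `Wide` and `Narrow` and the object `ZipExplode` are hypothetical names that do not appear in the question, and `Option[Long]` is used here to stand in for the nullable numeric entries.

```scala
// Plain-Scala sketch (no Spark): zip four parallel sequences position by
// position and emit one output row per position.
// Wide, Narrow and ZipExplode are hypothetical names used for illustration.
final case class Wide(
    userId: Int,
    someString: String,
    varA: Seq[Option[Long]],
    varB: Seq[Option[Long]], // None stands in for a null entry
    varC: Seq[String],
    varD: Seq[String]
)

final case class Narrow(
    userId: Int,
    someString: String,
    varA: Option[Long],
    varB: Option[Long],
    varC: String,
    varD: String
)

object ZipExplode {
  // One Narrow row per index. zip truncates to the shorter side, so the
  // number of rows equals the length of the shortest sequence.
  def explode(w: Wide): Seq[Narrow] =
    w.varA.zip(w.varB).zip(w.varC).zip(w.varD).map {
      case (((a, b), c), d) => Narrow(w.userId, w.someString, a, b, c, d)
    }

  def main(args: Array[String]): Unit = {
    val input = Seq(
      Wide(1, "example1",
        Seq(Some(0L), Some(2L), Some(5L)),
        Seq(Some(1L), Some(2L), Some(9L)),
        Seq("a", "b", "c"),
        Seq("red", "green", "yellow")),
      Wide(2, "example2",
        Seq(Some(1L), Some(20L), Some(5L)),
        Seq(Some(9L), None, Some(6L)),
        Seq("d", "e", "f"),
        Seq("white", "black", "cyan"))
    )
    // Flatten all users into one sequence of narrow rows and print them.
    input.flatMap(explode).foreach(println)
  }
}
```

Each input row yields three output rows here because all four sequences have length three. If the sequences could differ in length, the truncating behaviour of `zip` would silently drop trailing elements, which may or may not be what is wanted.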
The answer was to define a udf as:
val zip = udf((xs: Seq[Long], ys: Seq[Long]) => xs.zip(ys))
and then to apply it with withColumn:
df.withColumn("vars", …

I'm having trouble using GROUP_CONCAT on a MySQL table, g0, which looks like this:
ID Age Sex
-------------
1 16 Male
2 18 Female
3 16 Male
4 18 Female
5 16 Male
But I need the table to look like this:
ID count
1,3,5 3
2,4 2
I tried this query:
But I get this error message:
1248. Every derived table must have its own alias