Jih*_* No 5 dataframe apache-spark apache-spark-sql
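Is there no function-level grouping_sets support in Spark Scala?
I don't know whether this patch was applied to master: https://github.com/apache/spark/pull/5080
I want to run this kind of query through the Scala DataFrame API: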
GROUP BY expression list GROUPING SETS(expression list2)
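The cube and rollup functions are available in the Dataset API, but I cannot find grouping sets. Why?
"I want to run this kind of query through the Scala DataFrame API."
tl;dr As of Spark 2.1.0, this is not possible. There are currently no plans to add such an operator to the Dataset API.
Spark SQL supports the following so-called multi-dimensional aggregate operators: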
- rollup operator
- cube operator
- GROUPING SETS clause (SQL mode only)
- grouping() and grouping_id() functions

Note: GROUPING SETS is available only in SQL mode. There is no support for it in the Dataset API.
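The grouping() and grouping_id() functions from the list above can be used with the cube operator directly in the Dataset API. The following is a minimal sketch, not taken from the original answer: the local SparkSession, the two-row sample data and the names `withFlags`, `city_rolled_up` and `gid` are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{grouping, grouping_id, sum}

// Assumption: a local session for experimenting; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("grouping-demo").getOrCreate()
import spark.implicits._

// Illustrative two-row sample, not the full data set used later in this answer.
val sales = Seq(
  ("Warsaw", 2016, 100),
  ("Boston", 2015, 50)
).toDF("city", "year", "amount")

val withFlags = sales
  .cube("city", "year")
  .agg(
    sum("amount") as "amount",
    // 1 when `city` is aggregated away in this row, 0 when it is a real group key
    grouping("city") as "city_rolled_up",
    // identifies which grouping set produced the row (0 = most detailed level)
    grouping_id() as "gid")

withFlags.show()
```

These flags help tell a subtotal row apart from a row whose key is genuinely null in the input.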
val sales = Seq(
("Warsaw", 2016, 100),
("Warsaw", 2017, 200),
("Boston", 2015, 50),
("Boston", 2016, 150),
("Toronto", 2017, 50)
).toDF("city", "year", "amount")
sales.createOrReplaceTempView("sales")
// equivalent to rollup("city", "year")
val q = sql("""
SELECT city, year, sum(amount) as amount
FROM sales
GROUP BY city, year
GROUPING SETS ((city, year), (city), ())
ORDER BY city DESC NULLS LAST, year ASC NULLS LAST
""")
scala> q.show
+-------+----+------+
| city|year|amount|
+-------+----+------+
| Warsaw|2016| 100|
| Warsaw|2017| 200|
| Warsaw|null| 300|
|Toronto|2017| 50|
|Toronto|null| 50|
| Boston|2015| 50|
| Boston|2016| 150|
| Boston|null| 200|
| null|null| 550| <-- grand total across all cities and years
+-------+----+------+
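For comparison, the same rollup can be written with the Dataset API operator that the SQL comment refers to. This is a sketch under stated assumptions: the local SparkSession and the name `rolledUp` are illustrative, and `desc_nulls_last` / `asc_nulls_last` require Spark 2.1.0 or later.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Assumption: a local session; in spark-shell `spark` is already defined.
val spark = SparkSession.builder().master("local[*]").appName("rollup-demo").getOrCreate()
import spark.implicits._

val sales = Seq(
  ("Warsaw", 2016, 100),
  ("Warsaw", 2017, 200),
  ("Boston", 2015, 50),
  ("Boston", 2016, 150),
  ("Toronto", 2017, 50)
).toDF("city", "year", "amount")

// rollup("city", "year") aggregates over (city, year), (city) and ()
val rolledUp = sales
  .rollup("city", "year")
  .agg(sum("amount") as "amount")
  .orderBy($"city".desc_nulls_last, $"year".asc_nulls_last)

rolledUp.show()
```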
// equivalent to cube("city", "year")
// note the additional (year) grouping set
val q = sql("""
SELECT city, year, sum(amount) as amount
FROM sales
GROUP BY city, year
GROUPING SETS ((city, year), (city), (year), ())
ORDER BY city DESC NULLS LAST, year ASC NULLS LAST
""")
scala> q.show
+-------+----+------+
| city|year|amount|
+-------+----+------+
| Warsaw|2016| 100|
| Warsaw|2017| 200|
| Warsaw|null| 300|
|Toronto|2017| 50|
|Toronto|null| 50|
| Boston|2015| 50|
| Boston|2016| 150|
| Boston|null| 200|
| null|2015| 50| <-- total across all cities in 2015
| null|2016| 250| <-- total across all cities in 2016
| null|2017| 250| <-- total across all cities in 2017
| null|null| 550|
+-------+----+------+
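If the query has to stay in the DataFrame API, arbitrary grouping sets can be approximated by running one groupBy per set and unioning the results. The sketch below is only a workaround, not a Spark feature: `groupingSets` is a hypothetical helper, it hard-codes a sum over the `amount` column, and it aggregates the input once per grouping set.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, sum}

// Assumption: a local session; in spark-shell `spark` is already defined.
val spark = SparkSession.builder().master("local[*]").appName("grouping-sets-sketch").getOrCreate()
import spark.implicits._

val sales = Seq(
  ("Warsaw", 2016, 100),
  ("Warsaw", 2017, 200),
  ("Boston", 2015, 50),
  ("Boston", 2016, 150),
  ("Toronto", 2017, 50)
).toDF("city", "year", "amount")

// Hypothetical helper (not part of Spark): aggregates once per grouping set,
// fills the columns outside the set with typed nulls, then unions the pieces.
def groupingSets(df: DataFrame, allCols: Seq[String], sets: Seq[Seq[String]]): DataFrame =
  sets.map { set =>
    val aggregated = df.groupBy(set.map(col): _*).agg(sum("amount").as("amount"))
    val projected = allCols.map { c =>
      if (set.contains(c)) col(c)
      else lit(null).cast(df.schema(c).dataType).as(c)
    } :+ col("amount")
    aggregated.select(projected: _*)
  }.reduce(_ union _)

// Same grouping sets as the first SQL query: (city, year), (city), ()
val result = groupingSets(
  sales,
  Seq("city", "year"),
  Seq(Seq("city", "year"), Seq("city"), Seq.empty[String]))

result.orderBy($"city".desc_nulls_last, $"year".asc_nulls_last).show()
```

Registering a temporary view and using the SQL clause, as shown above, remains the more direct route.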