fra*_*ina 9 sql google-bigquery
I'm running into a problem with STRING_AGG in BigQuery. I want to run:
SELECT
id,
institution,
COUNT(DISTINCT institution) OVER (PARTITION BY id) as count_intitution,
STRING_AGG(DISTINCT institution,"," ) OVER (PARTITION BY id) as list_intitution
FROM
name_table
WHERE
DATE(created_at) = "2020-02-02"
I get this error:
Analytic function string_agg does not support DISTINCT.
The BigQuery documentation says that STRING_AGG allows DISTINCT:
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#string_agg
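But apparently that does not hold when it is combined with PARTITION BY. Why?
Edit:
The current table looks like this (this is just an example; the real table has more columns):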
|id |institution|
|1 | a |
|1 | b |
|2 | a |
|2 | c |
|3 | a |
|1 | a |
What I want to achieve is:
|id|count_institution|list_institution|
|1 |2 |a,b |
|2 |2 |a,c |
|3 |1 |a |
Below is for BigQuery Standard SQL:
#standardSQL
SELECT *
REPLACE((
SELECT STRING_AGG(DISTINCT i) FROM t.list_intitution i
) AS list_intitution
)
FROM (
SELECT
id,
institution,
COUNT(DISTINCT institution) OVER (PARTITION BY id) AS count_intitution,
ARRAY_AGG(institution) OVER (PARTITION BY id) AS list_intitution
FROM
name_table
WHERE
DATE(created_at) = "2020-02-02"
) t
Note: compared with your original query, the inner query simply drops DISTINCT and uses ARRAY_AGG instead of STRING_AGG; the outer query then processes that array to build the list of its distinct values.
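If the per-row shape of the window version is needed (one output row per input row, with the per-id aggregates repeated), another option is to aggregate once per id and join the result back. The following is only a minimal sketch under assumptions: it inlines the sample rows from the question as name_table, leaves out the created_at filter because the sample has no such column, and the CTE name per_id is made up for illustration.

```sql
#standardSQL
WITH name_table AS (
  -- Sample rows from the question, inlined so the query is self-contained.
  SELECT 1 AS id, 'a' AS institution UNION ALL
  SELECT 1, 'b' UNION ALL
  SELECT 2, 'a' UNION ALL
  SELECT 2, 'c' UNION ALL
  SELECT 3, 'a' UNION ALL
  SELECT 1, 'a'
),
per_id AS (
  -- Hypothetical helper CTE: DISTINCT is accepted here because
  -- STRING_AGG is used as a plain aggregate, not as an analytic function.
  SELECT
    id,
    COUNT(DISTINCT institution) AS count_institution,
    STRING_AGG(DISTINCT institution, ',') AS list_institution
  FROM name_table
  GROUP BY id
)
SELECT
  n.id,
  n.institution,
  p.count_institution,
  p.list_institution
FROM name_table AS n
JOIN per_id AS p
  ON n.id = p.id
```

This keeps every original row, so it is closer to what the OVER (PARTITION BY id) query would have returned than a plain GROUP BY.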
Below is the answer to your updated question.
You can simply use GROUP BY, as in the example below:
#standardSQL
SELECT id,
COUNT(DISTINCT institution) AS count_institution,
STRING_AGG(DISTINCT institution) AS list_institution
FROM name_table
GROUP BY id
You can test it against the sample data from your question, as in the example below:
#standardSQL
WITH name_table AS (
SELECT 1 id, 'a' institution UNION ALL
SELECT 1, 'b' UNION ALL
SELECT 2, 'a' UNION ALL
SELECT 2, 'c' UNION ALL
SELECT 3, 'a' UNION ALL
SELECT 1, 'a'
)
SELECT id,
COUNT(DISTINCT institution) AS count_institution,
STRING_AGG(DISTINCT institution) AS list_institution
FROM name_table
GROUP BY id
The result is:
Row id count_institution list_institution
1 1 2 a,b
2 2 2 a,c
3 3 1 a
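One caveat about the list column: without an ORDER BY inside STRING_AGG, the order of the concatenated values is not guaranteed. A minimal sketch of a deterministic variant follows, assuming alphabetical order is what is wanted; when DISTINCT is used, the ORDER BY expression has to be the same expression that is being aggregated. The trailing ORDER BY id is added only to make the row order stable.

```sql
#standardSQL
WITH name_table AS (
  -- Sample rows from the question.
  SELECT 1 AS id, 'a' AS institution UNION ALL
  SELECT 1, 'b' UNION ALL
  SELECT 2, 'a' UNION ALL
  SELECT 2, 'c' UNION ALL
  SELECT 3, 'a' UNION ALL
  SELECT 1, 'a'
)
SELECT
  id,
  COUNT(DISTINCT institution) AS count_institution,
  -- Sort the distinct values so the concatenated list is stable across runs.
  STRING_AGG(DISTINCT institution, ',' ORDER BY institution) AS list_institution
FROM name_table
GROUP BY id
ORDER BY id
```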