STRING_AGG in BigQuery

fra*_*ina 9 sql google-bigquery

I have a problem with STRING_AGG in BigQuery. I want to run:

SELECT
 id,
 institution,
 COUNT(DISTINCT institution)  OVER (PARTITION BY id) as count_intitution,
 STRING_AGG(DISTINCT institution,"," )  OVER (PARTITION BY id) as list_intitution
FROM
 name_table
WHERE
 DATE(created_at) = "2020-02-02"

I get this error:

Analytic function string_agg does not support DISTINCT.

The BigQuery documentation says that STRING_AGG accepts DISTINCT:

https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#string_agg

But apparently DISTINCT is not supported in combination with PARTITION BY. Why?

Edit:

The current table looks like this (this is an example; the real table has more columns):

|id |institution|
|1  | a         |
|1  | b         |
|2  | a         |
|2  | c         |
|3  | a         |
|1  | a         |

What I want to achieve is:

|id|count_institution|list_institution|
|1 |2                |a,b             |
|2 |2                |a,c             |
|3 |1                |a               |

Mik*_*ant 6

Below is for BigQuery Standard SQL:

#standardSQL
SELECT * 
  REPLACE((
      SELECT STRING_AGG(DISTINCT i) FROM t.list_intitution i
    ) AS list_intitution
  ) 
FROM (
  SELECT
   id,
   institution,
   COUNT(DISTINCT institution)  OVER (PARTITION BY id) AS count_intitution,
   ARRAY_AGG(institution) OVER (PARTITION BY id) AS list_intitution
  FROM
   name_table
  WHERE
   DATE(created_at) = "2020-02-02"
) t  

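Note: compared with your original query, the inner query simply drops DISTINCT and uses ARRAY_AGG instead of STRING_AGG. The outer query then processes that array to build the list of distinct values in it.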
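To try the approach from the note without access to the real table, here is a minimal, self-contained sketch. It makes a few assumptions that are not in the query above: the sample rows from the question are inlined in place of `name_table` (so the `created_at` filter is dropped), the column is spelled `list_institution`, and an explicit `','` separator plus `ORDER BY i` are added inside STRING_AGG so the list order is deterministic.

```sql
#standardSQL
-- Assumption: the question's sample rows stand in for the real name_table.
WITH name_table AS (
  SELECT 1 AS id, 'a' AS institution UNION ALL
  SELECT 1, 'b' UNION ALL
  SELECT 2, 'a' UNION ALL
  SELECT 2, 'c' UNION ALL
  SELECT 3, 'a' UNION ALL
  SELECT 1, 'a'
)
SELECT *
  REPLACE((
      -- Deduplicate the per-id array and join it into a single string.
      -- ORDER BY i is an addition here to make the order deterministic.
      SELECT STRING_AGG(DISTINCT i, ',' ORDER BY i)
      FROM t.list_institution AS i
    ) AS list_institution
  )
FROM (
  SELECT
    id,
    institution,
    COUNT(DISTINCT institution) OVER (PARTITION BY id) AS count_institution,
    -- The window collects every institution of the id, duplicates included.
    ARRAY_AGG(institution) OVER (PARTITION BY id) AS list_institution
  FROM name_table
) AS t
```

Because window functions do not collapse rows, all six input rows are kept; each of the three rows with id 1, for example, carries a count of 2 and the list a,b.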

Below is an answer to your updated question.

You can simply use GROUP BY, as in the example below:

#standardSQL
SELECT id, 
  COUNT(DISTINCT institution) AS count_institution,
  STRING_AGG(DISTINCT institution) AS list_institution
FROM name_table
GROUP BY id

Applied to the sample data from your question, as in the example below:

#standardSQL
WITH name_table AS (
  SELECT 1 id, 'a' institution UNION ALL
  SELECT 1, 'b' UNION ALL
  SELECT 2, 'a' UNION ALL
  SELECT 2, 'c' UNION ALL
  SELECT 3, 'a' UNION ALL
  SELECT 1, 'a' 
)
SELECT id, 
  COUNT(DISTINCT institution) AS count_institution,
  STRING_AGG(DISTINCT institution) AS list_institution
FROM name_table
GROUP BY id

the result is:

Row id  count_institution   list_institution     
1   1   2                   a,b  
2   2   2                   a,c  
3   3   1                   a    
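If the per-row columns are still needed next to the aggregates, as in the original window-function query, one possible variation is to join the GROUP BY result back to the table. This is only a sketch under the same sample-data assumption; the CTE name `per_id`, the explicit `','` separator, and the `ORDER BY` clauses are choices made here for illustration.

```sql
#standardSQL
-- Assumption: the question's sample rows stand in for the real name_table.
WITH name_table AS (
  SELECT 1 AS id, 'a' AS institution UNION ALL
  SELECT 1, 'b' UNION ALL
  SELECT 2, 'a' UNION ALL
  SELECT 2, 'c' UNION ALL
  SELECT 3, 'a' UNION ALL
  SELECT 1, 'a'
),
-- per_id (hypothetical name): one row per id with the distinct aggregates.
per_id AS (
  SELECT
    id,
    COUNT(DISTINCT institution) AS count_institution,
    STRING_AGG(DISTINCT institution, ',' ORDER BY institution) AS list_institution
  FROM name_table
  GROUP BY id
)
-- Attach the per-id aggregates to every original row.
SELECT
  n.id,
  n.institution,
  p.count_institution,
  p.list_institution
FROM name_table AS n
JOIN per_id AS p
  ON n.id = p.id
ORDER BY n.id, n.institution
```

This keeps DISTINCT inside ordinary aggregate calls, where STRING_AGG accepts it, and leaves the row-level detail to the join.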