Using structs in arrays in the new BigQuery Standard SQL

Question


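I'm trying to use the new Standard SQL to find rows with duplicate fields in an array of structs in a Google BigQuery table. Each row in the table (simplified) looks a bit like this: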

{
  "Session": "abc123",
  "Information": [
    {
      "Identifier": "e8d971a4-ef33-4ea1-8627-f1213e4c67dc"
    },
    {
      "Identifier": "1c62813f-7ec4-4968-b18b-d1eb8f4d9d26"
    },
    {
      "Identifier": "e8d971a4-ef33-4ea1-8627-f1213e4c67dc"
    }
  ]
}


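My end goal is to show the rows whose Information entities contain duplicate Identifier values. However, most of the queries I try fail with an error message of the following form: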

Cannot access field Identifier on a value with type ARRAY<STRUCT<Identifier STRING>>


Is there a way to work with the data inside a STRUCT in an ARRAY?

Here is my first attempt at a query:

SELECT
  Session,
  Information
FROM
  `events.myevents`
WHERE
  COUNT(DISTINCT Information.Identifier) != ARRAY_LENGTH(Information.Identifier)
LIMIT
  1000


And another attempt, using a subquery:

SELECT
  Session,
  Information
FROM (
  SELECT
    Session,
    Information,
    COUNT(DISTINCT Information.Identifier) AS info_count_distinct,
    ARRAY_LENGTH(Information) AS info_count
  FROM
    `events.myevents`
  WHERE
    COUNT(DISTINCT Information.Identifier) != ARRAY_LENGTH(Information.Identifier)
  LIMIT
    1000)
WHERE
  info_count != info_count_distinct


Answer 1

Mik*_*ant 5

Try the following:

SELECT Session, Identifier, COUNT(1) AS dups
FROM `events.myevents`, UNNEST(Information)
GROUP BY Session, Identifier
HAVING dups > 1
ORDER BY Session


This should give you what you expect, plus the number of duplicates.
For example:

Session Identifier                              dups     
abc123  e8d971a4-ef33-4ea1-8627-f1213e4c67dc    2    
abc345  1c62813f-7ec4-4968-b18b-d1eb8f4d9d26    3
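
The error in the question comes from applying field access (Information.Identifier) directly to the ARRAY. A field can only be read from an individual STRUCT element, which is what UNNEST produces. If the goal is to return the whole row (Session plus the full Information array) rather than one line per duplicated Identifier, one possible sketch is a correlated subquery over UNNEST(Information). The inline myevents data below is made up for illustration and stands in for `events.myevents`; the def456 session and the alias info are hypothetical names, not part of the original table.

```sql
-- Hypothetical inline sample data standing in for `events.myevents`.
WITH myevents AS (
  SELECT
    'abc123' AS Session,
    [
      STRUCT('e8d971a4-ef33-4ea1-8627-f1213e4c67dc' AS Identifier),
      STRUCT('1c62813f-7ec4-4968-b18b-d1eb8f4d9d26' AS Identifier),
      STRUCT('e8d971a4-ef33-4ea1-8627-f1213e4c67dc' AS Identifier)
    ] AS Information
  UNION ALL
  SELECT
    'def456' AS Session,  -- made-up session with no repeated Identifier
    [STRUCT('1c62813f-7ec4-4968-b18b-d1eb8f4d9d26' AS Identifier)] AS Information
)
SELECT
  Session,
  Information
FROM
  myevents
WHERE
  -- Count the distinct Identifier values among this row's own array elements
  -- and compare with the array length; a mismatch means a value repeats.
  (SELECT COUNT(DISTINCT info.Identifier) FROM UNNEST(Information) AS info)
    != ARRAY_LENGTH(Information)
```

In this sample only the abc123 row contains a repeated Identifier. Note that COUNT(DISTINCT ...) ignores NULLs, so a row containing a NULL Identifier would also be flagged by this comparison; add an explicit NULL check if that matters for your data.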

