BigQuery check for array overlap

Hun*_* A. 1 python google-bigquery google-cloud-platform

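I'm writing a BigQuery query that needs to check whether any of several strings exists as an element in one column of a table, where that column itself contains arrays of strings. For context, I'm writing the query as part of a small automated Python job, and I'm using Standard SQL.

I couldn't find anything here that explicitly checks for array containment: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators

So I came up with a solution that uses a very hacky regular expression, specifically: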

...other query stuff...

WHERE
    REGEXP_CONTAINS((LOWER(ARRAY_TO_STRING(column, '-'))), r"({joined_string})")

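... where column is the column of the table I care about, and joined_string is one long string made up of all the strings I need to check, joined with | (where | acts as the regex OR operator).

Is there some built-in functionality in BigQuery Standard SQL that lets you do this more sensibly?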
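The post does not show how joined_string is built. A minimal sketch, assuming it comes from a plain Python list (the names build_joined_string and regex_contains are hypothetical), emulates the filter locally with Python's re module rather than BigQuery's own regex engine. It also illustrates one reason the approach is fragile: the pattern matches substrings of the joined text, not whole array elements.

```python
import re


def build_joined_string(needles):
    # Join the search strings with '|', escaping regex metacharacters.
    # The query lowercases the column, so the needles are lowercased too.
    return "|".join(re.escape(s.lower()) for s in needles)


def regex_contains(column, joined_string):
    # Local stand-in for the WHERE clause in the question:
    # lowercase the '-'-joined array, then search for any alternative.
    haystack = "-".join(column).lower()
    return re.search("(" + joined_string + ")", haystack) is not None


joined_string = build_joined_string(["abc", "123"])

print(joined_string)
# An element equal to 'abc' is present.
print(regex_contains(["abc", "def"], joined_string))
# No element equals 'abc', but 'xabcx' contains it, so this also matches.
print(regex_contains(["xabcx", "def"], joined_string))
```

Anchoring each alternative on the '-' separator would narrow the false positives, but it still breaks if an element itself contains the separator.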

Mik*_*ant 5

Below are two examples.

The first assumes your strings are in another table, strings:

#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
  SELECT 2, ['123', '456', '789'] UNION ALL
  SELECT 3, ['135', '246', '369'] 
),
strings AS (
  SELECT 'abc' AS str UNION ALL
  SELECT '123' UNION ALL
  SELECT '456'
)
SELECT *
FROM yourTable
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) > 0  

If you need to see how many strings matched, you can add the following to the SELECT list:

(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) AS cnt
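To make the expected result concrete, here is a rough pure-Python emulation of the correlated subquery on the same sample data. It is only a sketch of the semantics, not how BigQuery executes the query, and the helper name match_count is hypothetical. For each row it counts the (col, str) pairs that the join on col = str would produce.

```python
# Sample data copied from the yourTable and strings CTEs in the first example.
your_table = [
    (1, ["abc", "def", "xyz"]),
    (2, ["123", "456", "789"]),
    (3, ["135", "246", "369"]),
]
strings = ["abc", "123", "456"]


def match_count(column, strs):
    # Emulates: SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str
    # Every equal (col, str) pair counts once, so duplicates on either side
    # inflate the count just as they would in the SQL join.
    return sum(1 for col in column for s in strs if col == s)


# cnt per row, as the extra SELECT-list expression would report it.
counts = {row_id: match_count(column, strings) for row_id, column in your_table}

# Rows kept by the "> 0" filter in the WHERE clause.
matching_ids = [row_id for row_id, cnt in counts.items() if cnt > 0]

print(counts)
print(matching_ids)
```

On this data the sketch gives counts of 1, 2 and 0 for ids 1, 2 and 3, so only rows 1 and 2 pass the filter.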

The second example assumes you have the list of strings wrapped in an array:

#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
  SELECT 2, ['123', '456', '789'] UNION ALL
  SELECT 3, ['135', '246', '369'] 
),
strings AS (
  SELECT ['abc', 'def', '456'] AS strs
)
SELECT yourTable.*
FROM yourTable, strings
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) > 0   

As in the first example, you can add the following to the SELECT list to see the number of matches:

(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) AS cnt
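Since the query is built from a Python job, the list of strings could be sent as an array query parameter instead of being formatted into the SQL text. The following is a minimal sketch, assuming the google-cloud-bigquery client library is installed and credentials are configured. The function names, the parameter name strs, and the table_id argument are hypothetical, and the array column is assumed to be named column as in the examples above.

```python
def build_overlap_query(table_id):
    # Same filter as the second example, with the array supplied as @strs.
    # table_id is interpolated into the SQL text, so it should come from
    # trusted code, not from user input.
    return (
        "SELECT *\n"
        f"FROM `{table_id}`\n"
        "WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col "
        "JOIN UNNEST(@strs) AS str ON col = str) > 0"
    )


def run_overlap_query(table_id, strs):
    # Imported lazily so build_overlap_query works without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ArrayQueryParameter("strs", "STRING", strs)]
    )
    query_job = client.query(build_overlap_query(table_id), job_config=job_config)
    return list(query_job.result())


# Hypothetical table name, printed only to show the generated SQL.
print(build_overlap_query("my_project.my_dataset.my_table"))
```

Passing the strings as a parameter avoids quoting and escaping them by hand, which the regex version in the question would otherwise need.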