Hun*_* A. 1 python google-bigquery google-cloud-platform
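So I'm writing a BigQuery query that basically just needs to check whether any one of several strings exists as an element in a column of a table, where the column in question itself holds arrays of strings. FYI, I'm writing the query as part of a small automated Python job and am using Standard SQL.
I couldn't find anything here that explicitly checks for array containment: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators
So I came up with a solution that uses a very hacky regular expression, specifically: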
...other query stuff...
WHERE
REGEXP_CONTAINS((LOWER(ARRAY_TO_STRING(column, '-'))), r"({joined_string})")
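... where column is the column in the table that I care about, and joined_string is one long string made up of all the strings I need to check, joined with | (where | acts as the regex OR operator).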
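For reference, here is a minimal sketch of how joined_string might be assembled on the Python side. The helper name build_joined_string and the sample strings are hypothetical, and it assumes that the escaping produced by Python's re.escape is also accepted by BigQuery's regex engine, which is worth checking before relying on it.

```python
import re


def build_joined_string(needles):
    """Join search strings into one regex alternation, e.g. 'abc|123'.

    Each string is lowercased (the query lowercases the column) and passed
    through re.escape so characters such as '.' or '|' match literally.
    """
    return "|".join(re.escape(s.lower()) for s in needles)


needles = ["abc", "1.5", "A|B"]  # hypothetical search strings
joined_string = build_joined_string(needles)

# Same filter as in the query above, with the alternation substituted in.
query_filter = (
    "REGEXP_CONTAINS((LOWER(ARRAY_TO_STRING(column, '-'))), "
    f'r"({joined_string})")'
)
print(query_filter)
```

Note that this is a substring match against the joined text, so a search string such as abc also matches an array element like xabcx, and a search string containing a double quote would break the SQL literal.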
Is there some built-in functionality in BigQuery Standard SQL that would let you do this in a more sensible way?
Below are two examples.
The first assumes your strings are in another table, strings
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
SELECT 2, ['123', '456', '789'] UNION ALL
SELECT 3, ['135', '246', '369']
),
strings AS (
SELECT 'abc' AS str UNION ALL
SELECT '123' UNION ALL
SELECT '456'
)
SELECT *
FROM yourTable
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) > 0
If you need to see how many strings matched, you can add the following to the SELECT list
(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) AS cnt
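To make the semantics of the correlated subquery concrete, here is a small pure-Python model of the first example. It is only an illustration of what the join counts, not how BigQuery evaluates the query; the names your_table, strings and match_count are chosen to mirror the SQL and are otherwise hypothetical.

```python
# Same sample data as the first query above.
your_table = [
    (1, ["abc", "def", "xyz"]),
    (2, ["123", "456", "789"]),
    (3, ["135", "246", "369"]),
]
strings = ["abc", "123", "456"]  # one entry per row of the strings table


def match_count(column, strings):
    """Mirror COUNT(1) over UNNEST(column) JOIN strings ON col = str."""
    # The join emits one row per (col, str) pair whose values are equal.
    return sum(1 for col in column for s in strings if col == s)


counts = {row_id: match_count(column, strings) for row_id, column in your_table}
matching_ids = [row_id for row_id, cnt in counts.items() if cnt > 0]

print(counts)        # {1: 1, 2: 2, 3: 0}
print(matching_ids)  # [1, 2]
```

Because the count comes from a join, a value that appears more than once on either side is counted once per matching pair.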
The second example assumes you have the list of strings wrapped in an array
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
SELECT 2, ['123', '456', '789'] UNION ALL
SELECT 3, ['135', '246', '369']
),
strings AS (
SELECT ['abc', 'def', '456'] AS strs
)
SELECT yourTable.*
FROM yourTable, strings
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) > 0
Same as in the first example - you can add the following to the SELECT list to see the number of matches
(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) AS cnt
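Since the query is run from a Python job, the array form above could be fed from Python as a query parameter instead of being formatted into the SQL text. The following is a rough sketch under several assumptions: the google-cloud-bigquery package is installed, default credentials are configured, and the table name, the alias t and the function find_rows are hypothetical placeholders.

```python
# Hypothetical table name; replace with the real one.
TABLE = "my_project.my_dataset.my_table"

# Same filter as the second example, with the string list supplied
# through the named array parameter @strs.
SQL = f"""
SELECT t.*
FROM `{TABLE}` AS t
WHERE (
  SELECT COUNT(1)
  FROM UNNEST(t.column) AS col
  JOIN UNNEST(@strs) AS str ON col = str
) > 0
"""


def find_rows(strs):
    """Run the query with strs bound as an ARRAY<STRING> parameter."""
    # Imported here so the SQL above can be inspected without the
    # google-cloud-bigquery package installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default credentials are configured
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ArrayQueryParameter("strs", "STRING", list(strs)),
        ]
    )
    rows = client.query(SQL, job_config=job_config).result()
    return [dict(row.items()) for row in rows]
```

Calling find_rows(["abc", "def", "456"]) would then return the matching rows as dictionaries, and the strings never need to be escaped or concatenated by hand.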