Dav*_*542 1 sql google-bigquery
I'm running the following in BigQuery:
SELECT ARRAY_AGG(state IGNORE NULLS LIMIT 10000)
FROM mytable
GROUP BY state
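What is the best way to limit the result to no more than 1 MB? Previously I applied a LIMIT inside ARRAY_AGG, but when there are large text fields the result often exceeds the limit anyway, so I would rather limit it by the final result size.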
One option (BigQuery Standard SQL):
#standardSQL
WITH temp AS (
SELECT state, SUM(LENGTH(state)) OVER(ORDER BY pos) size
FROM (
SELECT state, ROW_NUMBER() OVER() pos
FROM `project.dataset.table`
)
)
SELECT ARRAY_AGG(state IGNORE NULLS)
FROM temp
WHERE size < 1000000
You can test and play with the above using the dummy example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT REPEAT('a', CAST(100 * RAND() AS INT64)) state
FROM UNNEST(GENERATE_ARRAY(1, 100))
), temp AS (
SELECT state, SUM(LENGTH(state)) OVER(ORDER BY pos) size
FROM (
SELECT state, ROW_NUMBER() OVER() pos
FROM `project.dataset.table`
)
)
SELECT ARRAY_AGG(state IGNORE NULLS)
FROM temp
WHERE size < 5000
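The queries above measure size with LENGTH, which counts characters, while a 1 MB limit is usually expressed in bytes. For multi-byte text the two can differ. The following is a minimal sketch of a byte-based variant, assuming the same placeholder table `project.dataset.table` and that only the string payload matters (it does not account for any per-element overhead in the result). The alias `running_bytes` and the choice to order by `state` are illustrative assumptions, not part of the original answer; ordering by `state` makes the cutoff deterministic but keeps the alphabetically first values rather than an arbitrary set.

```sql
#standardSQL
WITH sized AS (
  SELECT
    state,
    -- Running total of UTF-8 bytes up to and including the current row.
    -- The explicit ROWS frame keeps duplicate values from being summed as one peer group.
    SUM(BYTE_LENGTH(state)) OVER (
      ORDER BY state
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_bytes
  FROM `project.dataset.table`
  WHERE state IS NOT NULL  -- NULLs add no bytes and are dropped up front
)
SELECT ARRAY_AGG(state) AS states
FROM sized
WHERE running_bytes <= 1000000  -- keep rows while the cumulative size stays within about 1 MB
```

To try it without a real table, the dummy `project.dataset.table` CTE from the test example can be placed in front of `sized` in the same WITH clause, with a smaller threshold such as 5000.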