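I just discovered BigQuery's QUALIFY operator and read about it at https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#qualify_clause
But the documentation does not explain why I should use QUALIFY instead of an ordinary WHERE predicate. Take the example given in the documentation: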
SELECT item,
  RANK() OVER (PARTITION BY category ORDER BY purchases DESC) as rank
FROM Produce
WHERE Produce.category = 'vegetable'
QUALIFY rank <= 3
This query could also be written as
SELECT
  item,
  RANK() OVER (PARTITION BY category ORDER BY purchases DESC) as rank
FROM Produce
WHERE Produce.category = 'vegetable'
AND rank <= 3
and it would produce the same result. So what is the benefit of using QUALIFY?
Jih*_*hoi 10
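One use of the QUALIFY clause is to filter on the result of an analytic function (sometimes called a window function). As you mentioned in the comments, this can be seen as syntactic sugar, because the result of the analytic function can be stored in an additional subquery and then filtered with a WHERE clause.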
SELECT user_id, ip, country_code, os, ...,
FROM login_logs
WHERE TRUE
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY log_datetime DESC) = 1
;
#standardSQL
WITH
login_log AS (
SELECT
user_id, ip, country_code, os, ...,
ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY log_datetime DESC
) AS row_num
FROM user_login_info_table
)
SELECT user_id, ip, country_code, os, ...
FROM login_log
WHERE row_num = 1
;
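I just tested the two approaches on my own data and found that the slot time of the two queries is nearly the same. However, the QUALIFY clause has a slight advantage in shuffled bytes, because it does not need to keep the row_num column in the result.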
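To try the subquery form without a BigQuery project, here is a minimal sketch that runs it locally with Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer (for window functions), and the sample rows and the reduced column list are made up for illustration. SQLite has no QUALIFY clause, so only the CTE plus WHERE variant can be run this way; it says nothing about slot time or shuffled bytes.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login_logs (user_id TEXT, os TEXT, log_datetime TEXT)")
conn.executemany(
    "INSERT INTO login_logs VALUES (?, ?, ?)",
    [
        ("alice", "ios", "2021-01-01 09:00:00"),
        ("alice", "android", "2021-01-03 09:00:00"),
        ("bob", "windows", "2021-01-02 09:00:00"),
    ],
)

# SQLite has no QUALIFY, so the window result is computed in a CTE
# and then filtered with an ordinary WHERE clause.
latest_logins = conn.execute(
    """
    WITH login_log AS (
      SELECT
        user_id, os, log_datetime,
        ROW_NUMBER() OVER (
          PARTITION BY user_id ORDER BY log_datetime DESC
        ) AS row_num
      FROM login_logs
    )
    SELECT user_id, os, log_datetime
    FROM login_log
    WHERE row_num = 1
    ORDER BY user_id
    """
).fetchall()

# One row per user: the most recent login.
for row in latest_logins:
    print(row)
```

In BigQuery, the QUALIFY version expresses the same filter in a single SELECT, without the extra row_num column in an intermediate result.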
SELECT
log_datetime, user_id, os,
LAG(os, 1, NULL) OVER user_id_os_list as previous_os,
FROM login_logs
WHERE TRUE
QUALIFY (previous_os != os)
WINDOW user_id_os_list AS (
PARTITION BY user_id ORDER BY log_datetime
)
;
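The same kind of local check works for the LAG query. This is only a sketch under the same assumptions as before (Python's sqlite3 with SQLite 3.25 or newer, invented sample rows), and because SQLite lacks QUALIFY the filter is moved into an outer WHERE over a CTE named with_previous, which is a name chosen here for illustration.

```python
import sqlite3

# Hypothetical sample data: alice switches from ios to android on her third login.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login_logs (log_datetime TEXT, user_id TEXT, os TEXT)")
conn.executemany(
    "INSERT INTO login_logs VALUES (?, ?, ?)",
    [
        ("2021-01-01 09:00:00", "alice", "ios"),
        ("2021-01-02 09:00:00", "alice", "ios"),
        ("2021-01-03 09:00:00", "alice", "android"),
        ("2021-01-01 10:00:00", "bob", "windows"),
    ],
)

# Compute previous_os with LAG over a named window, then keep only rows
# where the OS differs from the previous login of the same user.
os_changes = conn.execute(
    """
    WITH with_previous AS (
      SELECT
        log_datetime, user_id, os,
        LAG(os, 1, NULL) OVER user_id_os_list AS previous_os
      FROM login_logs
      WINDOW user_id_os_list AS (
        PARTITION BY user_id ORDER BY log_datetime
      )
    )
    SELECT log_datetime, user_id, os, previous_os
    FROM with_previous
    WHERE previous_os != os
    ORDER BY user_id, log_datetime
    """
).fetchall()

for row in os_changes:
    print(row)
```

Note that each user's first login has a NULL previous_os, so the comparison evaluates to NULL and that row is dropped by the filter.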