Ric*_*ard 3 sql google-bigquery
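I'm working in BigQuery. I have a table t1 with the fields address, postcode, price and date. I want to group it by address and postcode and find the price from the most recent row for each address.
How can I do this in BigQuery? I know how to get the address, postcode and most recent date: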
```sql
SELECT
  ADDRESS, POSTCODE, MAX(DATE)
FROM
  [mytable]
GROUP BY
  ADDRESS,
  POSTCODE
```
But I don't know how to get the price for the rows that match these fields. This is my best guess. It does produce results, but is it correct?
```sql
SELECT
  t1.address, t1.postcode, t1.date, t2.price
FROM [mytable] t2
JOIN
  (SELECT
     ADDRESS, POSTCODE, MAX(DATE) AS date
   FROM
     [mytable]
   GROUP BY
     ADDRESS,
     POSTCODE) t1
ON t1.address = t2.address
  AND t1.postcode = t2.postcode
  AND t1.date = t2.date
```
It seems to me this should work, but some of the solutions to similar questions are much more complicated.
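For reference, here is the join above rewritten as BigQuery Standard SQL against a small inline table. The `yourTable` CTE and its values are hypothetical, chosen only to illustrate the behaviour. The approach is correct as long as each (address, postcode, date) combination is unique: if two rows share the latest date for the same address, the join returns both of them. On this sample it should return one row for address_1 (price 2) but two rows for address_2 (prices 5 and 6).

```sql
#standardSQL
-- Hypothetical sample data; address_2 has two rows on its latest date.
WITH yourTable AS (
  SELECT 'address_1' AS address, 'postcode_1' AS postcode, DATE '2017-01-01' AS date, 1 AS price UNION ALL
  SELECT 'address_1', 'postcode_1', DATE '2017-01-02', 2 UNION ALL
  SELECT 'address_2', 'postcode_2', DATE '2017-01-01', 5 UNION ALL
  SELECT 'address_2', 'postcode_2', DATE '2017-01-01', 6
),
latest AS (
  -- Most recent date per (address, postcode)
  SELECT address, postcode, MAX(date) AS date
  FROM yourTable
  GROUP BY address, postcode
)
-- Join back to the table to pick up the price recorded on that date
SELECT l.address, l.postcode, l.date, t.price
FROM latest AS l
JOIN yourTable AS t
  ON t.address = l.address
 AND t.postcode = l.postcode
 AND t.date = l.date
```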
Just use ROW_NUMBER():
```sql
SELECT t.*
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY ADDRESS, POSTCODE
                                ORDER BY DATE DESC
                               ) as seqnum
      FROM [mytable] t
     ) t
WHERE seqnum = 1;
```
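This is not an aggregation query: you want to filter the rows so that only the most recent one per address and postcode remains.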
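A minimal sketch of the same ROW_NUMBER() approach in Standard SQL, assuming a hypothetical inline `yourTable` so the query runs without a real table. `SELECT * EXCEPT (seqnum)` drops the helper column from the output. Unlike the join, this returns exactly one row per (address, postcode) even when dates tie; which of the tied rows wins is arbitrary unless you add a tie-breaker to the ORDER BY. On this sample it should return price 2 for address_1 and either 5 or 6 for address_2.

```sql
#standardSQL
-- Hypothetical sample data; address_2 has a tie on its latest date.
WITH yourTable AS (
  SELECT 'address_1' AS address, 'postcode_1' AS postcode, DATE '2017-01-01' AS date, 1 AS price UNION ALL
  SELECT 'address_1', 'postcode_1', DATE '2017-01-02', 2 UNION ALL
  SELECT 'address_2', 'postcode_2', DATE '2017-01-01', 5 UNION ALL
  SELECT 'address_2', 'postcode_2', DATE '2017-01-01', 6
)
SELECT * EXCEPT (seqnum)
FROM (
  SELECT
    t.*,
    ROW_NUMBER() OVER (
      PARTITION BY address, postcode
      ORDER BY date DESC  -- newest row in each group gets seqnum = 1
    ) AS seqnum
  FROM yourTable AS t
)
WHERE seqnum = 1
```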
Try the following BigQuery Standard SQL:
```sql
#standardSQL
SELECT row.* FROM (
  SELECT ARRAY_AGG(t ORDER BY date DESC LIMIT 1)[OFFSET(0)] AS row
  FROM `yourTable` AS t
  GROUP BY address, postcode
)
```
You can play with and test it using the following dummy data:
```sql
#standardSQL
WITH yourTable AS (
  SELECT 'address_1' AS address, 'postcode_1' AS postcode, '2017-01-01' AS date, 1 AS price UNION ALL
  SELECT 'address_1', 'postcode_1', '2017-01-02', 2 UNION ALL
  SELECT 'address_1', 'postcode_1', '2017-01-03', 3 UNION ALL
  SELECT 'address_1', 'postcode_1', '2017-01-04', 4 UNION ALL
  SELECT 'address_2', 'postcode_2', '2017-01-01', 5 UNION ALL
  SELECT 'address_3', 'postcode_1', '2017-01-01', 6 UNION ALL
  SELECT 'address_3', 'postcode_1', '2017-01-02', 7 UNION ALL
  SELECT 'address_3', 'postcode_1', '2017-01-03', 8
)
SELECT row.* FROM (
  SELECT ARRAY_AGG(t ORDER BY date DESC LIMIT 1)[OFFSET(0)] AS row
  FROM `yourTable` AS t
  GROUP BY address, postcode
)
```
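If only the latest price is needed rather than the whole row, the same ARRAY_AGG trick can be applied to a single column next to MAX(date). This is a sketch that reuses the dummy data above (dates stored as 'YYYY-MM-DD' strings, which sort correctly as text); on that data it should return price 4 for address_1, 5 for address_2 and 8 for address_3.

```sql
#standardSQL
-- Dummy data copied from the answer above
WITH yourTable AS (
  SELECT 'address_1' AS address, 'postcode_1' AS postcode, '2017-01-01' AS date, 1 AS price UNION ALL
  SELECT 'address_1', 'postcode_1', '2017-01-02', 2 UNION ALL
  SELECT 'address_1', 'postcode_1', '2017-01-03', 3 UNION ALL
  SELECT 'address_1', 'postcode_1', '2017-01-04', 4 UNION ALL
  SELECT 'address_2', 'postcode_2', '2017-01-01', 5 UNION ALL
  SELECT 'address_3', 'postcode_1', '2017-01-01', 6 UNION ALL
  SELECT 'address_3', 'postcode_1', '2017-01-02', 7 UNION ALL
  SELECT 'address_3', 'postcode_1', '2017-01-03', 8
)
SELECT
  address,
  postcode,
  MAX(date) AS date,
  -- Price taken from the newest row in each (address, postcode) group
  ARRAY_AGG(price ORDER BY date DESC LIMIT 1)[OFFSET(0)] AS price
FROM yourTable
GROUP BY address, postcode
```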