BigQuery Standard SQL: get the first non-null value when grouping

Mar*_* M. 7 coalesce google-bigquery

I have a table like this:

CUSTOMERS_ID  DATE_SALES  DIMENSION
MARIO1        20200201    NULL
MARIO1        20200113    Spain
MARIO2        20200131    NULL
MARIO3        20200101    France
MARIO3        20191231    Spain

I need to order by CUSTOMERS_ID and DATE_SALES DESC. Then I want to group by CUSTOMERS_ID and get the first non-null value of the DIMENSION field. The output table would be:

CUSTOMERS_ID  DIMENSION
MARIO1        Spain
MARIO2        NULL
MARIO3        France

Any ideas? I have tried the COALESCE and FIRST_VALUE functions, but I did not get the result I expected.

Thanks in advance!

Sab*_*Sab 9

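You can group by the customer ID and use ARRAY_AGG with IGNORE NULLS, ordering the aggregated values by the date field. LIMIT 1 keeps only one element per group, which improves efficiency by using less memory. OFFSET(0) then pulls that single element out of the array, so the result is a plain (non-nested) field that is easy to use.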

WITH 
raw_data AS
(
  SELECT 'MARIO1' CUSTOMERS_ID, 20200201 DATE_SALES, NULL as DIMENSION UNION ALL
  SELECT 'MARIO1' CUSTOMERS_ID, 20200113 DATE_SALES, 'Spain' as DIMENSION UNION ALL
  SELECT 'MARIO2' CUSTOMERS_ID, 20200131 DATE_SALES, NULL as DIMENSION UNION ALL
  SELECT 'MARIO3' CUSTOMERS_ID, 20200101 DATE_SALES, 'France' as DIMENSION UNION ALL
  SELECT 'MARIO3' CUSTOMERS_ID, 20191231 DATE_SALES, 'Spain' as DIMENSION
)
SELECT
  CUSTOMERS_ID,
  -- Newest non-NULL DIMENSION per customer; NULL when every value is NULL.
  ARRAY_AGG(DIMENSION IGNORE NULLS ORDER BY DATE_SALES DESC LIMIT 1)[OFFSET(0)] AS DIMENSION
FROM raw_data
GROUP BY 1
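Since the question mentions FIRST_VALUE, a window-function variant may also get you there. The following is only a sketch that has not been run against a real dataset. It assumes the sample data from the question, inlined as a raw_data CTE, and uses the IGNORE NULLS modifier that BigQuery's FIRST_VALUE accepts. The explicit window frame lets every row of a customer see the whole partition, and SELECT DISTINCT collapses the repeated rows to one per customer.

```sql
WITH raw_data AS (
  -- Sample rows from the question; the NULLs are cast so the column type is explicit.
  SELECT 'MARIO1' AS CUSTOMERS_ID, 20200201 AS DATE_SALES, CAST(NULL AS STRING) AS DIMENSION UNION ALL
  SELECT 'MARIO1', 20200113, 'Spain' UNION ALL
  SELECT 'MARIO2', 20200131, CAST(NULL AS STRING) UNION ALL
  SELECT 'MARIO3', 20200101, 'France' UNION ALL
  SELECT 'MARIO3', 20191231, 'Spain'
)
SELECT DISTINCT
  CUSTOMERS_ID,
  -- First non-NULL DIMENSION in date-descending order within each customer.
  FIRST_VALUE(DIMENSION IGNORE NULLS) OVER (
    PARTITION BY CUSTOMERS_ID
    ORDER BY DATE_SALES DESC
    -- Without this frame, the default frame ends at the current row,
    -- so earlier rows could not see a later non-NULL value.
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS DIMENSION
FROM raw_data
```

For the sample data this should give the same three rows as the output table in the question. The ARRAY_AGG version is usually the simpler choice because it aggregates directly and needs no DISTINCT.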


Mik*_*ant 3

Below is for BigQuery Standard SQL

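#standardSQL
-- Rows with a NULL DIMENSION get a NULL sort key, which sorts last under DESC,
-- so the newest row with a non-NULL DIMENSION is returned for each customer.
SELECT AS VALUE ARRAY_AGG(t ORDER BY IF(DIMENSION IS NULL, NULL, DATE_SALES) DESC LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY CUSTOMERS_ID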

If applied to the sample data from your question, the result is

Row CUSTOMERS_ID    DATE_SALES  DIMENSION    
1   MARIO1          20200113    Spain    
2   MARIO2          20200131    null     
3   MARIO3          20200101    France   
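One edge case to be aware of: if a customer has several rows and every DIMENSION is NULL, all sort keys are NULL and the row that comes back is not determined. A possible fix, sketched below, is to add DATE_SALES DESC as a second sort key so the newest row wins in that case. The raw_data CTE is an assumption made for illustration: it inlines the sample data from the question and adds one hypothetical extra all-NULL row for MARIO2 to show the tie.

```sql
#standardSQL
WITH raw_data AS (
  SELECT 'MARIO1' AS CUSTOMERS_ID, 20200201 AS DATE_SALES, CAST(NULL AS STRING) AS DIMENSION UNION ALL
  SELECT 'MARIO1', 20200113, 'Spain' UNION ALL
  SELECT 'MARIO2', 20200131, CAST(NULL AS STRING) UNION ALL
  -- Hypothetical extra row (not in the question) so MARIO2 has two all-NULL rows.
  SELECT 'MARIO2', 20200115, CAST(NULL AS STRING) UNION ALL
  SELECT 'MARIO3', 20200101, 'France' UNION ALL
  SELECT 'MARIO3', 20191231, 'Spain'
)
SELECT AS VALUE ARRAY_AGG(
  t
  ORDER BY
    -- Primary key: date of rows with a non-NULL DIMENSION; NULL keys sort last under DESC.
    IF(DIMENSION IS NULL, NULL, DATE_SALES) DESC,
    -- Tie-breaker: when every DIMENSION is NULL, prefer the newest row.
    DATE_SALES DESC
  LIMIT 1
)[OFFSET(0)]
FROM raw_data t
GROUP BY CUSTOMERS_ID
```

With this ordering, MARIO2 should come back with DATE_SALES 20200131 and a NULL DIMENSION, while MARIO1 and MARIO3 are unaffected.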