How do I get a slice of an array in BigQuery Standard SQL?

a p*_*erd 1 google-bigquery

In BigQuery, I have a table with a path column like this:

ID       | Path
---------+----------------------------------------
1        | foo/bar/baz
2        | foo/bar/quux/blat

I want to be able to split the path on forward slashes (/), select one or more of the path parts, and then join them back together.

In PostgreSQL, this is easy:

select array_to_string((regexp_split_to_array(path, '/'))[1:3], '/')

But BigQuery doesn't seem to have any kind of range offset or array slicing functionality.

Mik*_*ant 6

The following is for BigQuery Standard SQL:

#standardSQL
SELECT id, path,
  (
    SELECT STRING_AGG(part, '/' ORDER BY index) 
    FROM UNNEST(SPLIT(path, '/')) part WITH OFFSET index 
    WHERE index BETWEEN 1 AND 3
  ) adjusted_path
FROM `project.dataset.table`  

You can test and play with the query above using sample data modeled on the question, as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, 'foo/bar/baz/foo1/bar1/baz1/' path UNION ALL
  SELECT 2, 'foo/bar/quux/blat/foo2/bar2/quux2/blat2' 
)
SELECT id, path,
  (
    SELECT STRING_AGG(part, '/' ORDER BY index) 
    FROM UNNEST(SPLIT(path, '/')) part WITH OFFSET index 
    WHERE index BETWEEN 1 AND 3
  ) adjusted_path
FROM `project.dataset.table`   

Result:

Row     id      path                                        adjusted_path    
1       1       foo/bar/baz/foo1/bar1/baz1/                 bar/baz/foo1     
2       2       foo/bar/quux/blat/foo2/bar2/quux2/blat2     bar/quux/blat    
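One thing to watch: WITH OFFSET counts from zero, while the PostgreSQL slice [1:3] in the question counts from one, so BETWEEN 1 AND 3 above picks the second through fourth parts. Below is a minimal sketch of what I would assume is the closer equivalent of [1:3], that is, the first three parts, run against the two paths from the question. The inline CTE named paths is only a hypothetical stand-in for the real table.

```sql
#standardSQL
WITH paths AS (  -- hypothetical stand-in for the real table
  SELECT 1 AS id, 'foo/bar/baz' AS path UNION ALL
  SELECT 2, 'foo/bar/quux/blat'
)
SELECT id, path,
  (
    -- offsets are zero-based, so 0..2 keeps the first three parts
    SELECT STRING_AGG(part, '/' ORDER BY index)
    FROM UNNEST(SPLIT(path, '/')) part WITH OFFSET index
    WHERE index BETWEEN 0 AND 2
  ) AS adjusted_path
FROM paths
```

With this data, adjusted_path should come back as foo/bar/baz for id 1 and foo/bar/quux for id 2.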

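If for some reason you want to keep your query in line with (similar to) the one you used in PostgreSQL, array_to_string((regexp_split_to_array(path, '/'))[1:3], '/'), you can introduce a SQL UDF (let's name it ARRAY_SLICE), as in the example below: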

#standardSQL
CREATE TEMP FUNCTION ARRAY_SLICE(arr ARRAY<STRING>, start INT64, finish INT64) 
RETURNS ARRAY<STRING> AS (
  ARRAY(
    SELECT part FROM UNNEST(arr) part WITH OFFSET index 
    WHERE index BETWEEN start AND finish ORDER BY index
  )
);
SELECT id, path, 
  ARRAY_TO_STRING(ARRAY_SLICE(SPLIT(path, '/'), 1, 3), '/') adjusted_path
FROM `project.dataset.table`  

Obviously, you get the same result when this is applied to the same sample data:

#standardSQL
CREATE TEMP FUNCTION ARRAY_SLICE(arr ARRAY<STRING>, start INT64, finish INT64) 
RETURNS ARRAY<STRING> AS (
  ARRAY(
    SELECT part FROM UNNEST(arr) part WITH OFFSET index 
    WHERE index BETWEEN start AND finish ORDER BY index
  )
);
WITH `project.dataset.table` AS (
  SELECT 1 id, 'foo/bar/baz/foo1/bar1/baz1/' path UNION ALL
  SELECT 2, 'foo/bar/quux/blat/foo2/bar2/quux2/blat2' 
)
SELECT id, path, 
  ARRAY_TO_STRING(ARRAY_SLICE(SPLIT(path, '/'), 1, 3), '/') adjusted_path
FROM `project.dataset.table`   

Result:

Row     id      path                                        adjusted_path    
1       1       foo/bar/baz/foo1/bar1/baz1/                 bar/baz/foo1     
2       2       foo/bar/quux/blat/foo2/bar2/quux2/blat2     bar/quux/blat    
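If the arrays are not always ARRAY<STRING>, a templated SQL UDF might be worth a look. This is only a sketch, assuming the ANY TYPE parameter support in BigQuery SQL UDFs. The name SLICE_ANY is made up for illustration, and the RETURNS clause is left out so that the result type is inferred from the input. As in the UDF above, start and finish are zero-based and inclusive.

```sql
#standardSQL
-- SLICE_ANY is a hypothetical name; arr is templated so any array type is accepted
CREATE TEMP FUNCTION SLICE_ANY(arr ANY TYPE, start INT64, finish INT64) AS (
  ARRAY(
    SELECT el FROM UNNEST(arr) el WITH OFFSET index
    WHERE index BETWEEN start AND finish ORDER BY index
  )
);
SELECT
  -- string array: keep offsets 0..2, then join back with '/'
  ARRAY_TO_STRING(SLICE_ANY(SPLIT('foo/bar/quux/blat', '/'), 0, 2), '/') AS string_slice,
  -- integer array: keep offsets 1..2
  SLICE_ANY([10, 20, 30, 40], 1, 2) AS int_slice
```

Under those assumptions, string_slice should be foo/bar/quux and int_slice should be [20, 30].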