How do I split a string column into rows of single words and word pairs in BigQuery SQL?

Dan*_*Dan 2 google-bigquery bigquery-standard-sql legacy-sql

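I'm trying (unsuccessfully) to split a string column in Google BigQuery into rows containing every single word and every pair of words (adjacent to each other and in order). I also need to keep the ID field from IndataTable alongside each word. Both record sets have 2 columns.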

IndataTable as IDT
ID WordString
1 apple banana pear
2 carrot
3 blue red green yellow

OutdataTable as ODT
ID WordString
1 apple
1 banana
1 pear
1 apple banana
1 banana pear
2 carrot
3 blue
3 red
3 green
3 yellow
3 blue red
3 red green
3 green yellow
(only pairs of words that are next to each other)

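Is this possible in BigQuery SQL?

Edit/addition:
This is what I have so far, which splits the string into single words. I'm really struggling to work out how to extend it to word pairs. I don't know whether this can be modified or whether I need a completely new approach.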

SELECT ID, split(WordString,' ') as Words
FROM (
  select * 
     from 
     (select ID, WordString from IndataTable)
)

Mik*_*ant 5

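Below is for BigQuery Standard SQL. It splits each string into words while keeping each word's position, then uses LEAD over that position to join every word with the one that follows it.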

#standardSQL
WITH IndataTable AS (
  SELECT 1 id, 'apple banana pear' WordString UNION ALL
  SELECT 2, 'carrot' UNION ALL
  SELECT 3, 'blue red green yellow' 
), words AS (
  SELECT id, word, pos
  FROM IndataTable, UNNEST(SPLIT(WordString,' ')) AS Word WITH OFFSET pos
), pairs AS (
  SELECT id, CONCAT(word, ' ', LEAD(word) OVER(PARTITION BY id ORDER BY pos)) pair
  FROM words
)
SELECT id, word AS WordString FROM words UNION ALL
SELECT id, pair AS WordString FROM pairs
WHERE NOT pair IS NULL
ORDER BY id  

The result is as expected:

Row id  WordString   
1   1   apple    
2   1   banana   
3   1   pear     
4   1   apple banana     
5   1   banana pear  
6   2   carrot   
7   3   blue     
8   3   red  
9   3   green    
10  3   yellow   
11  3   blue red     
12  3   red green    
13  3   green yellow     
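As a possible alternative, here is a minimal sketch that gets single words and adjacent pairs in one pass, assuming the ML.NGRAMS function is available in your project. It is not part of the query above; it reuses the same sample data, and the alias ngram is just an illustrative name. With a range of [1, 2], ML.NGRAMS should return both the 1-word and 2-word n-grams of the split array.

```sql
#standardSQL
WITH IndataTable AS (
  SELECT 1 AS id, 'apple banana pear' AS WordString UNION ALL
  SELECT 2, 'carrot' UNION ALL
  SELECT 3, 'blue red green yellow'
)
SELECT id, ngram AS WordString
FROM IndataTable,
  -- SPLIT turns the string into an array of words;
  -- ML.NGRAMS with range [1, 2] builds single words and adjacent pairs,
  -- joined with the given separator (a space here)
  UNNEST(ML.NGRAMS(SPLIT(WordString, ' '), [1, 2], ' ')) AS ngram
ORDER BY id
```

This should produce the same 13 rows for the sample data, although the order of rows within each id is not guaranteed. The LEAD version above is the safer choice if you need explicit control over positions or want to avoid depending on a BigQuery ML function.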