BigQuery: wrong results returned from SPLIT()

Mun*_*ala 1 sql google-bigquery legacy-sql

I have a table TabA in BigQuery with a column ColA, whose values have the following structure:

1038627|21514184

TabA has more than a million records. I use the following query to split the column into multiple columns:

SELECT ColA,
       FIRST(SPLIT(ColA, '|')) part1,
       NTH(2, SPLIT(ColA, '|')) part2
FROM TabA

But for some reason, after a certain number of rows, the split no longer seems to work correctly.

We get records like this:

     ColA            part1   part2
1038627|21507470    1038627 21507470     
1038627|21534857    1038627 21507470     
1038627|21546455    1038627 21507470     
1038627|21577167    1038627 21507470

It is happening on a random basis. I am not sure where the error is.

SELECT COUNT(*) FROM TabA - returns say 1.7M records


SELECT ColA,FIRST(SPLIT(ColA, '|')) part1, NTH(2, SPLIT(ColA, '|')) part2 FROM TabA - returns 1.7M records with the wrong split


SELECT FIRST(SPLIT(ColA, '|')) part1, NTH(2, SPLIT(ColA, '|')) part2 FROM TabA - returns just 1.4L records with correct split

I don't know exactly what is happening. Is it a problem with the data or a problem with the split?

Any help would be greatly appreciated. Thanks in advance!!

Mik*_*ant 5

Is it a problem with the data or a problem with the split?

To help with troubleshooting, I recommend running the same logic in BigQuery Standard SQL:

#standardSQL
SELECT 
  ColA,
  SPLIT(ColA, '|')[SAFE_OFFSET(0)] AS part1,
  SPLIT(ColA, '|')[SAFE_OFFSET(1)] AS part2
FROM TabA
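
To check the data side of the question, one possible diagnostic is to list rows where the split does not produce exactly two parts. This is only a sketch: it assumes every ColA value is supposed to contain exactly one '|' separator, which the sample rows suggest but the question does not state. The alias n_parts and the LIMIT value are illustrative choices, not part of the original queries.

```sql
#standardSQL
-- Assumption: a well-formed ColA value has exactly one '|' and therefore splits into 2 parts.
SELECT
  ColA,
  ARRAY_LENGTH(SPLIT(ColA, '|')) AS n_parts  -- NULL when ColA is NULL
FROM TabA
WHERE ColA IS NULL
   OR ARRAY_LENGTH(SPLIT(ColA, '|')) != 2
LIMIT 100
```

If this returns no rows, the values in ColA are at least structurally consistent, which would point away from the data and toward how the Legacy SQL query evaluates the split.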