Split a string into substrings and create a new column for each substring in BigQuery

Eva*_*van 3 sql google-bigquery

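I want to split a space-delimited string into 5 parts and create a column for each, but I'm finding it hard to produce the desired output. Edit: I'm using the standard SQL dialect.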

Sample data:

Row published_at                data_string                                               device_id
1   2016-10-26T22:53:03.209Z    70.77 3.38 61.65 7.98 73.20 3.29 63.55 nan nan nan nan    2a0025000351353337353037
... 
1 of 570 rows

Desired output:

Row published_at                battery temp1  humid1 temp2  humid2 temp3 humid3 device_id   
1   2016-11-03T16:24:09.833Z    70.77 3.38 61.65 7.98 73.20 3.29 63.55 2a0025000351353337353037 
1 of 570 rows

Attempted query 1.a:

WITH
  h2a0025_2 AS (
  SELECT
    TIMESTAMP '2016-10-26T22:53:03.209Z' AS published_at,
    '70.77 3.38 61.65 7.98 73.20 3.29 63.55 nan nan nan nan' AS data_string,
    '2a0025000351353337353037' AS device_id
  UNION ALL
  SELECT
    TIMESTAMP '2016-10-26T22:53:03.209Z',
    '70.77 3.38 61.65 7.98 73.20 3.29 63.55 nan nan nan nan',
    '2a0025000351353337353037' )
SELECT
  published_at,
  parts[OFFSET(0)] AS Battery,
  parts[OFFSET(1)] AS Temp1,
  parts[OFFSET(2)] AS Humid1,
  parts[OFFSET(3)] AS Temp2,
  parts[OFFSET(4)] AS Humid2,
  parts[OFFSET(5)] AS Temp3,
  parts[OFFSET(6)] AS Humid3,
  device_id
FROM (
  SELECT
    * EXCEPT(data_string),
    SPLIT(data_string, ' ') AS parts
  FROM
    `h2a0025_2`);

Result 1.a: 2 identical rows

  Row   published_at                battery temp1  humid1 temp2  humid2 temp3 humid3 device_id   
    1   2016-11-03T16:24:09.833Z    70.77 3.38 61.65 7.98 73.20 3.29 63.55 2a0025000351353337353037 
    2   2016-11-03T16:24:09.833Z    70.77 3.38 61.65 7.98 73.20 3.29 63.55 2a0025000351353337353037
2 of 2 rows

Attempt 2:

 SELECT
      published_at,
      parts[OFFSET(0)] AS Battery,
      parts[OFFSET(1)] AS Temp1,
      parts[OFFSET(2)] AS Humid1,
      parts[OFFSET(3)] AS Temp2,
      parts[OFFSET(4)] AS Humid2,
      parts[OFFSET(5)] AS Temp3,
      parts[OFFSET(6)] AS Humid3,
      device_id
    FROM (
      SELECT
        * EXCEPT(data_string),
        SPLIT(data_string, ' ') AS parts
      FROM
        `myproject.mydataset.h2a0025_2`);

Result: Query failed. Error: Array index 3 is out of bounds (overflow)

Ell*_*ard 5

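Here is an example to get you started. Rather than trying to work out the correct substring positions, use the SPLIT function and then pick the offsets you need out of the resulting array.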

#standardSQL
WITH YourTable AS (
  SELECT
    TIMESTAMP '2016-11-03T16:24:09.833Z' AS published_at,
    '80.91 22.15 45.35 14.41 64.54' AS data_string
  UNION ALL
  SELECT
    TIMESTAMP '2016-11-04T18:34:08.143Z',
    '75.37 28.43 31.17 34.80 19.33'
)
SELECT
  published_at,
  parts[OFFSET(0)] AS Temp1,
  parts[OFFSET(1)] AS Humid1,
  parts[OFFSET(2)] AS Temp2,
  parts[OFFSET(3)] AS Humid2
FROM (
  SELECT
    * EXCEPT(data_string),
    SPLIT(data_string, ' ') AS parts
  FROM YourTable
);
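If some rows hold fewer values than the query indexes, `OFFSET` fails with an out-of-bounds error like the one reported in the question. One possible workaround, sketched here, is `SAFE_OFFSET`, which returns NULL for a missing position instead of raising an error. The column layout (battery plus three temperature/humidity pairs) is taken from the question's desired output; the second sample row, with only three values, is a hypothetical added to show the behaviour.

```sql
#standardSQL
-- Sketch only: sample rows are illustrative, not real data from the table.
WITH YourTable AS (
  SELECT
    TIMESTAMP '2016-10-26T22:53:03.209Z' AS published_at,
    '70.77 3.38 61.65 7.98 73.20 3.29 63.55 nan nan nan nan' AS data_string,
    '2a0025000351353337353037' AS device_id
  UNION ALL
  -- Hypothetical short row: only three values, so offsets 3..6 do not exist.
  SELECT
    TIMESTAMP '2016-10-26T22:54:03.209Z',
    '70.77 3.38 61.65',
    '2a0025000351353337353037'
)
SELECT
  published_at,
  -- SAFE_OFFSET yields NULL when the array is shorter than the requested position.
  parts[SAFE_OFFSET(0)] AS battery,
  parts[SAFE_OFFSET(1)] AS temp1,
  parts[SAFE_OFFSET(2)] AS humid1,
  parts[SAFE_OFFSET(3)] AS temp2,
  parts[SAFE_OFFSET(4)] AS humid2,
  parts[SAFE_OFFSET(5)] AS temp3,
  parts[SAFE_OFFSET(6)] AS humid3,
  device_id
FROM (
  SELECT
    * EXCEPT(data_string),
    SPLIT(data_string, ' ') AS parts
  FROM YourTable
);
```

With these sample rows, the short row should come back with NULL in `temp2` through `humid3` rather than failing the whole query. The extracted values are still strings; converting them to numbers is a separate step.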

To test against your real table, use only the following part of the script:

#standardSQL
SELECT
  published_at,
  parts[OFFSET(0)] AS Temp1,
  parts[OFFSET(1)] AS Humid1,
  parts[OFFSET(2)] AS Temp2,
  parts[OFFSET(3)] AS Humid2
FROM (
  SELECT
    * EXCEPT(data_string),
    SPLIT(data_string, ' ') AS parts
  FROM `yourproject.yourdataset.yourtable`
);
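If the query against the real table fails with an out-of-bounds error, it may help to check which rows produce fewer elements than expected. The following is a rough diagnostic sketch, not part of the original answer: the table name is the same placeholder as above, and the threshold of 7 is an assumption based on the seven columns in the question's desired output.

```sql
#standardSQL
-- Sketch: list rows whose data_string splits into fewer than 7 parts.
-- 7 is an assumed threshold; adjust it to the highest offset you read plus one.
SELECT
  published_at,
  data_string,
  ARRAY_LENGTH(SPLIT(data_string, ' ')) AS part_count
FROM `yourproject.yourdataset.yourtable`
WHERE ARRAY_LENGTH(SPLIT(data_string, ' ')) < 7
ORDER BY part_count;
```

Rows where `data_string` is NULL are not returned by this filter, because `ARRAY_LENGTH` of a NULL array is NULL; they would need a separate `IS NULL` check.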