Eva*_*van 3 sql google-bigquery
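I want to split a space-delimited string into 5 parts and create a column for each one, but I am finding it hard to produce the desired output. Edit: I am using the Standard SQL dialect.
Sample data: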
Row published_at data_string device id
1 2016-10-26T22:53:03.209Z 70.77 3.38 61.65 7.98 73.20 3.29 63.55 nan nan nan nan 2a0025000351353337353037
...
1 of 570 rows
Desired output:
Row published_at battery temp1 humid1 temp2 humid2 temp3 humid3 device_id
1 2016-11-03T16:24:09.833Z 70.77 3.38 61.65 7.98 73.20 3.29 63.55 2a0025000351353337353037
1 of 570 rows
Attempted query 1.a:
WITH
  h2a0025_2 AS (
    SELECT
      TIMESTAMP '2016-10-26T22:53:03.209Z' AS published_at,
      '70.77 3.38 61.65 7.98 73.20 3.29 63.55 nan nan nan nan' AS data_string,
      '2a0025000351353337353037' AS device_id
    UNION ALL
    SELECT
      TIMESTAMP '2016-10-26T22:53:03.209Z',
      '70.77 3.38 61.65 7.98 73.20 3.29 63.55 nan nan nan nan',
      '2a0025000351353337353037' )
SELECT
  published_at,
  -- OFFSET is zero-based: one distinct index per output column
  parts[OFFSET(0)] AS Battery,
  parts[OFFSET(1)] AS Temp1,
  parts[OFFSET(2)] AS Humid1,
  parts[OFFSET(3)] AS Temp2,
  parts[OFFSET(4)] AS Humid2,
  parts[OFFSET(5)] AS Temp3,
  parts[OFFSET(6)] AS Humid3,
  device_id
FROM (
  SELECT
    * EXCEPT(data_string),
    SPLIT(data_string, ' ') AS parts
  FROM
    `h2a0025_2`);
Result 1.a: 2 identical rows
Row published_at battery temp1 humid1 temp2 humid2 temp3 humid3 device_id
1 2016-11-03T16:24:09.833Z 70.77 3.38 61.65 7.98 73.20 3.29 63.55 2a0025000351353337353037
2 2016-11-03T16:24:09.833Z 70.77 3.38 61.65 7.98 73.20 3.29 63.55 2a0025000351353337353037
2 of 2 rows
Attempt 2:
SELECT
  published_at,
  parts[OFFSET(0)] AS Battery,
  parts[OFFSET(1)] AS Temp1,
  parts[OFFSET(2)] AS Humid1,
  parts[OFFSET(3)] AS Temp2,
  parts[OFFSET(4)] AS Humid2,
  parts[OFFSET(5)] AS Temp3,
  parts[OFFSET(6)] AS Humid3,
  device_id
FROM (
  SELECT
    * EXCEPT(data_string),
    SPLIT(data_string, ' ') AS parts
  FROM
    `myproject.mydataset.h2a0025_2`);
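Result: Query Failed. Error: Array index 3 is out of bounds (overflow)
Here is an example to get you started. Rather than trying to work out the correct substring positions, use the SPLIT function and then pick the offsets you need out of the resulting array.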
#standardSQL
WITH YourTable AS (
  SELECT
    TIMESTAMP '2016-11-03T16:24:09.833Z' AS published_at,
    '80.91 22.15 45.35 14.41 64.54' AS data_string
  UNION ALL
  SELECT
    TIMESTAMP '2016-11-04T18:34:08.143Z',
    '75.37 28.43 31.17 34.80 19.33'
)
SELECT
  published_at,
  parts[OFFSET(0)] AS Temp1,
  parts[OFFSET(1)] AS Humid1,
  parts[OFFSET(2)] AS Temp2,
  parts[OFFSET(3)] AS Humid2
FROM (
  SELECT
    * EXCEPT(data_string),
    SPLIT(data_string, ' ') AS parts
  FROM YourTable
);
To test against your real table, use only the following part of the script:
#standardSQL
SELECT
  published_at,
  parts[OFFSET(0)] AS Temp1,
  parts[OFFSET(1)] AS Humid1,
  parts[OFFSET(2)] AS Temp2,
  parts[OFFSET(3)] AS Humid2
FROM (
  SELECT
    * EXCEPT(data_string),
    SPLIT(data_string, ' ') AS parts
  FROM `yourproject.yourdataset.yourtable`
);
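The "Array index 3 is out of bounds" error in the question suggests that at least one row in the real table splits into fewer elements than the query indexes. One possible way around it, assuming short rows should yield NULL instead of failing the whole query, is to use SAFE_OFFSET in place of OFFSET. The sketch below is only an illustration: the second row is a made-up truncated reading, and the part_count column is a hypothetical extra added to help spot such rows.

```sql
#standardSQL
WITH YourTable AS (
  SELECT
    TIMESTAMP '2016-10-26T22:53:03.209Z' AS published_at,
    '70.77 3.38 61.65 7.98 73.20 3.29 63.55 nan nan nan nan' AS data_string,
    '2a0025000351353337353037' AS device_id
  UNION ALL
  SELECT
    TIMESTAMP '2016-10-26T22:54:03.209Z',
    '70.77 3.38',  -- hypothetical short reading, not taken from the real data
    '2a0025000351353337353037'
)
SELECT
  published_at,
  -- SAFE_OFFSET returns NULL when the index is past the end of the array,
  -- where OFFSET would raise an out-of-bounds error
  parts[SAFE_OFFSET(0)] AS battery,
  parts[SAFE_OFFSET(1)] AS temp1,
  parts[SAFE_OFFSET(2)] AS humid1,
  parts[SAFE_OFFSET(3)] AS temp2,
  parts[SAFE_OFFSET(4)] AS humid2,
  parts[SAFE_OFFSET(5)] AS temp3,
  parts[SAFE_OFFSET(6)] AS humid3,
  device_id,
  ARRAY_LENGTH(parts) AS part_count  -- number of elements SPLIT produced for this row
FROM (
  SELECT
    * EXCEPT(data_string),
    SPLIT(data_string, ' ') AS parts
  FROM YourTable
);
```

Filtering on part_count (for example, rows where it is below 7) may help locate the records that triggered the error before deciding whether to keep or drop them.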