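I searched around and could not find much on this topic (possibly bad search terms :)). I have a table whose protoPayload.resource field receives Apache log information. The field I am interested in therefore contains multiple values that I need to search. It is formatted like a PHP-style URL query string, e.g.: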
/?id=13242134123&ver=12&os_bits=64&os_type=mac&lng=EN
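As a result, every search ends up needing very long regular expressions to extract the data, followed by JOIN statements to merge the results.
An example search that combines mac/win statistics: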
SELECT
  t1.date, t1.wincount, COALESCE(t2.maccount, 0) AS maccount
FROM (
  SELECT
    DATE(metadata.timestamp) AS date,
    INTEGER(COUNT(protoPayload.resource)) AS wincount
  FROM (TABLE_DATE_RANGE(tablename, DATE_ADD(CURRENT_TIMESTAMP(), -30, 'DAY'), CURRENT_TIMESTAMP()))
  WHERE
    (REGEXP_MATCH(protoPayload.resource, r'ver=(11|12)'))
    AND protoPayload.resource CONTAINS 'os=win'
  GROUP BY date ) t1
LEFT JOIN (
  SELECT
    DATE(metadata.timestamp) AS date,
    INTEGER(COUNT(protoPayload.resource)) AS maccount
  FROM (TABLE_DATE_RANGE(tablename, DATE_ADD(CURRENT_TIMESTAMP(), -30, 'DAY'), CURRENT_TIMESTAMP()))
  WHERE
    (REGEXP_MATCH(protoPayload.resource, r'cv=[pm](14|15|16|17)'))
    AND protoPayload.resource CONTAINS 'os=mac'
  GROUP BY date ) t2
ON
  t1.date = t2.date
ORDER BY t1.date
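What I am thinking of doing is running a similar regex search, creating a new table, and saving the data into that new table with proper relational fields. After that I would fix future logging so that it records into the table correctly.
My question: is this a workable solution, or is there a simpler way to do this in Google BigQuery? Is there a better way to transform the data? Thanks again for any input!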
You can use a SQL function to parse the key-value pairs into an array, which is generally faster than using JavaScript. For example:
#standardSQL
CREATE TEMPORARY FUNCTION ParseKeys(queryString STRING)
RETURNS ARRAY<STRUCT<key STRING, value STRING>> AS (
  (SELECT
     ARRAY_AGG(STRUCT(
       entry[OFFSET(0)] AS key,
       entry[OFFSET(1)] AS value))
   FROM (
     SELECT SPLIT(pairString, '=') AS entry
     FROM UNNEST(SPLIT(REGEXP_EXTRACT(queryString, r'/\?(.*)'), '&')) AS pairString)
  )
);
SELECT ParseKeys('/?foo=bar&baz=2');
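One thing to watch with the function above: entry[OFFSET(1)] raises an out-of-bounds error when a pair has no '=' (for example a bare flag such as &debug). The following is a minimal sketch of a more tolerant variant, assuming you would rather get a NULL value than a failed query. The name ParseKeysSafe and the sample URL are made up for illustration; SAFE_OFFSET is the standard SQL array accessor that returns NULL instead of raising an error.

```sql
#standardSQL
-- ParseKeysSafe is a hypothetical name; the logic is the same as ParseKeys,
-- except that SAFE_OFFSET yields NULL for a missing key or value.
CREATE TEMP FUNCTION ParseKeysSafe(queryString STRING)
RETURNS ARRAY<STRUCT<key STRING, value STRING>> AS (
  (SELECT
     ARRAY_AGG(STRUCT(
       entry[SAFE_OFFSET(0)] AS key,
       entry[SAFE_OFFSET(1)] AS value))
   FROM (
     SELECT SPLIT(pairString, '=') AS entry
     FROM UNNEST(SPLIT(REGEXP_EXTRACT(queryString, r'/\?(.*)'), '&')) AS pairString)
  )
);

-- 'flag' has no '=', so its value comes back as NULL rather than an error.
SELECT ParseKeysSafe('/?foo=bar&flag&baz=2') AS pairs;
```

Whether a NULL or a hard failure is preferable depends on how clean the logged URLs are; with real Apache logs the tolerant form is probably the safer default.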
Now you can build on this with a function that turns the keys into struct fields:
#standardSQL
CREATE TEMP FUNCTION GetAttributes(queryString STRING) AS (
  (SELECT AS STRUCT
     MAX(IF(key = 'id', CAST(value AS INT64), NULL)) AS id,
     MAX(IF(key = 'ver', CAST(value AS INT64), NULL)) AS ver,
     MAX(IF(key = 'os_bits', CAST(value AS INT64), NULL)) AS os_bits,
     MAX(IF(key = 'os_type', value, NULL)) AS os_type,
     MAX(IF(key = 'lng', value, NULL)) AS lng
   FROM UNNEST(ParseKeys(queryString)))
);
Putting it all together, you can try out the GetAttributes function with some sample input:
#standardSQL
CREATE TEMPORARY FUNCTION ParseKeys(queryString STRING)
RETURNS ARRAY<STRUCT<key STRING, value STRING>> AS (
  (SELECT
     ARRAY_AGG(STRUCT(
       entry[OFFSET(0)] AS key,
       entry[OFFSET(1)] AS value))
   FROM (
     SELECT SPLIT(pairString, '=') AS entry
     FROM UNNEST(SPLIT(REGEXP_EXTRACT(queryString, r'/\?(.*)'), '&')) AS pairString)
  )
);
CREATE TEMP FUNCTION GetAttributes(queryString STRING) AS (
  (SELECT AS STRUCT
     MAX(IF(key = 'id', CAST(value AS INT64), NULL)) AS id,
     MAX(IF(key = 'ver', CAST(value AS INT64), NULL)) AS ver,
     MAX(IF(key = 'os_bits', CAST(value AS INT64), NULL)) AS os_bits,
     MAX(IF(key = 'os_type', value, NULL)) AS os_type,
     MAX(IF(key = 'lng', value, NULL)) AS lng
   FROM UNNEST(ParseKeys(queryString)))
);
SELECT url, GetAttributes(url).*
FROM UNNEST(['/?id=13242134123&ver=12&os_bits=64&os_type=mac&lng=EN',
             '/?id=2343645745&ver=15&os_bits=32&os_type=linux&lng=FR']) AS url;
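With the attributes available as struct fields, the win/mac counts from the question can probably be computed in a single pass, without the regexes or the LEFT JOIN. The following is a rough sketch under several assumptions: the logs CTE is hypothetical inline sample data standing in for the real log table, the column names ts and resource are invented, the os_type key from the sample URL is used instead of the os= substring in the original query, and the same ver filter is applied to both platforms (the original used a different cv filter for mac). GetAttributes is trimmed to the two fields this query needs.

```sql
#standardSQL
CREATE TEMP FUNCTION ParseKeys(queryString STRING)
RETURNS ARRAY<STRUCT<key STRING, value STRING>> AS (
  (SELECT
     ARRAY_AGG(STRUCT(
       entry[OFFSET(0)] AS key,
       entry[OFFSET(1)] AS value))
   FROM (
     SELECT SPLIT(pairString, '=') AS entry
     FROM UNNEST(SPLIT(REGEXP_EXTRACT(queryString, r'/\?(.*)'), '&')) AS pairString)
  )
);

CREATE TEMP FUNCTION GetAttributes(queryString STRING) AS (
  (SELECT AS STRUCT
     MAX(IF(key = 'ver', CAST(value AS INT64), NULL)) AS ver,
     MAX(IF(key = 'os_type', value, NULL)) AS os_type
   FROM UNNEST(ParseKeys(queryString)))
);

WITH logs AS (
  -- Hypothetical sample rows; replace with the real log table.
  SELECT TIMESTAMP '2017-01-01 10:00:00' AS ts,
         '/?id=1&ver=12&os_bits=64&os_type=mac&lng=EN' AS resource
  UNION ALL
  SELECT TIMESTAMP '2017-01-01 11:00:00',
         '/?id=2&ver=11&os_bits=64&os_type=win&lng=EN'
  UNION ALL
  SELECT TIMESTAMP '2017-01-02 09:00:00',
         '/?id=3&ver=12&os_bits=32&os_type=win&lng=FR'
),
parsed AS (
  -- Parse each URL once and keep the result as a struct column.
  SELECT DATE(ts) AS day, GetAttributes(resource) AS attrs
  FROM logs
)
SELECT
  day,
  COUNTIF(attrs.os_type = 'win') AS wincount,
  COUNTIF(attrs.os_type = 'mac') AS maccount
FROM parsed
WHERE attrs.ver IN (11, 12)
GROUP BY day
ORDER BY day;
```

Unlike the LEFT JOIN in the question, this keeps days that have mac rows but no win rows. The same SELECT over the parsed attributes could also be saved to a destination table if you want the relational layout described in the question.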