Extracting numbers from a string in Google BigQuery using a regular expression

and*_*894 2 regex google-bigquery

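I would like to know whether I can use a regular expression in BigQuery to extract all the numbers from a string.

I thought the following would work, but it only returns the first match. Is there a way to extract all of the matches?

My use case is that I basically want to get the largest number from a URL, since that tends to be the post_id I need to join on.

Here is an example of what I mean: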

SELECT
  mystr,
  REGEXP_EXTRACT(mystr, r'(\d+)') AS nums
FROM
  (SELECT 'this is a string with some 666 numbers 999 in it 333' AS mystr),
  (SELECT 'just one number 123 in this one ' AS mystr),
  (SELECT '99' AS mystr),
  (SELECT 'another -2 example 99' AS mystr),
  (SELECT 'another-8766 example 99' AS mystr),
  (SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999' AS mystr),
  (SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/gallery/001' AS mystr),
  (SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/print-preview' AS mystr)

The results I get are:

[
  {
    "mystr": "this is a string with some 666 numbers 999 in it 333",
    "nums": "666"
  },
  {
    "mystr": "just one number 123 in this one ",
    "nums": "123"
  },
  {
    "mystr": "99",
    "nums": "99"
  },
  {
    "mystr": "another -2 example 99",
    "nums": "2"
  },
  {
    "mystr": "another-8766 example 99",
    "nums": "8766"
  },
  {
    "mystr": "http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999",
    "nums": "2015"
  },
  {
    "mystr": "http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/gallery/001",
    "nums": "2015"
  },
  {
    "mystr": "http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/print-preview",
    "nums": "2015"
  }
]
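
The first-match-only behaviour shown above is easy to reproduce outside BigQuery. The following is a rough Python sketch (Python's `re` module, not BigQuery code) contrasting "take the first match" with "collect every match"; `first_number` and `all_numbers` are hypothetical helper names used only for illustration.

```python
import re

def first_number(s):
    """Return the first run of digits in s, or None if there is none."""
    m = re.search(r'\d+', s)
    return m.group(0) if m else None

def all_numbers(s):
    """Return every run of digits in s, in order of appearance."""
    return re.findall(r'\d+', s)

url = 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/gallery/001'
print(first_number(url))  # 2015
print(all_numbers(url))   # ['2015', '12', '99999', '001']
```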

Cyr*_*bil 7

After some digging, I ended up with this solution:

SELECT
  mystr,
  GROUP_CONCAT(SPLIT(REGEXP_REPLACE(mystr, r'[^\d]+', ','))) AS nums
FROM
  (SELECT 'this is a string with some 666 numbers 999 in it 333' AS mystr),
  (SELECT 'just one number 123 in this one ' AS mystr),
  (SELECT '99' AS mystr),
  (SELECT 'another -2 example 99' AS mystr),
  (SELECT 'another-8766 example 99' AS mystr),
  (SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999' AS mystr),
  (SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/gallery/001' AS mystr),
  (SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/print-preview' AS mystr)

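How it works:

  • First, a regular expression matches every run of non-digit characters and replaces it with a comma
  • Then split is used to get the individual results; empty results are discarded
  • group_concat is only there to display the results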
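
The steps listed above can be mirrored in a short Python sketch. It illustrates the logic only and is not BigQuery code: `re.sub` stands in for REGEXP_REPLACE, `str.split` plus a filter for SPLIT, and `','.join` for GROUP_CONCAT. The helper names `extract_numbers` and `largest_number` are hypothetical; the second one covers the use case from the question, picking the largest number in a URL.

```python
import re

def extract_numbers(s):
    """Return all runs of digits in s via replace-then-split."""
    # Step 1: collapse every run of non-digit characters into one comma.
    replaced = re.sub(r'[^\d]+', ',', s)
    # Step 2: split on commas and drop the empty pieces left at the edges.
    return [part for part in replaced.split(',') if part]

def largest_number(s):
    """Return the largest number found in s as an int, or None."""
    nums = [int(n) for n in extract_numbers(s)]
    return max(nums) if nums else None

s = 'this is a string with some 666 numbers 999 in it 333'
# Step 3: join the pieces for display.
print(','.join(extract_numbers(s)))  # 666,999,333

url = 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/print-preview'
print(largest_number(url))  # 99999
```

Comparing the values as integers rather than strings matters here: as strings, '99999' and '2015' would be ordered character by character instead of numerically.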