Condensing arrays in Presto

isk*_*lue 10 sql presto

I have a query that generates arrays of strings using the array_agg() function:

SELECT 
array_agg(message) as sequence
from mytable
group by id

It produces a table that looks like this:

                 sequence
1 foo foo bar baz bar baz
2     foo bar bar bar baz
3 foo foo foo bar bar baz

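But my goal is to condense each array of strings so that no string repeats more than once in a row. For example, the desired output looks like this: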

    sequence
1 foo bar baz bar baz
2 foo bar baz
3 foo bar baz

How can I do this with Presto SQL?

Mar*_*rso 12

You can do this in one of two ways:

  1. Use the array_distinct function to remove duplicates from the resulting array:
WITH mytable(id, message) AS (VALUES
  (1, 'foo'), (1, 'foo'), (1, 'bar'), (1, 'bar'), (1, 'baz'), (1, 'baz'),
  (2, 'foo'), (2, 'bar'), (2, 'bar'), (2, 'bar'), (2, 'baz'),
  (3, 'foo'), (3, 'foo'), (3, 'foo'), (3, 'bar'), (3, 'bar'), (3, 'baz')
)
SELECT array_distinct(array_agg(message)) AS sequence
FROM mytable
GROUP BY id
  2. Use the DISTINCT qualifier in the aggregation to remove duplicate values before they are passed to array_agg:
WITH mytable(id, message) AS (VALUES
  (1, 'foo'), (1, 'foo'), (1, 'bar'), (1, 'bar'), (1, 'baz'), (1, 'baz'),
  (2, 'foo'), (2, 'bar'), (2, 'bar'), (2, 'bar'), (2, 'baz'),
  (3, 'foo'), (3, 'foo'), (3, 'foo'), (3, 'bar'), (3, 'bar'), (3, 'baz')
)
SELECT array_agg(DISTINCT message) AS sequence
FROM mytable
GROUP BY id

Both options produce the same result:

    sequence
-----------------
 [foo, bar, baz]
 [foo, bar, baz]
 [foo, bar, baz]
(3 rows)
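
Note that both options remove every repeated value, not only consecutive repeats, so for the first row in the question (foo foo bar baz bar baz) they would return [foo, bar, baz] rather than [foo, bar, baz, bar, baz]. If only consecutive repeats should be collapsed, one possible approach is to compare each row with the previous one using the lag window function before aggregating. This is a minimal sketch, not a definitive solution: it assumes a hypothetical pos column that records the order of messages within each id (the original table shows no such column, and array_agg gives no ordering guarantee without one), and it assumes message is never NULL.

```sql
-- Sketch only: `pos` is a hypothetical ordering column, not part of the original table.
WITH mytable(id, pos, message) AS (VALUES
  (1, 1, 'foo'), (1, 2, 'foo'), (1, 3, 'bar'), (1, 4, 'baz'), (1, 5, 'bar'), (1, 6, 'baz'),
  (2, 1, 'foo'), (2, 2, 'bar'), (2, 3, 'bar'), (2, 4, 'bar'), (2, 5, 'baz'),
  (3, 1, 'foo'), (3, 2, 'foo'), (3, 3, 'foo'), (3, 4, 'bar'), (3, 5, 'bar'), (3, 6, 'baz')
),
with_prev AS (
  -- Attach the previous message within the same id to every row.
  SELECT id, pos, message,
         lag(message) OVER (PARTITION BY id ORDER BY pos) AS prev_message
  FROM mytable
)
-- Keep a row only if it starts a new run, i.e. it differs from its predecessor.
SELECT id, array_agg(message ORDER BY pos) AS sequence
FROM with_prev
WHERE prev_message IS NULL OR message <> prev_message
GROUP BY id
ORDER BY id
```

With the sample data above (taken from the question's table), this should return [foo, bar, baz, bar, baz] for id 1 and [foo, bar, baz] for ids 2 and 3.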

Update: you can remove sequences of repeated elements with the recently introduced MATCH_RECOGNIZE feature:

    sequence
-----------------
 [foo, bar, baz]
 [foo, bar, baz]
 [foo, bar, baz]
(3 rows)
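
The query for the MATCH_RECOGNIZE approach is not shown above. A rough sketch of what such a query might look like follows; it is a reconstruction under stated assumptions, not necessarily the query that produced the output above. It assumes an engine that supports MATCH_RECOGNIZE (Trino, formerly PrestoSQL, in recent releases) and a hypothetical pos column that defines the order of messages within each id. The measure names first_message and first_pos are also made up for illustration.

```sql
-- Sketch only: `pos`, `first_message` and `first_pos` are hypothetical names.
WITH mytable(id, pos, message) AS (VALUES
  (1, 1, 'foo'), (1, 2, 'foo'), (1, 3, 'bar'), (1, 4, 'baz'), (1, 5, 'bar'), (1, 6, 'baz'),
  (2, 1, 'foo'), (2, 2, 'bar'), (2, 3, 'bar'), (2, 4, 'bar'), (2, 5, 'baz'),
  (3, 1, 'foo'), (3, 2, 'foo'), (3, 3, 'foo'), (3, 4, 'bar'), (3, 5, 'bar'), (3, 6, 'baz')
)
SELECT id, array_agg(first_message ORDER BY first_pos) AS sequence
FROM mytable
  MATCH_RECOGNIZE (
    PARTITION BY id
    ORDER BY pos
    MEASURES
      A.message AS first_message,  -- value that starts the run
      A.pos     AS first_pos       -- position of the run, used to order the array
    ONE ROW PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    -- A matches any row; B matches each following row that repeats the previous message.
    PATTERN (A B*)
    DEFINE B AS message = PREV(message)
  )
GROUP BY id
ORDER BY id
```

Each match covers one run of identical consecutive messages and emits a single row for it, so aggregating the matches per id yields the condensed sequence. With the question's data for id 1 this should give [foo, bar, baz, bar, baz].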