Select all records less than 1 second apart

Dav*_*d S 2 sql postgresql postgresql-9.3

I have a table:

create table purchase(
   transaction_id integer,
   account_id bigint,
   created timestamp with time zone,
   price numeric(5,2)
)

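I think I have a problem with a system that is sending me duplicate records, but I don't know how widespread the problem is.

I need a query that selects all records that were created within 1 second of each other (not necessarily within the same clock second) and that have the same account_id and the same price. For example, I want to be able to find these two records: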

+----------------+----------------+-------------------------------+-------+
| transaction_id |   account_id   |            created            | price |
+----------------+----------------+-------------------------------+-------+
|          85239 | 80012340116730 | 2014-05-07 15:46:03.361959+00 |  8.47 |
|          85240 | 80012340116730 | 2014-05-07 15:46:04.118911+00 |  8.47 |
+----------------+----------------+-------------------------------+-------+

How can I do this in a single query?

I am using PostgreSQL 9.3.

Erw*_*ter 5

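You need to check whether a matching row exists within one second in both directions.
You also need to exclude the row itself from the test: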

SELECT *
FROM   purchase p
WHERE  EXISTS (
   SELECT 1
   FROM  purchase p1
   WHERE p1.created > p.created - interval '1 sec' -- "less than a second"
   AND   p1.created < p.created + interval '1 sec'
   AND   p1.account_id = p.account_id
   AND   p1.price      = p.price
   AND   p1.transaction_id <> p.transaction_id   -- assuming that's the pk
   )
ORDER BY account_id, price, created;         -- optional, for handy output
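
To try the query above on a small scale, a minimal test setup might look like the following sketch. The first two rows are the ones from the question; the other two rows (transaction_id 85241 and 85242) are made up for illustration and should not match. The table is created as a temporary table, which is an assumption made here so the example cleans up after itself; note that it shadows any permanent purchase table for the rest of the session.

```sql
-- Throwaway copy of the table from the question (TEMP: dropped at session end).
CREATE TEMP TABLE purchase (
   transaction_id integer,
   account_id     bigint,
   created        timestamp with time zone,
   price          numeric(5,2)
);

-- Rows 85239 and 85240 are from the question; 85241 and 85242 are hypothetical.
INSERT INTO purchase VALUES
   (85239, 80012340116730, '2014-05-07 15:46:03.361959+00', 8.47),
   (85240, 80012340116730, '2014-05-07 15:46:04.118911+00', 8.47),
   (85241, 80012340116730, '2014-05-07 15:46:06.000000+00', 8.47),  -- same account and price, but more than 1 s away
   (85242, 80012340116730, '2014-05-07 15:46:04.500000+00', 9.99);  -- close in time, but different price

SELECT p.transaction_id
FROM   purchase p
WHERE  EXISTS (
   SELECT 1
   FROM   purchase p1
   WHERE  p1.created > p.created - interval '1 sec'
   AND    p1.created < p.created + interval '1 sec'
   AND    p1.account_id = p.account_id
   AND    p1.price      = p.price
   AND    p1.transaction_id <> p.transaction_id
   )
ORDER  BY p.transaction_id;
```

With this data the query returns only 85239 and 85240, which are about 0.76 seconds apart.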

These WHERE conditions are sargable, which allows an index on created to be used:

WHERE p1.created > p.created - interval '1 sec'
AND   p1.created < p.created + interval '1 sec'

As opposed to:

p1.created - p.created < interval '1 sec'

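The latter cannot use an index on created, which may slow down the query on big tables. Postgres is forced to test all remaining combinations (after the other conditions have been applied). Depending on the selectivity of the other conditions and the size of the table, this may be irrelevant, or it may be a moderate to huge performance hit.
For small to medium tables, tests showed two sequential scans and a hash semi join for either query.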
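
One way to check which plan you get on your own data is to prefix the query with EXPLAIN. The sketch below uses the sargable form; swapping the two range conditions for the subtraction form gives the plan to compare against. Keep in mind that EXPLAIN with the ANALYZE option actually executes the query, so it can take a while on a large table.

```sql
-- Show the actual plan, timings and buffer usage for the sargable form.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   purchase p
WHERE  EXISTS (
   SELECT 1
   FROM   purchase p1
   WHERE  p1.created > p.created - interval '1 sec'
   AND    p1.created < p.created + interval '1 sec'
   AND    p1.account_id = p.account_id
   AND    p1.price      = p.price
   AND    p1.transaction_id <> p.transaction_id
   );
```

Which plan the planner picks depends on table size, statistics, and available indexes, so the output on a production-sized table may differ from what a small test table shows.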

Index

The perfect index for this case would be a multicolumn index of the form:

CREATE INDEX purchase_foo_idx ON purchase (account_id, price, created)

But a combination of indexes on the individual columns can also work well (and may serve more use cases).
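
As a sketch of the single-column alternative, the indexes could look like this. The index names are hypothetical, and whether the planner actually uses or combines them (for example with bitmap index scans) depends on the data distribution.

```sql
-- Hypothetical index names; one index per column instead of one multicolumn index.
CREATE INDEX purchase_account_id_idx ON purchase (account_id);
CREATE INDEX purchase_created_idx    ON purchase (created);
```

An index on price alone could be added in the same way, though it is probably only worthwhile if price is selective in your data.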


kli*_*lin 4

I think you are looking for something like this:

select *
from purchase p1
where exists (
    select transaction_id 
    from purchase p2 
    where p2.created > p1.created   -- looks forward only: finds the earlier row of each pair
    and p2.created - p1.created < interval '1 second'
    and p2.account_id = p1.account_id
    and p2.price = p1.price)

Edit: The query can be very heavy on a large table. Consider restricting it to a single day:

select *
from purchase p1
where 
    p1.created::date = '2014-05-08'
    and exists (
        select transaction_id 
        from purchase p2 
        where p2.created::date = '2014-05-08'
        and p2.created > p1.created
        and p2.created - p1.created < interval '1 second'
        and p2.account_id = p1.account_id
        and p2.price = p1.price)
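
A possible variant of the one-day restriction is sketched below. It replaces the cast to date with a half-open range on created and writes the one-second test in the form p2.created < p1.created + interval, so that a plain index on created has a chance of being used. Two assumptions are made here that are not in the query above: the day boundaries are taken in UTC (a cast of a timestamp with time zone to date depends on the session's TimeZone setting instead), and the inner table is not limited to the same day, so a pair that straddles midnight at the end of the day is still found.

```sql
SELECT *
FROM   purchase p1
WHERE  p1.created >= timestamptz '2014-05-08 00:00:00+00'   -- assumed UTC day start
AND    p1.created <  timestamptz '2014-05-09 00:00:00+00'   -- exclusive upper bound
AND    EXISTS (
   SELECT 1
   FROM   purchase p2
   WHERE  p2.created >  p1.created                          -- forward-looking, as above
   AND    p2.created <  p1.created + interval '1 second'
   AND    p2.account_id = p1.account_id
   AND    p2.price      = p1.price
   );
```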

SQL Fiddle