(Self) join on time intervals

Eri*_*rik 7 sql oracle performance self-join query-optimization

I have a table in an Oracle database. The schema is

create table PERIODS
( 
  ID NUMBER, 
  STARTTIME TIMESTAMP, 
  ENDTIME TIMESTAMP, 
  TYPE VARCHAR2(100)
)

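I have two different TYPEs: TYPEA and TYPEB. They have independent start and end times, and they can overlap. What I want to find are the TYPEB periods that start within, are fully contained in, or end within a given TYPEA period.

This is what I have come up with so far (with some sample data):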

WITH mydata 
     AS (SELECT 100                                                    ID, 
                To_timestamp('2015-08-01 11:00', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 11:20', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEA'                                                TYPE 
         FROM   dual 
         UNION ALL 
         SELECT 110                                                    ID, 
                To_timestamp('2015-08-01 11:30', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 11:50', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEA'                                                TYPE 
         FROM   dual 
         UNION ALL 
         SELECT 120                                                    ID, 
                To_timestamp('2015-08-01 12:00', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 12:20', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEA'                                                TYPE 
         FROM   dual 
         UNION ALL 
         SELECT 105                                                    ID, 
                To_timestamp('2015-08-01 10:55', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 11:05', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEB'                                                TYPE 
         FROM   dual 
         UNION ALL 
         SELECT 108                                                    ID, 
                To_timestamp('2015-08-01 11:05', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 11:15', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEB'                                                TYPE 
         FROM   dual 
         UNION ALL 
         SELECT 111                                                    ID, 
                To_timestamp('2015-08-01 11:15', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 12:25', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEB'                                                TYPE 
         FROM   dual), 
     typeas 
     AS (SELECT starttime, 
                endtime 
         FROM   mydata 
         WHERE  TYPE = 'TYPEA'), 
     typebs 
     AS (SELECT id, 
                starttime, 
                endtime 
         FROM   mydata 
         WHERE  TYPE = 'TYPEB') 
SELECT id 
FROM   typebs b 
       join typeas a 
         ON ( b.starttime BETWEEN a.starttime AND a.endtime ) 
             OR ( b.starttime BETWEEN a.starttime AND a.endtime 
                  AND b.endtime BETWEEN a.starttime AND a.endtime ) 
             OR ( b.endtime BETWEEN a.starttime AND a.endtime ) 
ORDER  BY id; 

This seems to work in principle; the result of the query above is

        ID
----------
       105
       108
       111

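So it selects the three TYPEB periods that start or end within the first TYPEA period.

The problem is that the table has about 200k entries, and already at this size the query above is very slow. This surprises me, because the number of both TYPEA and TYPEB entries is rather low (1-2k).

Is there a more efficient way to do this type of self join? Am I missing something else in my query?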

mar*_*aca 1

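Maybe this is worth a try (also, in Oracle you supposedly should write the most restrictive condition last; don't ask me why, and don't take my word for it, better to run your own performance tests):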

SELECT
   p.id
FROM
   periods p
WHERE
   EXISTS(SELECT * FROM periods q WHERE
      (p.startTime BETWEEN q.startTime AND q.endTime
      OR p.endTime BETWEEN q.startTime AND q.endTime
      OR p.startTime < q.startTime AND p.endTime > q.endTime -- overlapping correction, remove if not needed
      ) AND q.type = 'TYPEA'
   ) AND p.type = 'TYPEB'
ORDER BY
   p.id
;
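If the "overlapping correction" is kept, the three OR branches above amount to the usual closed-interval overlap test, so a shorter variant might be worth benchmarking as well. The following is only a sketch under two assumptions that are not stated in the question: every row has starttime <= endtime, and you are allowed to add an index. The index name periods_type_time_ix is made up for illustration.

```sql
-- Optional supporting index (hypothetical name). Whether the optimizer
-- actually uses it depends on your data and statistics, so check the plan.
CREATE INDEX periods_type_time_ix ON periods (type, starttime, endtime);

-- Two closed intervals overlap exactly when each one starts no later than
-- the other one ends (assumes starttime <= endtime on every row).
SELECT p.id
FROM   periods p
WHERE  p.type = 'TYPEB'
       AND EXISTS (SELECT 1
                   FROM   periods q
                   WHERE  q.type = 'TYPEA'
                          AND p.starttime <= q.endtime
                          AND p.endtime   >= q.starttime)
ORDER  BY p.id;
```

Like the query above, this also returns TYPEB periods that completely cover a TYPEA period, which the join in the question does not. Because it uses EXISTS rather than a join, a TYPEB period that matches several TYPEA periods is still listed only once.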