Joh*_*sky 6 sql algorithm join query-optimization intersect
I'm using MS SQL.
I have a huge table with indexes that make this query fast:
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010'
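It returns in under 1 second. The table has billions of rows. There are only about 10,000 results.
I would expect this query to also finish in about a second: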
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010'
intersect
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 40652 and
IncrementalStatistics.Created > '12/2/2010'
intersect
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 14403 and
IncrementalStatistics.Created > '12/2/2010'
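But it takes 20 seconds. Each individual query takes less than 1 second and returns about 10k results.
I would expect SQL to internally put the results of each subquery into a hash table and do a hash intersection, which should be O(n). The result sets are small enough to fit in memory, so I doubt this is an I/O problem.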
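To make that expectation concrete, here is a minimal Python sketch of a hash intersection over three small result sets. It only illustrates the O(n) behavior described above; it is not a claim about how SQL Server actually executes INTERSECT. The function name `hash_intersect` and the sample user ids are made up for illustration.

```python
def hash_intersect(*result_sets):
    """Intersect several iterables of user ids using hash sets.

    Hypothetical helper: building each set is O(n), and every
    membership test against a set is O(1) on average.
    """
    if not result_sets:
        return set()
    # Start from the smallest input so the working set stays small.
    ordered = sorted((set(r) for r in result_sets), key=len)
    common = ordered[0]
    for other in ordered[1:]:
        common = {uid for uid in common if uid in other}
    return common


# Made-up user ids standing in for the three ~10k-row result sets.
place_47828 = [1, 2, 3, 4, 5]
place_40652 = [2, 3, 5, 8]
place_14403 = [3, 5, 13]

result = hash_intersect(place_47828, place_40652, place_14403)
print(sorted(result))
```

With three inputs of about 10k ids each, this does a few tens of thousands of hash operations, which is why roughly one second total seemed like a reasonable expectation.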
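I wrote an alternative query that is just a series of nested JOINs. It also takes about 20 seconds, which makes sense.
Why is INTERSECT so slow? Is it reduced to a JOIN early in query processing?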
Joe*_*lli 14
Try this. Obviously untested, but I think it will give you the results you want.
select userid
from IncrementalStatistics
where IncrementalStatisticsTypeID = 5
and IncrementalStatistics.AssociatedPlaceID in (47828,40652,14403)
and IncrementalStatistics.Created > '12/2/2010'
group by userid
having count(distinct IncrementalStatistics.AssociatedPlaceID) = 3
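Since the query above is untested, here is a small sanity check of the idea, sketched in Python with an in-memory SQLite database rather than MS SQL. It runs the INTERSECT form and the GROUP BY / HAVING form against the same rows and compares the results. The sample rows are invented, and the dates are written as ISO strings (`'2010-12-02'`) because SQLite compares them as text; that is an assumption of this sketch, not something taken from the original schema. It checks that the two forms return the same users and says nothing about SQL Server performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table IncrementalStatistics ("
    " userid integer, IncrementalStatisticsTypeID integer,"
    " AssociatedPlaceID integer, Created text)"
)

# Invented sample rows: (userid, type id, place id, created date).
rows = [
    (1, 5, 47828, "2010-12-05"),
    (1, 5, 40652, "2010-12-06"),
    (1, 5, 14403, "2010-12-07"),
    (1, 5, 14403, "2010-12-08"),  # repeat row for a place already counted
    (2, 5, 47828, "2010-12-05"),
    (2, 5, 47828, "2010-12-06"),
    (2, 5, 47828, "2010-12-07"),  # three rows, but only one distinct place
    (3, 5, 47828, "2010-12-05"),
    (3, 5, 40652, "2010-11-30"),  # too old to pass the date filter
    (3, 5, 14403, "2010-12-07"),
    (4, 5, 47828, "2010-12-05"),
    (4, 5, 40652, "2010-12-06"),
    (4, 6, 14403, "2010-12-07"),  # wrong statistics type
]
conn.executemany("insert into IncrementalStatistics values (?, ?, ?, ?)", rows)

PLACES = (47828, 40652, 14403)

# The original form: one filtered select per place, joined by INTERSECT.
one_place = (
    "select userid from IncrementalStatistics"
    " where IncrementalStatisticsTypeID = 5"
    " and AssociatedPlaceID = {place}"
    " and Created > '2010-12-02'"
)
intersect_sql = " intersect ".join(one_place.format(place=p) for p in PLACES)

# The suggested form: a single pass, grouped by user.
group_sql = (
    "select userid from IncrementalStatistics"
    " where IncrementalStatisticsTypeID = 5"
    " and AssociatedPlaceID in (47828, 40652, 14403)"
    " and Created > '2010-12-02'"
    " group by userid"
    " having count(distinct AssociatedPlaceID) = 3"
)

via_intersect = sorted(r[0] for r in conn.execute(intersect_sql))
via_group_by = sorted(r[0] for r in conn.execute(group_sql))
print(via_intersect, via_group_by)
```

The `distinct` in the HAVING clause is what keeps user 2 out: that user has three qualifying rows, all for the same place, so a plain `count(*) = 3` would wrongly include them.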