Optimizing a self-join over millions of rows

inf*_*975 5 sql sql-server optimization sql-server-2012

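I have a table in my SQL Server 2012 database that links objects to each other, with columns (annonsid, annonsid2). The table is used to build triangular or even rectangular chains to see who can swap with whom.

This is the query I run against the table Matching_IDs, which has 1.5 million rows. It produces 14 million possible chains: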

SELECT COUNT(*)
FROM Matching_IDs AS m
  INNER JOIN Matching_IDs AS m2
     ON m.annonsid2 = m2.annonsid
  INNER JOIN Matching_IDs AS m3
     ON m2.annonsid2 = m3.annonsid
       AND m.annonsid = m3.annonsid2

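I have to improve performance, ideally to 1 second or less. Is there a faster way to do this? The query takes about 1 minute on my machine. I normally add a WHERE m.annonsid=x, but it takes the same time because it still has to go through all possible combinations.

Update: latest query plan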

|--Compute Scalar(DEFINE:([Expr1006]=CONVERT_IMPLICIT(int,[globalagg1011],0)))
   |--Stream Aggregate(DEFINE:([globalagg1011]=SUM([partialagg1010])))
        |--Parallelism(Gather Streams)
             |--Stream Aggregate(DEFINE:([partialagg1010]=Count(*)))
                  |--Hash Match(Inner Join, HASH:([m2].[annonsid2], [m2].[annonsid])=([m3].[annonsid], [m].[annonsid2]), RESIDUAL:([MyDatabase].[dbo].[Matching_IDs].[annonsid2] as [m2].[annonsid2]=[MyDatabase].[dbo].[Matching_IDs].[annonsid] as [m3].[annonsid] AND [MyDatabase].[dbo].[Matching_IDs].[annonsid2] as [m].[annonsid2]=[MyDatabase].[dbo].[Matching_IDs].[annonsid] as [m2].[annonsid]))
                       |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m2].[annonsid2], [m2].[annonsid]))
                       |    |--Index Scan(OBJECT:([MyDatabase].[dbo].[Matching_IDs].[NonClusteredIndex-20121229-133207] AS [m2]))
                       |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m3].[annonsid], [m].[annonsid2]))
                            |--Merge Join(Inner Join, MANY-TO-MANY MERGE:([m].[annonsid])=([m3].[annonsid2]), RESIDUAL:([MyDatabase].[dbo].[Matching_IDs].[annonsid] as [m].[annonsid]=[MyDatabase].[dbo].[Matching_IDs].[annonsid2] as [m3].[annonsid2]))
                                 |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m].[annonsid]), ORDER BY:([m].[annonsid] ASC))
                                 |    |--Index Scan(OBJECT:([MyDatabase].[dbo].[Matching_IDs].[NonClusteredIndex-20121229-133152] AS [m]), ORDERED FORWARD)
                                 |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m3].[annonsid2]), ORDER BY:([m3].[annonsid2] ASC))
                                      |--Index Scan(OBJECT:([MyDatabase].[dbo].[Matching_IDs].[NonClusteredIndex-20121229-133207] AS [m3]), ORDERED FORWARD)

usr*_*usr 1

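It looks like you have already indexed this well. You could try converting the hash join into a merge join by adding the right multi-column index, but that will not give you the 60x speedup you need.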

I think this index would be on (annonsid, annonsid2), although I may have made a mistake here.
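As a rough sketch of what such an index could look like: the index name below is made up for illustration, and the column order simply follows the guess above, so it may need adjusting after checking the resulting plan.

```sql
-- Hypothetical index name; the column order (annonsid, annonsid2) is the
-- guess from the answer, not a verified optimum.
CREATE NONCLUSTERED INDEX IX_Matching_IDs_annonsid_annonsid2
    ON dbo.Matching_IDs (annonsid, annonsid2);
```

Whether the optimizer actually switches from the hash join to a merge join depends on its cost estimates, so compare the plan before and after.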

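It would be nice to materialize all of this, but indexed views do not support self-joins. You could instead materialize this query (unaggregated) into a new table. Whenever you run DML against the base table, update the second table as well (using application logic or a trigger). Lookups for a single annonsid then become a simple index seek, which should make your query extremely fast.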
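A minimal sketch of the materialization idea, under some assumptions: the table name `Matching_Triangles`, its column names, the index name, and the `int` column type are all made up for illustration and are not from the question. It covers only the initial load and the lookup; keeping the table in sync on DML (trigger or application logic) is not shown.

```sql
-- Hypothetical table holding one row per triangle a -> b -> c -> a.
-- int is an assumed type; match it to the real type of annonsid.
CREATE TABLE dbo.Matching_Triangles (
    annonsid_a int NOT NULL,
    annonsid_b int NOT NULL,
    annonsid_c int NOT NULL
);

-- Initial load: the original three-way self-join, without the COUNT(*).
INSERT INTO dbo.Matching_Triangles (annonsid_a, annonsid_b, annonsid_c)
SELECT m.annonsid, m2.annonsid, m3.annonsid
FROM dbo.Matching_IDs AS m
  INNER JOIN dbo.Matching_IDs AS m2
     ON m.annonsid2 = m2.annonsid
  INNER JOIN dbo.Matching_IDs AS m3
     ON m2.annonsid2 = m3.annonsid
       AND m.annonsid = m3.annonsid2;

-- Cluster on the lookup column so a filter on it becomes an index seek.
CREATE CLUSTERED INDEX CIX_Matching_Triangles_a
    ON dbo.Matching_Triangles (annonsid_a);

-- Lookup for one object; @x stands in for the x in WHERE m.annonsid=x.
DECLARE @x int = 12345;  -- placeholder value
SELECT annonsid_a, annonsid_b, annonsid_c
FROM dbo.Matching_Triangles
WHERE annonsid_a = @x;
```

The trade-off is extra storage (about 14 million rows for the current data) and extra work on every insert, update, or delete against Matching_IDs, in exchange for cheap reads.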