How to improve the performance of query matching on the same table

Chr*_*ing 2 performance sql-server t-sql query-performance

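I need to find possible matching customer records within the same table. The logic is below. However, this appears to execute in O(N²). Is there any way to improve the performance here? I have tried setting up indexes, hashing the columns and comparing the hashes, and so on, but performance on large data sets is still terrible. I have also added the query plan below.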

SELECT
    C1.CustomerId AS Customer1, 
    C2.CustomerId AS Customer2
FROM Customer C1
INNER JOIN Customer C2
    ON 
    C1.CustomerId != C2.CustomerId
    AND
    (C1.FirstName = C2.FirstName OR C1.BirthDate = C2.BirthDate)
    AND
    (
        C1.EmailAddress = C2.EmailAddress
        OR
        C1.MobilePhoneNumber = C2.MobilePhoneNumber
        OR
        (
            C1.HomeAddressLine1 = C2.HomeAddressLine1
            AND
            (
                C1.HomePostCode = C2.HomePostCode
                OR
                C1.HomeSuburb = C2.HomeSuburb
            )
        )
    )

Query plan

Mik*_*son 8

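You can split the query into two separate queries, which lets two different covering indexes help you find the rows faster.

One query checks FirstName and the other checks BirthDate.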

select C1.CustomerId,
       C2.CustomerId
from dbo.Customer as C1
  inner join dbo.Customer as C2
    on C1.CustomerId <> C2.CustomerId and
       C1.FirstName = C2.FirstName and
            (
              C1.EmailAddress = C2.EmailAddress or 
              C1.MobilePhoneNumber = C2.MobilePhoneNumber or 
              (
                 C1.HomeAddressLine1 = C2.HomeAddressLine1 and
                (C1.HomePostCode = C2.HomePostCode or C1.HomeSuburb = C2.HomeSuburb)
              )
            );


select C1.CustomerId,
       C2.CustomerId
from dbo.Customer as C1
  inner join dbo.Customer as C2
    on C1.CustomerId <> C2.CustomerId and
       C1.BirthDate = C2.BirthDate and
            (
              C1.EmailAddress = C2.EmailAddress or 
              C1.MobilePhoneNumber = C2.MobilePhoneNumber or 
              (
                 C1.HomeAddressLine1 = C2.HomeAddressLine1 and
                (C1.HomePostCode = C2.HomePostCode or C1.HomeSuburb = C2.HomeSuburb)
              )
            );
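Run separately, the two queries return a pair twice whenever both FirstName and BirthDate match. A minimal sketch, assuming the goal is the same result set as the single OR query in the question and that CustomerId is unique: combine the two halves with UNION, which removes duplicate rows. The Customer1 and Customer2 aliases come from the question; whether the optimizer keeps using both indexes for the combined statement is something to confirm in the actual plan.

```sql
-- Sketch: UNION (not UNION ALL) drops pairs that both branches find.
-- Assumption: CustomerId is unique, so the original OR query
-- returns each (Customer1, Customer2) pair exactly once.
select C1.CustomerId as Customer1,
       C2.CustomerId as Customer2
from dbo.Customer as C1
  inner join dbo.Customer as C2
    on C1.CustomerId <> C2.CustomerId and
       C1.FirstName = C2.FirstName and
       (
         C1.EmailAddress = C2.EmailAddress or
         C1.MobilePhoneNumber = C2.MobilePhoneNumber or
         (
           C1.HomeAddressLine1 = C2.HomeAddressLine1 and
           (C1.HomePostCode = C2.HomePostCode or C1.HomeSuburb = C2.HomeSuburb)
         )
       )

union

select C1.CustomerId as Customer1,
       C2.CustomerId as Customer2
from dbo.Customer as C1
  inner join dbo.Customer as C2
    on C1.CustomerId <> C2.CustomerId and
       C1.BirthDate = C2.BirthDate and
       (
         C1.EmailAddress = C2.EmailAddress or
         C1.MobilePhoneNumber = C2.MobilePhoneNumber or
         (
           C1.HomeAddressLine1 = C2.HomeAddressLine1 and
           (C1.HomePostCode = C2.HomePostCode or C1.HomeSuburb = C2.HomeSuburb)
         )
       );
```

If each pair is needed only once regardless of order, replacing <> with < in both branches would halve the output. That changes the result set compared with the question's query, which returns both (A, B) and (B, A), so treat it as an optional variation.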

Indexes needed to support these queries:

create nonclustered index IX_FirstName on dbo.Customer(FirstName) 
  include(EmailAddress, MobilePhoneNumber, HomeAddressLine1, HomePostCode, HomeSuburb);

create nonclustered index IX_BirthDate on dbo.Customer(BirthDate) 
  include(EmailAddress, MobilePhoneNumber, HomeAddressLine1, HomePostCode, HomeSuburb);

In my rather limited test case, I saw the run time drop from longer than I cared to wait to 4 seconds*.
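To make a comparison like this on your own data, one option is SQL Server's SET STATISTICS TIME and SET STATISTICS IO session settings, which report CPU and elapsed time and logical reads in the Messages output. A sketch using the FirstName query; the figures you see will depend entirely on your table size and data distribution.

```sql
-- Sketch: report timing and I/O for one of the split queries.
set statistics time on;
set statistics io on;

select C1.CustomerId,
       C2.CustomerId
from dbo.Customer as C1
  inner join dbo.Customer as C2
    on C1.CustomerId <> C2.CustomerId and
       C1.FirstName = C2.FirstName and
       (
         C1.EmailAddress = C2.EmailAddress or
         C1.MobilePhoneNumber = C2.MobilePhoneNumber or
         (
           C1.HomeAddressLine1 = C2.HomeAddressLine1 and
           (C1.HomePostCode = C2.HomePostCode or C1.HomeSuburb = C2.HomeSuburb)
         )
       );

-- Turn the reporting back off for the rest of the session.
set statistics time off;
set statistics io off;
```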

The query plans I got for the two queries:

[Images: query plans for the two queries]

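*I actually started the original query and forgot about it. While scratching my head over why my computer was so slow, I found that the query had been running for 40 minutes.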