Chr*_*ing 2 performance sql-server t-sql query-performance
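I need to find possibly matching customer records within the same table. The logic is below. However, it seems to run in O(N²). Is there a way to improve performance here? I've tried adding indexes, hashing columns and comparing the hashes, etc., but performance on large datasets is still terrible. I've also added the query plan below.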
SELECT
    C1.CustomerId AS Customer1,
    C2.CustomerId AS Customer2
FROM Customer C1
INNER JOIN Customer C2
    ON
        C1.CustomerId != C2.CustomerId
        AND
        (C1.FirstName = C2.FirstName OR C1.BirthDate = C2.BirthDate)
        AND
        (
            C1.EmailAddress = C2.EmailAddress
            OR
            C1.MobilePhoneNumber = C2.MobilePhoneNumber
            OR
            (
                C1.HomeAddressLine1 = C2.HomeAddressLine1
                AND
                (
                    C1.HomePostCode = C2.HomePostCode
                    OR
                    C1.HomeSuburb = C2.HomeSuburb
                )
            )
        )
You can split the query into two separate queries, which lets two different covering indexes help find the rows faster: one query checks FirstName and the other checks BirthDate.
select C1.CustomerId,
       C2.CustomerId
from dbo.Customer as C1
  inner join dbo.Customer as C2
    on C1.CustomerId <> C2.CustomerId and
       C1.FirstName = C2.FirstName and
       (
         C1.EmailAddress = C2.EmailAddress or
         C1.MobilePhoneNumber = C2.MobilePhoneNumber or
         (
           C1.HomeAddressLine1 = C2.HomeAddressLine1 and
           (C1.HomePostCode = C2.HomePostCode or C1.HomeSuburb = C2.HomeSuburb)
         )
       );

select C1.CustomerId,
       C2.CustomerId
from dbo.Customer as C1
  inner join dbo.Customer as C2
    on C1.CustomerId <> C2.CustomerId and
       C1.BirthDate = C2.BirthDate and
       (
         C1.EmailAddress = C2.EmailAddress or
         C1.MobilePhoneNumber = C2.MobilePhoneNumber or
         (
           C1.HomeAddressLine1 = C2.HomeAddressLine1 and
           (C1.HomePostCode = C2.HomePostCode or C1.HomeSuburb = C2.HomeSuburb)
         )
       );
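The split helps because each query now has a single equality join key, so the engine can seek an index (or build a hash table) on that key instead of evaluating the OR'd condition for every pair of rows. The Python sketch below models that difference outside the database; it illustrates the idea and is not what SQL Server literally executes. The row dicts and the names `eq`, `contact_match`, `match_all_pairs`, `match_bucketed` and `make` are hypothetical, and `None` stands in for SQL NULL. Note that a pair matching on both FirstName and BirthDate comes back from both queries; the set below behaves like a `UNION` of the two result sets.

```python
from collections import defaultdict
from itertools import permutations


def eq(a, b):
    """SQL-style equality: NULL (None) never compares equal, not even to NULL."""
    return a is not None and b is not None and a == b


def contact_match(c1, c2):
    """Email, phone, or address part of the join condition (same in every query)."""
    return (
        eq(c1["EmailAddress"], c2["EmailAddress"])
        or eq(c1["MobilePhoneNumber"], c2["MobilePhoneNumber"])
        or (
            eq(c1["HomeAddressLine1"], c2["HomeAddressLine1"])
            and (
                eq(c1["HomePostCode"], c2["HomePostCode"])
                or eq(c1["HomeSuburb"], c2["HomeSuburb"])
            )
        )
    )


def match_all_pairs(customers):
    """Original query: the OR'd key forces a test of all N * (N - 1) ordered pairs."""
    return {
        (c1["CustomerId"], c2["CustomerId"])
        for c1, c2 in permutations(customers, 2)
        if (eq(c1["FirstName"], c2["FirstName"]) or eq(c1["BirthDate"], c2["BirthDate"]))
        and contact_match(c1, c2)
    }


def match_bucketed(customers):
    """Split queries: one equality join per key column, results combined like UNION.

    Rows are grouped by key value (roughly what an index seek or hash join
    achieves), so only rows sharing a FirstName or a BirthDate are compared.
    """
    pairs = set()  # a set drops pairs found by both queries, as UNION would
    for key in ("FirstName", "BirthDate"):
        buckets = defaultdict(list)
        for c in customers:
            if c[key] is not None:  # NULL keys never satisfy an equality join
                buckets[c[key]].append(c)
        for bucket in buckets.values():
            for c1, c2 in permutations(bucket, 2):
                if contact_match(c1, c2):
                    pairs.add((c1["CustomerId"], c2["CustomerId"]))
    return pairs


def make(cid, first, dob, **contact):
    """Hypothetical helper: build one row; contact columns not given are NULL."""
    row = dict.fromkeys(
        ("EmailAddress", "MobilePhoneNumber", "HomeAddressLine1", "HomePostCode", "HomeSuburb")
    )
    row.update(CustomerId=cid, FirstName=first, BirthDate=dob, **contact)
    return row


customers = [
    make(1, "Ann", "1990-01-01", EmailAddress="ann@example.com"),
    make(2, "Ann", "1985-05-05", EmailAddress="ann@example.com"),  # name + email
    make(3, "Bob", "1990-01-01", MobilePhoneNumber="0400"),  # same DOB as 1, no shared contact
    make(4, "Rob", "1970-07-07", MobilePhoneNumber="0400"),  # same phone as 3, other name/DOB
    make(5, "Cat", "1970-07-07", HomeAddressLine1="1 Main St", HomePostCode="2000"),
    make(6, "Kat", "1970-07-07", HomeAddressLine1="1 Main St", HomePostCode="2000"),
]

print(sorted(match_all_pairs(customers)))  # → [(1, 2), (2, 1), (5, 6), (6, 5)]
print(sorted(match_bucketed(customers)))   # → [(1, 2), (2, 1), (5, 6), (6, 5)]
```

Both versions report every match twice, as (A, B) and as (B, A), exactly like the SQL. If one orientation per pair is enough, using `<` instead of `<>` on CustomerId in each query would halve the output.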
Indexes needed to support these queries:
create nonclustered index IX_FirstName on dbo.Customer(FirstName)
include(EmailAddress, MobilePhoneNumber, HomeAddressLine1, HomePostCode, HomeSuburb);
create nonclustered index IX_BirthDate on dbo.Customer(BirthDate)
include(EmailAddress, MobilePhoneNumber, HomeAddressLine1, HomePostCode, HomeSuburb);
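Each index leads with the join key and carries the other compared columns as INCLUDE columns, so each split query can seek on the key and read everything else from the index. The sketch below checks that idea with Python's built-in `sqlite3` module as a stand-in for SQL Server. This is an assumption: SQLite's planner and costs differ, and SQLite has no INCLUDE clause, so those columns are appended as trailing key columns. It prints the `EXPLAIN QUERY PLAN` detail rows for each split query; look for a SEARCH step that names IX_FirstName or IX_BirthDate (exact wording varies by SQLite version). The helper names `build_db` and `plan` are hypothetical.

```python
import sqlite3

# SQLite has no INCLUDE clause (assumption: stand-in for SQL Server),
# so the "included" columns become trailing key columns instead.
COVERED = "EmailAddress, MobilePhoneNumber, HomeAddressLine1, HomePostCode, HomeSuburb"

# One of the split queries; {key} is FirstName or BirthDate.
SPLIT_QUERY = """
select C1.CustomerId, C2.CustomerId
from Customer as C1
  inner join Customer as C2
    on C1.CustomerId <> C2.CustomerId and
       C1.{key} = C2.{key} and
       (
         C1.EmailAddress = C2.EmailAddress or
         C1.MobilePhoneNumber = C2.MobilePhoneNumber or
         (
           C1.HomeAddressLine1 = C2.HomeAddressLine1 and
           (C1.HomePostCode = C2.HomePostCode or C1.HomeSuburb = C2.HomeSuburb)
         )
       )
"""


def build_db():
    """Create an in-memory Customer table with the two supporting indexes."""
    con = sqlite3.connect(":memory:")
    con.execute(
        """create table Customer (
               CustomerId integer primary key,
               FirstName text, BirthDate text, EmailAddress text,
               MobilePhoneNumber text, HomeAddressLine1 text,
               HomePostCode text, HomeSuburb text)"""
    )
    con.execute(f"create index IX_FirstName on Customer(FirstName, {COVERED})")
    con.execute(f"create index IX_BirthDate on Customer(BirthDate, {COVERED})")
    return con


def plan(con, key):
    """Return the detail text of each EXPLAIN QUERY PLAN row for one split query."""
    rows = con.execute("explain query plan " + SPLIT_QUERY.format(key=key)).fetchall()
    return [row[-1] for row in rows]  # detail is the last column in every version


con = build_db()
for key in ("FirstName", "BirthDate"):
    print(key, plan(con, key))
```

In SQL Server itself, the equivalent check is the actual execution plan: each split query should show an Index Seek on its index rather than a scan feeding a join that evaluates the OR'd predicate.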
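In my rather limited test case, the run time went from longer than I was willing to wait down to 4 seconds*.
The query plans I got for the two queries:
*I actually started the original query and forgot about it. While scratching my head over why my computer was so slow, I found the query had been running for 40 minutes.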