Ser*_*rge 6 performance sql-server set-returning-functions query-performance
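I am writing a table-valued function that geocodes an address from its suburb, state and postcode. It tries several matching methods, in order of decreasing accuracy.
(In the geographic region I work with, the relationships between suburbs, postcodes and countries are all many-to-many. In other words, a suburb can have several postcodes, and a postcode can cover several suburbs and may exist in more than one country.)
Here is an excerpt of the table-valued function: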
ALTER FUNCTION [geocode].[tvfn_Customer_Suburb_From_Address]
(
@Suburb NVARCHAR(100),
@State NVARCHAR(100),
@Postcode NVARCHAR(100),
@Country NVARCHAR(100)
)
RETURNS TABLE
AS
RETURN
(
SELECT TOP 1 *
FROM (
-- Unique suburb-postcode-state combinations
SELECT s.Suburb_DID
,s.Suburb
,s.State
,s.Postcode
,Geocode_DID = 4 -- Exact match by unique Postcode, Suburb and State
,s.Geocode_Latitude
,s.Geocode_Longitude
FROM geocode.tSuburbs_XX s
INNER JOIN [geocode].[tGeocode_Methods] gm
ON s.Geocode_DID = gm.Geocode_DID
WHERE s.[Is_Active] = 1
AND s.[Suburb] = @Suburb
AND s.[State] = @State
AND s.[Postcode] = @Postcode
-- Only suburbs that are geocoded with methods that can be used for geocoding customers
AND gm.[Can_Use_For_VIP] = 1
UNION ALL
-- Unique suburb-postcode combinations
SELECT s.Suburb_DID
,s.Suburb
,s.State
,s.Postcode
,Geocode_DID = 3 -- Exact match by unique Postcode & Suburb
,s.Geocode_Latitude
,s.Geocode_Longitude
FROM geocode.tSuburbs_XX s
INNER JOIN [geocode].[tGeocode_Methods] gm
ON s.Geocode_DID = gm.Geocode_DID
WHERE EXISTS ( SELECT *
FROM geocode.tSuburbs_XX
WHERE Is_Active = 1
AND Suburb = s.Suburb AND Postcode = s.Postcode
GROUP BY Postcode, Suburb
HAVING COUNT(*) = 1
)
AND s.Is_Active = 1
AND s.[Suburb] = @Suburb
AND s.[Postcode] = @Postcode
-- Only suburbs that are geocoded with methods that can be used for geocoding customers
AND gm.[Can_Use_For_VIP] = 1
UNION ALL
-- Exact match by unique Suburb and State
SELECT s.Suburb_DID
,s.Suburb
,s.State
,s.Postcode
,Geocode_DID = 6 -- Exact match by unique Suburb and State
,s.Geocode_Latitude
,s.Geocode_Longitude
FROM geocode.tSuburbs_XX s
INNER JOIN [geocode].[tGeocode_Methods] gm
ON s.Geocode_DID = gm.Geocode_DID
WHERE EXISTS ( SELECT *
FROM geocode.tSuburbs_XX
WHERE Is_Active = 1 AND Is_PO_Box = 0 -- Exclude PO Boxes
AND Suburb = s.Suburb AND State = s.State
GROUP BY Suburb, State
HAVING COUNT(*) = 1
)
AND s.Is_Active = 1
AND s.[Suburb] = @Suburb
AND s.[State] = @State
-- Only suburbs that are geocoded with methods that can be used for geocoding customers
AND gm.[Can_Use_For_VIP] = 1
UNION ALL
-- Exact match by unique Postcode
SELECT s.Suburb_DID
,s.Suburb
,s.State
,s.Postcode
,Geocode_DID = 2 -- Exact match by unique Postcode
,s.Geocode_Latitude
,s.Geocode_Longitude
FROM geocode.tSuburbs_XX s
INNER JOIN [geocode].[tGeocode_Methods] gm
ON s.Geocode_DID = gm.Geocode_DID
WHERE EXISTS ( SELECT *
FROM geocode.tSuburbs_XX
WHERE Is_Active = 1
AND Postcode = s.Postcode
GROUP BY Postcode
HAVING COUNT(*) = 1
)
AND s.Is_Active = 1
AND s.[Postcode] = @Postcode
-- Only suburbs that are geocoded with methods that can be used for geocoding customers
AND gm.[Can_Use_For_VIP] = 1
-- Perform this extra check to make sure we don't match a postcode in a wrong country
AND ( @Country IN ('AAA', 'BBB', 'CCC')
OR @State IN ('MMM', 'NNN', 'OOO', 'PPP')
)
UNION ALL
-- Approximate match by non-unique Postcode, where all Suburbs with this Postcode are within 5 km of one another.
SELECT s.Suburb_DID
,s.Suburb
,s.State
,s.Postcode
,Geocode_DID = 5
,s.Geocode_Latitude
,s.Geocode_Longitude
FROM [geocode].[tPostcode_Distances] pd
INNER JOIN geocode.tSuburbs_XX s
ON pd.Approx_Suburb_DID = s.Suburb_DID
INNER JOIN [geocode].[tGeocode_Methods] gm
ON s.Geocode_DID = gm.Geocode_DID
WHERE s.Is_Active = 1
AND pd.[Postcode] = @Postcode
-- Only suburbs that are geocoded with methods that can be used for geocoding customers
AND gm.[Can_Use_For_VIP] = 1
-- Perform this extra check to make sure we don't match a postcode in a wrong country
AND ( @Country IN ('AAA', 'BBB', 'CCC')
OR @State IN ('MMM', 'NNN', 'OOO', 'PPP')
)
AND pd.Max_Distance <= 5000 -- within 5 km
) t
)
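The function above works, but I would like to know whether it can be improved. In particular, is it possible to force SQL Server to stop processing the remaining SELECT statements as soon as one SELECT returns a result set (since we are only interested in the first matching result - TOP 1)?
UPDATE
Thanks for the suggestions so far. As suggested in several answers and comments, I will add a [Priority] column together with an ORDER BY clause to make sure I get the best result.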
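A minimal sketch of what that change could look like, reduced to two of the five branches. The table variable @Suburbs is a hypothetical stand-in for geocode.tSuburbs_XX, the sample rows are invented, and the uniqueness checks and the join to tGeocode_Methods are left out for brevity:

```sql
-- Hypothetical stand-in for geocode.tSuburbs_XX (sample rows are invented).
DECLARE @Suburbs TABLE
(
    Suburb_DID int           NOT NULL PRIMARY KEY,
    Suburb     nvarchar(100) NOT NULL,
    [State]    nvarchar(100) NOT NULL,
    Postcode   nvarchar(100) NOT NULL
);

INSERT INTO @Suburbs (Suburb_DID, Suburb, [State], Postcode)
VALUES (1, N'Springfield', N'MMM', N'1000'),
       (2, N'Shelbyville', N'MMM', N'2000');

DECLARE @Suburb   nvarchar(100) = N'Springfield',
        @State    nvarchar(100) = N'NNN',   -- deliberately the wrong state
        @Postcode nvarchar(100) = N'1000';

SELECT TOP (1)
       t.Suburb_DID, t.Suburb, t.[State], t.Postcode, t.Geocode_DID
FROM
(
    -- Most accurate: suburb, state and postcode all match
    SELECT s.Suburb_DID, s.Suburb, s.[State], s.Postcode,
           Geocode_DID = 4, [Priority] = 1
    FROM @Suburbs AS s
    WHERE s.Suburb = @Suburb
      AND s.[State] = @State
      AND s.Postcode = @Postcode

    UNION ALL

    -- Less accurate: suburb and postcode match
    SELECT s.Suburb_DID, s.Suburb, s.[State], s.Postcode,
           Geocode_DID = 3, [Priority] = 2
    FROM @Suburbs AS s
    WHERE s.Suburb = @Suburb
      AND s.Postcode = @Postcode
) AS t
ORDER BY t.[Priority] ASC;  -- lowest priority number wins, whatever order the branches run in
```

With these sample values the first branch finds nothing because the state does not match, so the returned row comes from the second branch (Geocode_DID = 3). The ORDER BY only fixes which branch wins; it does not by itself stop SQL Server from evaluating the other branches.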
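I will also add WITH SCHEMABINDING to the TVFN so that SQL Server can parallelise the plan. While we are on the subject: using a multi-statement table-valued function is a good idea (thanks, Paul White), but a multi-statement TVFN always forces a serial plan.
I am now going to try Lennart's answer, which suggests using a CTE.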
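If you must use a single query (as a single inline function requires), you can use one of the following two options (both illustrated in my recent answer to Relating 2 tables with possible wildcards?):
Use multiple APPLY clauses with start-up conditions, each using an outer reference to the previous apply in the chain. The efficiency of this method depends on start-up filters being present in the execution plan. Correct results are guaranteed, but the plan shape is not.
Add an extra column holding a constant literal to each branch of the union, e.g. [Priority] = 1, then add ORDER BY [Priority] ASC within the scope of the TOP (1). Efficient operation depends on the plan avoiding a sort.
On reflection, the second option is not what you want in this case, because the Merge Concatenation in the plan would require one row from each alternative. It is still an option in the more general case (where the alternative inputs produce more than one row and the first row is cheap to obtain).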
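The first option can be sketched roughly as follows. This is an illustration under assumptions, not the code from the linked answer: @Suburbs is a hypothetical stand-in for geocode.tSuburbs_XX, the sample rows are invented, only three simplified branches are shown, and the uniqueness and country checks are omitted.

```sql
-- Hypothetical stand-in for geocode.tSuburbs_XX (sample rows are invented).
DECLARE @Suburbs TABLE
(
    Suburb_DID int           NOT NULL PRIMARY KEY,
    Suburb     nvarchar(100) NOT NULL,
    [State]    nvarchar(100) NOT NULL,
    Postcode   nvarchar(100) NOT NULL
);

INSERT INTO @Suburbs (Suburb_DID, Suburb, [State], Postcode)
VALUES (1, N'Springfield', N'MMM', N'1000'),
       (2, N'Shelbyville', N'MMM', N'2000');

DECLARE @Suburb   nvarchar(100) = N'Springfield',
        @State    nvarchar(100) = N'NNN',   -- deliberately the wrong state
        @Postcode nvarchar(100) = N'1000';

SELECT
    Suburb_DID  = COALESCE(m1.Suburb_DID, m2.Suburb_DID, m3.Suburb_DID),
    Geocode_DID = CASE
                      WHEN m1.Suburb_DID IS NOT NULL THEN 4
                      WHEN m2.Suburb_DID IS NOT NULL THEN 3
                      ELSE 2
                  END
FROM (VALUES (1)) AS seed (n)          -- single driving row for the APPLY chain
OUTER APPLY
(
    -- Option 1: suburb, state and postcode
    SELECT TOP (1) s.Suburb_DID
    FROM @Suburbs AS s
    WHERE s.Suburb = @Suburb
      AND s.[State] = @State
      AND s.Postcode = @Postcode
    ORDER BY s.Suburb_DID
) AS m1
OUTER APPLY
(
    -- Option 2: only needed if option 1 found nothing (outer reference to m1)
    SELECT TOP (1) s.Suburb_DID
    FROM @Suburbs AS s
    WHERE m1.Suburb_DID IS NULL
      AND s.Suburb = @Suburb
      AND s.Postcode = @Postcode
    ORDER BY s.Suburb_DID
) AS m2
OUTER APPLY
(
    -- Option 3: only needed if options 1 and 2 both found nothing
    SELECT TOP (1) s.Suburb_DID
    FROM @Suburbs AS s
    WHERE m1.Suburb_DID IS NULL
      AND m2.Suburb_DID IS NULL
      AND s.Postcode = @Postcode
    ORDER BY s.Suburb_DID
) AS m3
-- Return no row at all when none of the options matched
WHERE COALESCE(m1.Suburb_DID, m2.Suburb_DID, m3.Suburb_DID) IS NOT NULL;
```

With these sample values option 1 finds nothing, option 2 matches Suburb_DID 1, and option 3 is logically skipped. Whether the skipped branches also cost nothing at run time depends on the optimizer placing the `IS NULL` tests in start-up filters, which is worth checking in the actual execution plan.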
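In addition:
Since you only return a single row, you could use a multi-statement table-valued function instead, with explicit logic that tries each option in turn (in separate queries) and returns as soon as the first result is found. This is guaranteed to produce a correct result efficiently.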
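A minimal sketch of that idea, assuming a hypothetical demo table dbo.Demo_Suburbs and function name dbo.tvfn_Demo_Suburb_From_Address (neither is from the question), with invented sample rows and only two simplified options:

```sql
-- Hypothetical demo table standing in for geocode.tSuburbs_XX.
CREATE TABLE dbo.Demo_Suburbs
(
    Suburb_DID int           NOT NULL PRIMARY KEY,
    Suburb     nvarchar(100) NOT NULL,
    [State]    nvarchar(100) NOT NULL,
    Postcode   nvarchar(100) NOT NULL
);

INSERT INTO dbo.Demo_Suburbs (Suburb_DID, Suburb, [State], Postcode)
VALUES (1, N'Springfield', N'MMM', N'1000'),
       (2, N'Shelbyville', N'MMM', N'2000');
GO

CREATE FUNCTION dbo.tvfn_Demo_Suburb_From_Address
(
    @Suburb   nvarchar(100),
    @State    nvarchar(100),
    @Postcode nvarchar(100)
)
RETURNS @Result TABLE
(
    Suburb_DID  int NOT NULL,
    Geocode_DID int NOT NULL
)
AS
BEGIN
    -- Option 1 (most accurate): suburb, state and postcode
    INSERT INTO @Result (Suburb_DID, Geocode_DID)
    SELECT TOP (1) s.Suburb_DID, 4
    FROM dbo.Demo_Suburbs AS s
    WHERE s.Suburb = @Suburb
      AND s.[State] = @State
      AND s.Postcode = @Postcode
    ORDER BY s.Suburb_DID;

    -- Stop here if option 1 produced a row; later options never run
    IF EXISTS (SELECT * FROM @Result) RETURN;

    -- Option 2 (less accurate): suburb and postcode
    INSERT INTO @Result (Suburb_DID, Geocode_DID)
    SELECT TOP (1) s.Suburb_DID, 3
    FROM dbo.Demo_Suburbs AS s
    WHERE s.Suburb = @Suburb
      AND s.Postcode = @Postcode
    ORDER BY s.Suburb_DID;

    RETURN;
END;
GO

-- Usage: the state is deliberately wrong, so option 1 finds nothing
SELECT r.Suburb_DID, r.Geocode_DID
FROM dbo.tvfn_Demo_Suburb_From_Address(N'Springfield', N'NNN', N'1000') AS r;
```

Because each option is a separate statement, the order of evaluation is explicit rather than left to the optimizer. The remaining options from the question would follow the same pattern, each preceded by the same early-exit test.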
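Note
The current function is technically non-deterministic: SQL Server may evaluate the branches of the union in any order it chooses, so it can return a lower-priority result without ever evaluating a higher-priority one.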