UNION ALL or another way to return the first row of a result set

Ser*_*rge 6 performance sql-server set-returning-functions query-performance

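I am writing a table-valued function that geocodes an address from its suburb, state and postcode. I try several geocoding methods, in order of decreasing accuracy: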

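  1. Exact match on a unique suburb-postcode-state combination
  2. Exact match on a unique suburb-postcode combination
  3. Exact match on a unique suburb-state combination
  4. Exact match on a unique postcode
  5. Approximate match on a non-unique postcode, where all suburbs with this postcode are within 5 km of one another.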

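(In the geographic area I work with, the relationships between suburbs and postcodes are many-to-many. In other words, a suburb can have multiple postcodes, and a postcode can cover multiple suburbs, which may even be in different countries.)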
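To illustrate why each tier checks uniqueness, here is a minimal sketch using a hypothetical table variable @Suburbs (a simplified stand-in for geocode.tSuburbs_XX, with made-up suburb names and postcodes). The GROUP BY ... HAVING COUNT(*) = 1 test is the same one the function uses: a suburb-postcode pair only counts as an exact match when exactly one row has it.

```sql
-- Hypothetical, simplified stand-in for geocode.tSuburbs_XX
DECLARE @Suburbs TABLE
(
    Suburb_DID int           PRIMARY KEY,
    Suburb     nvarchar(100) NOT NULL,
    State      nvarchar(100) NOT NULL,
    Postcode   nvarchar(100) NOT NULL
);

INSERT @Suburbs (Suburb_DID, Suburb, State, Postcode)
VALUES
    (1, N'Springfield', N'MMM', N'1000'),  -- one suburb with two postcodes (rows 1 and 2)
    (2, N'Springfield', N'MMM', N'1001'),
    (3, N'Shelbyville', N'MMM', N'1000'),  -- postcode 1000 shared by two suburbs
    (4, N'Springfield', N'NNN', N'1000');  -- same suburb-postcode pair in another state

-- Suburb-postcode pairs that identify exactly one row.
-- Returns Springfield/1001 and Shelbyville/1000 (in no guaranteed order);
-- Springfield/1000 occurs twice (rows 1 and 4), so it is excluded as ambiguous.
SELECT Suburb, Postcode
FROM @Suburbs
GROUP BY Suburb, Postcode
HAVING COUNT(*) = 1;
```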

Here is an excerpt from the table-valued function:

ALTER FUNCTION [geocode].[tvfn_Customer_Suburb_From_Address]
(   
    @Suburb NVARCHAR(100),
    @State NVARCHAR(100),
    @Postcode NVARCHAR(100),
    @Country NVARCHAR(100)
)
RETURNS TABLE 
AS
RETURN 
(

    SELECT TOP 1 *
    FROM (

            -- Unique suburb-postcode-state combinations
            SELECT   s.Suburb_DID
                    ,s.Suburb
                    ,s.State
                    ,s.Postcode
                    ,Geocode_DID = 4 -- Exact match by unique Postcode, Suburb and State
                    ,s.Geocode_Latitude
                    ,s.Geocode_Longitude
            FROM geocode.tSuburbs_XX s
            INNER JOIN [geocode].[tGeocode_Methods] gm
                ON s.Geocode_DID = gm.Geocode_DID

            WHERE s.[Is_Active] = 1
            AND s.[Suburb] = @Suburb
            AND s.[State] = @State
            AND s.[Postcode] = @Postcode
            -- Only suburbs that are geocoded with methods that can be used for geocoding customers
            AND gm.[Can_Use_For_VIP] = 1


            UNION ALL


            -- Unique suburb-postcode combinations
            SELECT   s.Suburb_DID
                    ,s.Suburb
                    ,s.State
                    ,s.Postcode
                    ,Geocode_DID = 3 -- Exact match by unique Postcode & Suburb
                    ,s.Geocode_Latitude
                    ,s.Geocode_Longitude
            FROM geocode.tSuburbs_XX s
                INNER JOIN [geocode].[tGeocode_Methods] gm
                    ON s.Geocode_DID = gm.Geocode_DID
            WHERE EXISTS (  SELECT *
                            FROM geocode.tSuburbs_XX
                            WHERE Is_Active = 1
                            AND Suburb = s.Suburb AND Postcode = s.Postcode
                            GROUP BY Postcode, Suburb
                            HAVING COUNT(*) = 1
                            )
            AND s.Is_Active = 1
            AND s.[Suburb] = @Suburb
            AND s.[Postcode] = @Postcode
            -- Only suburbs that are geocoded with methods that can be used for geocoding customers
            AND gm.[Can_Use_For_VIP] = 1


            UNION ALL


            -- Exact match by unique Suburb and State
            SELECT   s.Suburb_DID
                    ,s.Suburb
                    ,s.State
                    ,s.Postcode
                    ,Geocode_DID = 6 -- Exact match by unique Suburb and State
                    ,s.Geocode_Latitude
                    ,s.Geocode_Longitude
            FROM geocode.tSuburbs_XX s
                INNER JOIN [geocode].[tGeocode_Methods] gm
                    ON s.Geocode_DID = gm.Geocode_DID
            WHERE EXISTS (  SELECT *
                            FROM geocode.tSuburbs_XX
                            WHERE Is_Active = 1 AND Is_PO_Box = 0 -- Exclude PO Boxes
                            AND Suburb = s.Suburb AND State = s.State
                            GROUP BY Suburb, State
                            HAVING COUNT(*) = 1
                            )
            AND s.Is_Active = 1
            AND s.[Suburb] = @Suburb
            AND s.[State] = @State
            -- Only suburbs that are geocoded with methods that can be used for geocoding customers
            AND gm.[Can_Use_For_VIP] = 1


            UNION ALL


            -- Exact match by unique Postcode
            SELECT   s.Suburb_DID
                    ,s.Suburb
                    ,s.State
                    ,s.Postcode
                    ,Geocode_DID = 2 -- Exact match by unique Postcode
                    ,s.Geocode_Latitude
                    ,s.Geocode_Longitude
            FROM geocode.tSuburbs_XX s
                INNER JOIN [geocode].[tGeocode_Methods] gm
                    ON s.Geocode_DID = gm.Geocode_DID
            WHERE EXISTS (  SELECT *
                            FROM geocode.tSuburbs_XX
                            WHERE Is_Active = 1
                            AND Postcode = s.Postcode
                            GROUP BY Postcode
                            HAVING COUNT(*) = 1
                            )
            AND s.Is_Active = 1
            AND s.[Postcode] = @Postcode
            -- Only suburbs that are geocoded with methods that can be used for geocoding customers
            AND gm.[Can_Use_For_VIP] = 1
            -- Perform this extra check to make sure we don't match a postcode in a wrong country
            AND (       @Country IN ('AAA', 'BBB', 'CCC')
                    OR  @State IN ('MMM', 'NNN', 'OOO', 'PPP')
                )


            UNION ALL


            -- Approximate match by non-unique Postcode, where all Suburbs with this Postcode are within 5 km of one another.
            SELECT   s.Suburb_DID
                    ,s.Suburb
                    ,s.State
                    ,s.Postcode
                    ,Geocode_DID = 5
                    ,s.Geocode_Latitude
                    ,s.Geocode_Longitude
            FROM [geocode].[tPostcode_Distances] pd
                INNER JOIN  geocode.tSuburbs_XX s
                    ON pd.Approx_Suburb_DID = s.Suburb_DID
                INNER JOIN [geocode].[tGeocode_Methods] gm
                    ON s.Geocode_DID = gm.Geocode_DID
            WHERE  s.Is_Active = 1
            AND pd.[Postcode] = @Postcode
            -- Only suburbs that are geocoded with methods that can be used for geocoding customers
            AND gm.[Can_Use_For_VIP] = 1
            -- Perform this extra check to make sure we don't match a postcode in a wrong country
            AND (       @Country IN ('AAA', 'BBB', 'CCC')
                    OR  @State IN ('MMM', 'NNN', 'OOO', 'PPP')
                )
            AND pd.Max_Distance <= 5000 -- within 5 km
    ) t
)

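The function above works, but I wonder whether it can be improved. In particular, is it possible to force SQL Server to stop processing the remaining SELECT statements as soon as one SELECT returns a result set (since we are only interested in the first match, hence TOP 1)?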

Update

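Thanks for the suggestions so far. Following suggestions in several answers and comments, I will add a [Priority] column, together with an ORDER BY clause, to make sure I get the best result.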

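I will also add WITH SCHEMABINDING to the TVFN so that SQL Server can parallelize the plan. On that subject, using a multi-statement table-valued function is a good idea (thanks, Paul White), but a multi-statement TVFN always forces a serial plan.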

I will now try Lennart's answer, which suggests using a CTE.

Pau*_*ite 9

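If you must use a single query (as an inline function requires), you can use one of the following two options, which I illustrated in my recent answer to "Relating 2 tables with possible wildcards?":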

Option 1

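Use multiple APPLY clauses with start-up conditions, each using an outer reference from the previous APPLY in the chain. The efficiency of this method depends on start-up filters being present in the execution plan. Correct results are guaranteed, but the plan shape is not.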
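A minimal sketch of Option 1, assuming a hypothetical table variable @Suburbs in place of geocode.tSuburbs_XX and only three of the five tiers. Each OUTER APPLY carries a predicate on the previous tier's output (for example m1.Suburb_DID IS NULL) that does not depend on the inner table, which is what makes a start-up filter possible; whether the optimizer actually places one is not guaranteed, as noted above.

```sql
-- Hypothetical, simplified stand-in for geocode.tSuburbs_XX
DECLARE @Suburbs TABLE
(
    Suburb_DID int           PRIMARY KEY,
    Suburb     nvarchar(100) NOT NULL,
    State      nvarchar(100) NOT NULL,
    Postcode   nvarchar(100) NOT NULL
);

INSERT @Suburbs (Suburb_DID, Suburb, State, Postcode)
VALUES (1, N'Springfield', N'MMM', N'1000'),
       (2, N'Shelbyville', N'MMM', N'1001');

DECLARE @Suburb   nvarchar(100) = N'Springfield',
        @State    nvarchar(100) = N'NNN',   -- wrong state, so tier 1 finds nothing
        @Postcode nvarchar(100) = N'1000';

SELECT
    Suburb_DID  = COALESCE(m1.Suburb_DID, m2.Suburb_DID, m3.Suburb_DID),
    Geocode_DID = COALESCE(m1.Geocode_DID, m2.Geocode_DID, m3.Geocode_DID)
FROM (VALUES (1)) AS seed (n)
-- Tier 1: exact suburb-state-postcode match
OUTER APPLY
(
    SELECT TOP (1) s.Suburb_DID, Geocode_DID = 4
    FROM @Suburbs AS s
    WHERE s.Suburb = @Suburb AND s.State = @State AND s.Postcode = @Postcode
    ORDER BY s.Suburb_DID
) AS m1
-- Tier 2: unique suburb-postcode; only needed when tier 1 returned nothing
OUTER APPLY
(
    SELECT TOP (1) s.Suburb_DID, Geocode_DID = 3
    FROM @Suburbs AS s
    WHERE m1.Suburb_DID IS NULL                -- start-up condition on tier 1
      AND s.Suburb = @Suburb AND s.Postcode = @Postcode
      AND NOT EXISTS (SELECT 1 FROM @Suburbs AS d
                      WHERE d.Suburb = s.Suburb AND d.Postcode = s.Postcode
                        AND d.Suburb_DID <> s.Suburb_DID)
    ORDER BY s.Suburb_DID
) AS m2
-- Tier 3: unique postcode; only needed when tiers 1 and 2 returned nothing
OUTER APPLY
(
    SELECT TOP (1) s.Suburb_DID, Geocode_DID = 2
    FROM @Suburbs AS s
    WHERE m1.Suburb_DID IS NULL AND m2.Suburb_DID IS NULL
      AND s.Postcode = @Postcode
      AND NOT EXISTS (SELECT 1 FROM @Suburbs AS d
                      WHERE d.Postcode = s.Postcode AND d.Suburb_DID <> s.Suburb_DID)
    ORDER BY s.Suburb_DID
) AS m3;
-- With this sample data: Suburb_DID = 1, Geocode_DID = 3 (supplied by tier 2)
```

The chain only grows to the right: each tier can see every earlier tier's columns, so the remaining two tiers from the question would follow the same pattern.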

Option 2

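Add an extra column containing a constant literal to each branch of the union, for example [Priority] = 1, and then add ORDER BY [Priority] ASC within the scope of the TOP (1). Efficient operation depends on the plan avoiding a sort.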

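On reflection, that is not what you want in this case, because the Merge Concatenation in the plan needs one row from each input. It is still an option in the more general case, where the alternative inputs produce more than one row and the first row is cheap to produce.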
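A sketch of Option 2 under the same assumptions (a hypothetical @Suburbs table variable and only two tiers). The [Priority] literal makes the chosen row deterministic; it does not guarantee that SQL Server skips the lower-priority branch.

```sql
-- Hypothetical, simplified stand-in for geocode.tSuburbs_XX
DECLARE @Suburbs TABLE
(
    Suburb_DID int           PRIMARY KEY,
    Suburb     nvarchar(100) NOT NULL,
    State      nvarchar(100) NOT NULL,
    Postcode   nvarchar(100) NOT NULL
);

INSERT @Suburbs (Suburb_DID, Suburb, State, Postcode)
VALUES (1, N'Springfield', N'MMM', N'1000'),
       (2, N'Shelbyville', N'MMM', N'1001');

DECLARE @Suburb   nvarchar(100) = N'Springfield',
        @State    nvarchar(100) = N'MMM',
        @Postcode nvarchar(100) = N'1000';

SELECT TOP (1) t.Suburb_DID, t.Geocode_DID
FROM
(
    -- Priority 1: exact suburb-state-postcode match
    SELECT s.Suburb_DID, Geocode_DID = 4, [Priority] = 1
    FROM @Suburbs AS s
    WHERE s.Suburb = @Suburb AND s.State = @State AND s.Postcode = @Postcode

    UNION ALL

    -- Priority 2: unique postcode
    SELECT s.Suburb_DID, Geocode_DID = 2, [Priority] = 2
    FROM @Suburbs AS s
    WHERE s.Postcode = @Postcode
      AND NOT EXISTS (SELECT 1 FROM @Suburbs AS d
                      WHERE d.Postcode = s.Postcode AND d.Suburb_DID <> s.Suburb_DID)
) AS t
ORDER BY t.[Priority] ASC;   -- the lowest priority number wins
-- With this sample data both branches match row 1; the result is Suburb_DID = 1, Geocode_DID = 4
```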


In addition:

Option 3

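Since you only return one row, you could use a multi-statement table-valued function instead, with explicit logic that tries each option in turn (as separate queries) and returns as soon as the first result is found. This is guaranteed to produce the correct result efficiently.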
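A minimal sketch of Option 3. A function cannot reference a table variable declared outside it, so this assumes a hypothetical permanent table dbo.Suburbs_Demo and a hypothetical function dbo.tvfn_Geocode_Demo covering three tiers. Each tier runs as its own INSERT ... SELECT, and the function returns as soon as @Result holds a row, so later tiers are never executed once an earlier one succeeds.

```sql
-- Hypothetical demo table standing in for geocode.tSuburbs_XX
CREATE TABLE dbo.Suburbs_Demo
(
    Suburb_DID int           PRIMARY KEY,
    Suburb     nvarchar(100) NOT NULL,
    State      nvarchar(100) NOT NULL,
    Postcode   nvarchar(100) NOT NULL
);
GO

CREATE FUNCTION dbo.tvfn_Geocode_Demo
(
    @Suburb   nvarchar(100),
    @State    nvarchar(100),
    @Postcode nvarchar(100)
)
RETURNS @Result TABLE
(
    Suburb_DID  int NOT NULL,
    Geocode_DID int NOT NULL
)
AS
BEGIN
    -- Tier 1: exact suburb-state-postcode match
    INSERT @Result (Suburb_DID, Geocode_DID)
    SELECT TOP (1) s.Suburb_DID, 4
    FROM dbo.Suburbs_Demo AS s
    WHERE s.Suburb = @Suburb AND s.State = @State AND s.Postcode = @Postcode
    ORDER BY s.Suburb_DID;

    IF EXISTS (SELECT 1 FROM @Result) RETURN;   -- stop at the first successful tier

    -- Tier 2: unique suburb-postcode combination
    INSERT @Result (Suburb_DID, Geocode_DID)
    SELECT s.Suburb_DID, 3
    FROM dbo.Suburbs_Demo AS s
    WHERE s.Suburb = @Suburb AND s.Postcode = @Postcode
      AND (SELECT COUNT(*) FROM dbo.Suburbs_Demo AS d
           WHERE d.Suburb = s.Suburb AND d.Postcode = s.Postcode) = 1;

    IF EXISTS (SELECT 1 FROM @Result) RETURN;

    -- Tier 3: unique postcode
    INSERT @Result (Suburb_DID, Geocode_DID)
    SELECT s.Suburb_DID, 2
    FROM dbo.Suburbs_Demo AS s
    WHERE s.Postcode = @Postcode
      AND (SELECT COUNT(*) FROM dbo.Suburbs_Demo AS d
           WHERE d.Postcode = s.Postcode) = 1;

    RETURN;
END;
GO

INSERT dbo.Suburbs_Demo (Suburb_DID, Suburb, State, Postcode)
VALUES (1, N'Springfield', N'MMM', N'1000');

-- Tier 1 misses (wrong state); tier 2 returns Suburb_DID = 1, Geocode_DID = 3
SELECT * FROM dbo.tvfn_Geocode_Demo(N'Springfield', N'NNN', N'1000');

-- Clean up the demo objects
DROP FUNCTION dbo.tvfn_Geocode_Demo;
DROP TABLE dbo.Suburbs_Demo;
```

GO is a batch separator understood by SSMS and sqlcmd, not a T-SQL statement; it is needed because CREATE FUNCTION must be the only statement in its batch.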

Note

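The current function is technically nondeterministic: SQL Server may evaluate the branches of the union in any order it chooses, so it could return a lower-priority result before a higher-priority one has been evaluated.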


小智 3

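Replace the UNION ALL query with LEFT OUTER JOINs, and pass the resulting columns to COALESCE in order of priority (most accurate match first). COALESCE evaluates its arguments from left to right and returns the first non-null value.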
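A sketch of the LEFT OUTER JOIN and COALESCE idea, again with a hypothetical @Suburbs table variable and three tiers. Each joined source should return at most one row per input, otherwise the joins multiply rows: the tier 2 and tier 3 derived tables enforce this with HAVING COUNT(*) = 1, while the tier 1 join assumes suburb-state-postcode combinations are unique in the data. Unlike the APPLY chain, all of the joins are typically evaluated whether or not an earlier tier matched, so this approach buys a single deterministic query rather than early termination.

```sql
-- Hypothetical, simplified stand-in for geocode.tSuburbs_XX
DECLARE @Suburbs TABLE
(
    Suburb_DID int           PRIMARY KEY,
    Suburb     nvarchar(100) NOT NULL,
    State      nvarchar(100) NOT NULL,
    Postcode   nvarchar(100) NOT NULL
);

INSERT @Suburbs (Suburb_DID, Suburb, State, Postcode)
VALUES (1, N'Springfield', N'MMM', N'1000'),
       (2, N'Shelbyville', N'MMM', N'1001');

DECLARE @Suburb   nvarchar(100) = N'Capital City',  -- unknown suburb: tiers 1 and 2 miss
        @State    nvarchar(100) = N'MMM',
        @Postcode nvarchar(100) = N'1001';

SELECT
    -- Most accurate tier first: COALESCE returns the first non-null argument
    Suburb_DID  = COALESCE(m1.Suburb_DID, m2.Suburb_DID, m3.Suburb_DID),
    Geocode_DID = CASE
                      WHEN m1.Suburb_DID IS NOT NULL THEN 4
                      WHEN m2.Suburb_DID IS NOT NULL THEN 3
                      WHEN m3.Suburb_DID IS NOT NULL THEN 2
                  END
FROM (VALUES (@Suburb, @State, @Postcode)) AS p (Suburb, State, Postcode)
-- Tier 1: exact suburb-state-postcode match
LEFT OUTER JOIN @Suburbs AS m1
    ON  m1.Suburb   = p.Suburb
    AND m1.State    = p.State
    AND m1.Postcode = p.Postcode
-- Tier 2: unique suburb-postcode combinations
LEFT OUTER JOIN
(
    SELECT Suburb, Postcode, Suburb_DID = MIN(Suburb_DID)
    FROM @Suburbs
    GROUP BY Suburb, Postcode
    HAVING COUNT(*) = 1
) AS m2
    ON  m2.Suburb   = p.Suburb
    AND m2.Postcode = p.Postcode
-- Tier 3: unique postcodes
LEFT OUTER JOIN
(
    SELECT Postcode, Suburb_DID = MIN(Suburb_DID)
    FROM @Suburbs
    GROUP BY Postcode
    HAVING COUNT(*) = 1
) AS m3
    ON m3.Postcode = p.Postcode;
-- With this sample data: Suburb_DID = 2, Geocode_DID = 2 (supplied by tier 3)
```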