STRING_AGG not behaving as expected

Tom*_*ter 15 sql sql-server string-aggregation sql-server-2017

I have the following query:

WITH cteCountryLanguageMapping AS (
    SELECT * FROM (
        VALUES
            ('Spain', 'English'),
            ('Spain', 'Spanish'),
            ('Sweden', 'English'),
            ('Switzerland', 'English'),
            ('Switzerland', 'French'),
            ('Switzerland', 'German'),
            ('Switzerland', 'Italian')
    ) x ([Country], [Language])
)
SELECT
    [Country],
    CASE COUNT([Language])
        WHEN 1 THEN MAX([Language])
        WHEN 2 THEN STRING_AGG([Language], ' and ')
        ELSE STRING_AGG([Language], ', ')
    END AS [Languages],
    COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country]

I expected the values in the Languages column for Switzerland to be comma-separated, i.e.:

  | Country     | Languages                                 | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain       | Spanish and English                       | 2
2 | Sweden      | English                                   | 1
3 | Switzerland | French, German, Italian, English          | 4

Instead, I get the following output (the 4 values are separated by "and"):

  | Country     | Languages                                 | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain       | Spanish and English                       | 2
2 | Sweden      | English                                   | 1
3 | Switzerland | French and German and Italian and English | 4

What am I missing?


Here is another example:

SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG(z, '-') AS STRING_AGG_MINUS
FROM (
    VALUES
        (1, 'a'),
        (1, 'b')
) x (y, z)
GROUP by y

  | y | STRING_AGG_PLUS | STRING_AGG_MINUS
--+---+-----------------+-----------------
1 | 1 | a+b             | a+b

Is this a bug in SQL Server?

Jer*_*ert 16

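Yes, this is a Bug (TM), present (as of this writing) in all versions up to SQL Server 2017 CU12 (but, according to @DanGuzman, not in Azure SQL Database, so apparently it has already been fixed and the fix may ship in the next CU). Specifically, the part of the optimizer that performs common subexpression elimination (making sure we don't compute an expression more often than necessary) wrongly treats all expressions of the form STRING_AGG(x, <separator>) as identical as long as x matches, regardless of what <separator> is, and unifies them with the first such expression computed in the query.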

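One workaround is to make sure x does not match, by applying some kind of (near-)identity transformation to it. Since we are dealing with strings, concatenating an empty string will do: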

SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
FROM (
    VALUES
        (1, 'a'),
        (1, 'b')
) x (y, z)
GROUP by y
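Applied to the original country/language query, the same trick might look like the sketch below. It assumes one of the affected SQL Server 2017 builds described above. Only one of the two STRING_AGG calls needs a changed argument, because the goal is simply that the two calls no longer share an identical first argument. Note that without WITHIN GROUP (ORDER BY ...) the order of the concatenated values is not guaranteed, so the exact order of languages in the output may vary.

```sql
WITH cteCountryLanguageMapping AS (
    SELECT * FROM (
        VALUES
            ('Spain', 'English'),
            ('Spain', 'Spanish'),
            ('Sweden', 'English'),
            ('Switzerland', 'English'),
            ('Switzerland', 'French'),
            ('Switzerland', 'German'),
            ('Switzerland', 'Italian')
    ) x ([Country], [Language])
)
SELECT
    [Country],
    CASE COUNT([Language])
        WHEN 1 THEN MAX([Language])
        -- '' + [Language] yields the same string but is a different expression,
        -- so the optimizer should no longer merge this call with the one below
        WHEN 2 THEN STRING_AGG('' + [Language], ' and ')
        ELSE STRING_AGG([Language], ', ')
    END AS [Languages],
    COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country];
```

On a build where the bug is fixed, the unmodified query and this one should return the same separators.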