某些 Unicode 空格类别字符未被 TRANSLATE 或 REPLACE 替换

Col*_*ens 9 sql t-sql sql-server

我定义了一个函数dbo.ufnRemoveSpaceCharacters来删除 Unicode 类别 Zs(空格字符)下的所有字符。所有 17 个字符都可以使用REPLACETRANSLATE当它们是字符串中的唯一字符(NCHAR(1)NVARCHAR(1))时进行替换/翻译,但是当这些字符组合成一个字符串时,例如a?a(其中空格/破折号是 Ogham 空格标记0x1680) ,字符不会被替换。以下字符不会被替换/翻译Ogham Space Mark 0x1680, Narrow No Break Space 0x202F, Medium Mathematical Space 0x205F

DROP FUNCTION IF EXISTS dbo.ufnRemoveSpaceCharacters
GO

CREATE FUNCTION dbo.ufnRemoveSpaceCharacters(@Value NVARCHAR(MAX))  
RETURNS NVARCHAR(MAX)
AS   
BEGIN  
    DECLARE @Space NCHAR = NCHAR(0x0020);
    DECLARE @NoBreakSpace NCHAR = NCHAR(0x00A0);
    DECLARE @OghamSpaceMark NCHAR = NCHAR(0x1680);
    DECLARE @EnQuad NCHAR = NCHAR(0x2000);
    DECLARE @EmQuad NCHAR = NCHAR(0x2001);
    DECLARE @EnSpace NCHAR = NCHAR(0x2002);
    DECLARE @EmSpace NCHAR = NCHAR(0x2003);
    DECLARE @ThreePerEmSpace NCHAR = NCHAR(0x2004);
    DECLARE @FourPerEmSpace NCHAR = NCHAR(0x2005);
    DECLARE @SixPerEmSpace NCHAR = NCHAR(0x2006);
    DECLARE @FigureSpace NCHAR = NCHAR(0x2007);
    DECLARE @PunctuationSpace NCHAR = NCHAR(0x2008);
    DECLARE @ThinSpace NCHAR = NCHAR(0x2009);
    DECLARE @HairSpace NCHAR = NCHAR(0x200A);
    DECLARE @NarrowNoBreakSpace NCHAR = NCHAR(0x202F);
    DECLARE @MediumMathematicalSpace NCHAR = NCHAR(0x205F);
    DECLARE @IdeographicSpace NCHAR = NCHAR(0x3000);

    DECLARE @SpaceCharacters NVARCHAR(17) =
        @Space + @NoBreakSpace + @OghamSpaceMark + @EnQuad 
      + @EmQuad + @EnSpace + @EmSpace + @ThreePerEmSpace 
      + @FourPerEmSpace + @SixPerEmSpace + @FigureSpace 
      + @PunctuationSpace + @ThinSpace + @HairSpace 
      + @NarrowNoBreakSpace + @MediumMathematicalSpace + @IdeographicSpace;

    DECLARE @ReplacePhrase NVARCHAR(17) = N'aaaaaaaaaaaaaaaaa';

    RETURN REPLACE(TRANSLATE(@Value, @SpaceCharacters, @ReplacePhrase), @Space,  '')
END
GO
Run Code Online (Sandbox Code Playgroud)

您可以使用以下方法重现这一点

DECLARE @Space NCHAR = NCHAR(0x0020);
DECLARE @NoBreakSpace NCHAR = NCHAR(0x00A0);
DECLARE @OghamSpaceMark NCHAR = NCHAR(0x1680);
DECLARE @EnQuad NCHAR = NCHAR(0x2000);
DECLARE @EmQuad NCHAR = NCHAR(0x2001);
DECLARE @EnSpace NCHAR = NCHAR(0x2002);
DECLARE @EmSpace NCHAR = NCHAR(0x2003);
DECLARE @ThreePerEmSpace NCHAR = NCHAR(0x2004);
DECLARE @FourPerEmSpace NCHAR = NCHAR(0x2005);
DECLARE @SixPerEmSpace NCHAR = NCHAR(0x2006);
DECLARE @FigureSpace NCHAR = NCHAR(0x2007);
DECLARE @PunctuationSpace NCHAR = NCHAR(0x2008);
DECLARE @ThinSpace NCHAR = NCHAR(0x2009);
DECLARE @HairSpace NCHAR = NCHAR(0x200A);
DECLARE @NarrowNoBreakSpace NCHAR = NCHAR(0x202F);
DECLARE @MediumMathematicalSpace NCHAR = NCHAR(0x205F);
DECLARE @IdeographicSpace NCHAR = NCHAR(0x3000);

DECLARE @SpaceCharacters NVARCHAR(17) =
    @Space + @NoBreakSpace + @OghamSpaceMark + @EnQuad 
    + @EmQuad + @EnSpace + @EmSpace + @ThreePerEmSpace 
    + @FourPerEmSpace + @SixPerEmSpace + @FigureSpace 
    + @PunctuationSpace + @ThinSpace + @HairSpace 
    + @NarrowNoBreakSpace + @MediumMathematicalSpace + @IdeographicSpace;

SELECT REPLACE(@OghamSpaceMark, @OghamSpaceMark, 'a'), 
       TRANSLATE(@OghamSpaceMark, @OghamSpaceMark, 'a'),
       REPLACE(@Space + @OghamSpaceMark, @OghamSpaceMark, 'a'), 
       TRANSLATE(@Space + @OghamSpaceMark + @Space, @OghamSpaceMark, 'a'),
       dbo.ufnRemoveSpaceCharacters(@SpaceCharacters),
       REPLACE(@SpaceCharacters, @OghamSpaceMark, 'a')
Run Code Online (Sandbox Code Playgroud)

带输出

(无列名) (无列名) (无列名) (无列名) (无列名) (无列名)
一种 一种 一种 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  ???????????????

我还运行了以下内容以提供其他信息

SELECT os_language_version,
       SERVERPROPERTY('LCID') AS 'Instance-LCID',
       SERVERPROPERTY('Collation') AS 'Instance-Collation',
       SERVERPROPERTY('ComparisonStyle') AS 'Instance-ComparisonStyle',
       SERVERPROPERTY('SqlSortOrder') AS 'Instance-SqlSortOrder',
       SERVERPROPERTY('SqlSortOrderName') AS 'Instance-SqlSortOrderName',
       SERVERPROPERTY('SqlCharSet') AS 'Instance-SqlCharSet',
       SERVERPROPERTY('SqlCharSetName') AS 'Instance-SqlCharSetName',
       DATABASEPROPERTYEX(N'{database_name}', 'LCID') AS 'Database-LCID',
       DATABASEPROPERTYEX(N'{database_name}', 'Collation') AS 'Database-Collation',
       DATABASEPROPERTYEX(N'{database_name}', 'ComparisonStyle') AS 'Database-ComparisonStyle',
       DATABASEPROPERTYEX(N'{database_name}', 'SQLSortOrder') AS 'Database-SQLSortOrder'
FROM sys.dm_os_windows_info;
Run Code Online (Sandbox Code Playgroud)

带输出

os_language_version 实例-LCID 实例整理 实例比较风格 实例-SqlSortOrder 实例-SqlSortOrderName 实例-SqlCharSet 实例-SqlCharSetName 数据库-LCID 数据库整理 数据库比较风格 数据库-SQLSortOrder
1033 1033 SQL_Latin1_General_CP1_CI_AS 196609 52 nocase_iso 1 iso_1 空值 空值 空值 空值

小智 1

@JeroenMostert 回答了这个问题:

在函数的最后一行,在 @Value 之后添加 COLLATE Latin1_General_BIN2。欢迎来到排序规则如何处理字符簇的奇妙古怪世界;幸运的是,二进制排序规则只是按照它们的指示进行。

(不幸的是,不允许使用 COLLATE 子句来标记参数或输出类型,因为这可以说比必须弄清楚它在每个其他位置的确切位置要容易得多。)