How to convert data to proper case in SQL Server

SQL*_*ood 6 sql-server t-sql functions

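SQL Server includes system functions for viewing/updating string data as upper case and lower case, but not proper case. There are multiple reasons to want this operation to happen in SQL Server rather than at the application layer. In my case, we were performing some data cleansing while consolidating our global HR data from multiple sources.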
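As a minimal illustration of that gap (the literal names are just sample data), the two built-in case functions behave as follows; T-SQL has no comparable built-in for proper case:

```sql
-- UPPER and LOWER are the built-in case functions;
-- there is no built-in proper-case/title-case counterpart.
SELECT UpperCased = UPPER(N'Aaron Bertrand')   -- AARON BERTRAND
     , LowerCased = LOWER(N'AARON BERTRAND');  -- aaron bertrand
```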

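If you search the Internet, you will find multiple solutions to this task, but many seem to have restrictive caveats or do not allow exceptions to be defined in the function.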

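NOTE: As mentioned in the comments below, SQL Server is not the ideal place to perform this conversion. Other methods have been suggested as well, CLR for example. In my opinion this post has served its purpose: it is good to have all the thoughts on this in one place, as opposed to the random tidbits that were available here and there. Thank you all.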

bil*_*nkc 13

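The challenge you'll run into with these approaches is that you've lost information. Explain to the business users that they've taken a blurry, out-of-focus picture and, despite what they've seen on TV, there's no way to make it crisp and in focus. There will always be situations where these rules won't work, and as long as everyone knows going in that this is the case, then have at it.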

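This is HR data, so I'm going to assume we're talking about getting names into a consistent title case format, because the mainframe stores them as AARON BERTRAND and we want the new system to not yell at them. Aaron is easy (but not cheap). You and Max have already identified the problem with Mc/Mac, so the function correctly capitalizes Mc/Mac, but there are instances where it's too aggressive, as with Mackey/Maclin/Mackenzie. Mackenzie is an interesting case, though: look at how its popularity has boomed as a baby name.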

Mackenzie

At some point, there will be a poor child named Mackenzie MacKenzie, because people are awful beings.

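You're also going to run into lovely things like D'Antoni, where we should capitalize both letters around the tick mark. Except for d'Autremont, where you only capitalize the letter after the apostrophe. Heaven help you, though, if you send mail to d'Illoni when their family name is D'illoni.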

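For the sake of contributing actual code, the following is a CLR method we used in a 2005 instance for our purposes. It generally uses ToTitleCase, apart from the list of exceptions we built out, which is when we basically gave up trying to codify the aforementioned exceptions.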

namespace Common.Util
{
    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Threading;

    /// <summary>
    /// A class that attempts to proper case a word, taking into
    /// consideration some outliers.
    /// </summary>
    public class ProperCase
    {
        /// <summary>
        /// Convert a string into its propercased equivalent.  General case
        /// it will capitalize the first letter of each word.  Handled special 
        /// cases include names with apostrophes (O'Shea), and Scottish/Irish
        /// surnames MacInnes, McDonalds.  Will fail for Macbeth, Macaroni, etc
        /// </summary>
        /// <param name="inputText">The data to be recased into initial caps</param>
        /// <returns>The input text resampled as proper cased</returns>
        public static string Case(string inputText)
        {
            CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
            TextInfo textInfo = cultureInfo.TextInfo;
            string output = null;
            int staticHack = 0;

            Regex expression = null;
            string matchPattern = string.Empty;

            // Should think about maybe matching the first non blank character
            matchPattern = @"
                (?<Apostrophe>'.\B)| # Match things like O'Shea so apostrophe plus one.  Think about white space between ' and next letter.  TODO:  Correct it's from becoming It'S, can't -> CaN'T
                \bMac(?<Mac>.) | # MacInnes, MacGyver, etc.  Will fail for Macbeth
                \bMc(?<Mc>.) # McDonalds
                ";
            expression = new Regex(matchPattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

            // Handle our funky rules            
            // Using named matches is probably overkill as the
            // same rule applies to all but for future growth, I'm
            // defining it as such.
            // Quirky behaviour---for 2005, the compiler will 
            // make this into a static method which is verboten for 
            // safe assemblies.  
            MatchEvaluator upperCase = delegate(Match match)
            {
                // Based on advice from Chris Hedgate's blog
                // I need to reference a local variable to prevent
                // this from being turned into static
                staticHack = matchPattern.Length;

                if (!string.IsNullOrEmpty(match.Groups["Apostrophe"].Value))
                {
                    return match.Groups["Apostrophe"].Value.ToUpper();
                }

                if (!string.IsNullOrEmpty(match.Groups["Mac"].Value))
                {
                    return string.Format("Mac{0}", match.Groups["Mac"].Value.ToUpper());
                }

                if (!string.IsNullOrEmpty(match.Groups["Mc"].Value))
                {
                    return string.Format("Mc{0}", match.Groups["Mc"].Value.ToUpper());
                }

                return match.Value;
            };

            MatchEvaluator evaluator = new MatchEvaluator(upperCase);

            if (inputText != null)
            {
                // Generally, title casing converts the first character 
                // of a word to uppercase and the rest of the characters 
                // to lowercase. However, a word that is entirely uppercase, 
                // such as an acronym, is not converted.
                // http://msdn.microsoft.com/en-us/library/system.globalization.textinfo.totitlecase(VS.80).aspx
                string temporary = string.Empty;
                temporary = textInfo.ToTitleCase(inputText.ToString().ToLower());
                output = expression.Replace(temporary, evaluator);
            }
            else
            {
                output = string.Empty;
            }

            return output;
        }
    }
}

Now that that's all clear, I'm going to go finish this lovely book of poetry by e e cummings.


Han*_*non 7

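I realize you've already got a good solution, but I thought I'd add a simpler one utilizing an inline table-valued function, albeit one that relies on the upcoming "vNext" version of SQL Server (since released as SQL Server 2017), which includes the STRING_AGG() function alongside STRING_SPLIT():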

IF OBJECT_ID('dbo.fn_TitleCase') IS NOT NULL
DROP FUNCTION dbo.fn_TitleCase;
GO
CREATE FUNCTION dbo.fn_TitleCase
(
    @Input nvarchar(1000)
)
RETURNS TABLE
AS
RETURN
SELECT Item = STRING_AGG(splits.Word, ' ')
FROM (
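    -- SUBSTRING(value, 2, ...) returns '' for empty tokens (consecutive spaces),
    -- where RIGHT(value, LEN(value) - 1) would raise an invalid-length error.
    SELECT Word = UPPER(LEFT(value, 1)) + LOWER(SUBSTRING(value, 2, LEN(value)))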
    FROM STRING_SPLIT(@Input, ' ')
    ) splits(Word);
GO

Test the function:

SELECT *
FROM dbo.fn_TitleCase('this is a test');

This Is A Test

SELECT *
FROM dbo.fn_TitleCase('THIS IS A TEST');

This Is A Test

See MSDN for documentation on STRING_AGG() and STRING_SPLIT().

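Be aware that the STRING_SPLIT() function does not guarantee to return items in any particular order. This can be most annoying. There is a Microsoft Feedback item requesting that a column be added to the output of STRING_SPLIT to denote the order of the output. Consider upvoting that here.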
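For readers on SQL Server 2022 or Azure SQL Database, the ordering concern can be sidestepped: STRING_SPLIT accepts an optional third enable_ordinal argument that adds an ordinal column, and STRING_AGG supports WITHIN GROUP (ORDER BY ...). The following is only a sketch under that version assumption; the function name dbo.fn_TitleCaseOrdered is hypothetical and not part of the original answer.

```sql
-- Assumes SQL Server 2022+ or Azure SQL Database (enable_ordinal support).
-- dbo.fn_TitleCaseOrdered is a hypothetical name used for illustration.
CREATE FUNCTION dbo.fn_TitleCaseOrdered
(
    @Input nvarchar(1000)
)
RETURNS TABLE
AS
RETURN
SELECT Item = STRING_AGG(
                  -- Upper-case the first character, lower-case the remainder.
                  -- SUBSTRING returns '' for empty tokens, so consecutive spaces are safe.
                  UPPER(LEFT(s.value, 1)) + LOWER(SUBSTRING(s.value, 2, LEN(s.value)))
                  , N' ')
              -- Reassemble the words in their original positions.
              WITHIN GROUP (ORDER BY s.ordinal)
FROM STRING_SPLIT(@Input, N' ', 1) AS s;  -- third argument 1 = return the ordinal column
GO

SELECT Item
FROM dbo.fn_TitleCaseOrdered(N'THIS IS A TEST');
```

With the explicit ordinal, the word order no longer depends on whatever order the split happens to return rows in.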

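If you want to live on the edge and use this methodology, it can be expanded to include exceptions. I've constructed an inline table-valued function that does just that: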

CREATE FUNCTION dbo.fn_TitleCase
(
    @Input nvarchar(1000)
    , @SepList nvarchar(1)
)
RETURNS TABLE
AS
RETURN
WITH Exceptions AS (
    SELECT v.ItemToFind
        , v.Replacement
    FROM (VALUES /* add further exceptions to the list below */
          ('mca', 'McA')
        , ('maca','MacA')
        ) v(ItemToFind, Replacement)
)
, Source AS (
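    -- SUBSTRING(value, 2, ...) tolerates empty tokens; RIGHT(value, LEN(value) - 1) would error on them.
    SELECT Word = UPPER(LEFT(value, 1)) + LOWER(SUBSTRING(value, 2, LEN(value)))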
        , Num = ROW_NUMBER() OVER (ORDER BY GETDATE())
    FROM STRING_SPLIT(@Input, @SepList) 
)
SELECT Item = STRING_AGG(splits.Word, @SepList)
FROM (
    SELECT TOP 2147483647 Word
    FROM (
        SELECT Word = REPLACE(Source.Word, Exceptions.ItemToFind, Exceptions.Replacement)
            , Source.Num
        FROM Source
        CROSS APPLY Exceptions
        WHERE Source.Word LIKE Exceptions.ItemToFind + '%'
        UNION ALL
        SELECT Word = Source.Word
            , Source.Num
        FROM Source
        WHERE NOT EXISTS (
            SELECT 1
            FROM Exceptions
            WHERE Source.Word LIKE Exceptions.ItemToFind + '%'
            )
        ) w
    ORDER BY Num
    ) splits;
GO

Testing this shows how it works:

SELECT *
FROM dbo.fn_TitleCase('THIS IS A TEST MCADAMS MACKENZIE MACADAMS', ' ');

This Is A Test McAdams Mackenzie MacAdams


SQL*_*ood 3

The best solution I have come across can be found here.

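I made a small modification to the script: I added LTRIM and RTRIM to the returned value because, in some cases, the script was adding a trailing space after the value.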

Usage example for previewing the conversion from UPPERCASE data to proper case, with exceptions:

SELECT <column>,[dbo].[fProperCase](<column>,'|APT|HWY|BOX|',NULL)
FROM <table> WHERE <column>=UPPER(<column>)

The really simple yet powerful aspect of the script is the ability to define exceptions within the function call itself.

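One thing to note, however:
As currently written, the script does not correctly handle surnames such as Mc[A-Z]%, Mac[A-Z]%, etc. I am currently working on edits to handle that scenario.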

As a workaround, I changed the function's return value to: REPLACE(REPLACE(LTRIM(RTRIM((@ProperCaseText))),'Mcd','McD'),'Mci','McI') etc...
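To make the workaround concrete, here is a small standalone sketch of the same nested REPLACE applied to a sample value. The input string is made up for illustration; inside the function, the expression would wrap @ProperCaseText in the RETURN statement.

```sql
-- Illustrative input only; in the function this is the string built up in @ProperCaseText.
DECLARE @ProperCaseText varchar(8000) = ' Mcdonald And Mcintyre ';

SELECT Fixed =
    REPLACE(
        REPLACE(LTRIM(RTRIM(@ProperCaseText)), 'Mcd', 'McD')  -- Mcdonald -> McDonald
        , 'Mci', 'McI');                                      -- Mcintyre -> McIntyre
-- Fixed: McDonald And McIntyre
```

Note that REPLACE matches the pattern anywhere in the string, not only at the start of a word, and each additional prefix needs another nested call.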

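This method obviously requires foreknowledge of the data and is not ideal. I am sure there is a way to solve this, but I am in the middle of a conversion and currently don't have the time to dedicate to this one pesky issue.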

Here is the code:

CREATE FUNCTION [dbo].[fProperCase](@Value varchar(8000), @Exceptions varchar(8000),@UCASEWordLength tinyint)
returns varchar(8000)
as
/* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Function Purpose: To convert text to Proper Case.
Created By:             David Wiseman
Website:                http://www.wisesoft.co.uk
Created:                2005-10-03
Updated:                2006-06-22
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
INPUTS:

@Value :                This is the text to be converted to Proper Case
@Exceptions:            A list of exceptions to the default Proper Case rules. e.g. |RAM|CPU|HDD|TFT|
                              Without exception list they would display as Ram, Cpu, Hdd and Tft
                              Note the use of the Pipe "|" symbol to separate exceptions.
                              (You can change the @sep variable to something else if you prefer)
@UCASEWordLength: You can specify that words of this length or shorter are automatically displayed in UPPERCASE

USAGE1:

Convert text to ProperCase, without any exceptions

select dbo.fProperCase('THIS FUNCTION WAS CREATED BY DAVID WISEMAN',null,null)
>> This Function Was Created By David Wiseman

USAGE2:

Convert text to Proper Case, with exception for WiseSoft

select dbo.fProperCase('THIS FUNCTION WAS CREATED BY DAVID WISEMAN @ WISESOFT','|WiseSoft|',null)
>> This Function Was Created By David Wiseman @ WiseSoft

USAGE3:

Convert text to Proper Case and default words of 3 chars or fewer to UPPERCASE

select dbo.fProperCase('SIMPSON, HJ',null,3)
>> Simpson, HJ

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */
begin
      declare @sep char(1) -- Separator character for exceptions
      declare @i int -- counter
      declare @ProperCaseText varchar(8000) -- Used to build our Proper Case string for Function return (sized to match the return type)
      declare @Word varchar(1000) -- Temporary storage for each word
      declare @IsWhiteSpace as bit -- Used to indicate whitespace character/start of new word
      declare @c char(1) -- Temp storage location for each character

      set @Word = ''
      set @i = 1
      set @IsWhiteSpace = 1
      set @ProperCaseText = ''
      set @sep = '|'

      -- Set default UPPERCASEWord Length
      if @UCASEWordLength is null set @UCASEWordLength = 1
      -- Convert user input to lower case (This function will UPPERCASE words as required)
      set @Value = LOWER(@Value)

      -- Loop over each character in @Value, plus one extra pass to flush the last word
      while (@i <= len(@Value)+1)
      begin

            -- Get the current character
            set @c = SUBSTRING(@Value,@i,1)

            -- If start of new word, UPPERCASE character
            if @IsWhiteSpace = 1 set @c = UPPER(@c)

            -- Check if character is white space/symbol (using ascii values)
            set @IsWhiteSpace = case when (ASCII(@c) between 48 and 58) then 0
                                          when (ASCII(@c) between 64 and 90) then 0
                                          when (ASCII(@c) between 96 and 123) then 0
                                          else 1 end

            if @IsWhiteSpace = 0
            begin
                  -- Append character to temp @Word variable if not whitespace
                  set @Word = @Word + @c
            end
            else
            begin
                  -- Character is white space/punctuation/symbol which marks the end of our current word.
                  -- If word length is less than or equal to the UPPERCASE word length, convert to upper case.
                  -- e.g. you can specify a @UCASEWordLength of 3 to automatically UPPERCASE all 3 letter words.
                  set @Word = case when len(@Word) <= @UCASEWordLength then UPPER(@Word) else @Word end

                  -- Check word against user exceptions list. If exception is found, use the case specified in the exception.
                  -- e.g. WiseSoft, RAM, CPU.
                  -- If word isn't in user exceptions list, check for "known" exceptions.
                  set @Word = case when charindex(@sep + @Word + @sep,@exceptions collate Latin1_General_CI_AS) > 0
                                    then substring(@exceptions,charindex(@sep + @Word + @sep,@exceptions collate Latin1_General_CI_AS)+1,len(@Word))
                                    when @Word = 's' and substring(@Value,@i-2,1) = '''' then 's' -- e.g. Who's
                                    when @Word = 't' and substring(@Value,@i-2,1) = '''' then 't' -- e.g. Don't
                                    when @Word = 'm' and substring(@Value,@i-2,1) = '''' then 'm' -- e.g. I'm
                                    when @Word = 'll' and substring(@Value,@i-3,1) = '''' then 'll' -- e.g. He'll
                                    when @Word = 've' and substring(@Value,@i-3,1) = '''' then 've' -- e.g. Could've
                                    else @Word end

                  -- Append the word to the @ProperCaseText along with the whitespace character
                  set @ProperCaseText = @ProperCaseText + @Word + @c
                  -- Reset the Temp @Word variable, ready for a new word
                  set @Word = ''
            end
            -- Increment the counter
            set @i = @i + 1
      end
      return @ProperCaseText
end