Filter characters to return only the numeric characters in a string?

use*_*599 6 sql-server sql-server-2008-r2 string-manipulation

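I want to filter an nvarchar field so that it returns only the numeric characters.

I have some SQL that does this, but it seems far more complicated than it needs to be. Does anyone have a better way to strip all non-numeric characters from a string?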

IF OBJECT_ID('tempdb..#MOB') IS NOT NULL
BEGIN
    DROP Table #MOB
END

SELECT [mob]
INTO #MOB
FROM (
SELECT '(00) 1234 5678' AS [mob]
UNION
SELECT '1234 5678' AS [mob]
UNION
SELECT '+61 012 345 678' AS [mob]
) AS temp


;WITH [fill] ([Num], [Index], [MOBILEPHONE])
AS
(
    SELECT 
    CASE 
        WHEN [MOBILEPHONE] IS NOT NULL
        THEN SUBSTRING([MOBILEPHONE], 1, 1) 
        ELSE NULL 
    END AS [Num]
    , 1 AS [INDEX], [MOBILEPHONE]
    FROM (
        SELECT DISTINCT [mob] AS [MOBILEPHONE]
        FROM #MOB as t
    ) AS temp
    UNION ALL
    SELECT 
    SUBSTRING([F].[MOBILEPHONE], [F].[Index] + 1, 1) AS [Num]
    ,[F].[Index] + 1 AS [Index]
    , [MOBILEPHONE]
    FROM [fill] AS [F]
    WHERE ([F].[Index] + 1) < LEN([F].[MOBILEPHONE]) + 1
)

SELECT [E].[MOBILEPHONE] AS [old_MOBILEPHONE],
    STUFF((SELECT N'' + [F].[Num]
    FROM [fill] AS [F]
    WHERE (PATINDEX('%[^0-9]%', [F].[Num]) = 0 OR PATINDEX('%[^0-9]%', [F].[Num]) IS NULL) AND
    ([F].[MOBILEPHONE] = [E].[MOBILEPHONE])
    ORDER BY [F].[MOBILEPHONE], [F].[Index]
    FOR XML PATH('')), 1, 0, '')
    AS [MOBILEPHONE]
FROM (
        SELECT DISTINCT [t].[MOBILEPHONE]
        FROM (SELECT [mob] AS [MOBILEPHONE] FROM #MOB) as t
    ) AS [E]

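I have seen the Stack Overflow Q&A "T-SQL select query to remove non-numeric characters", but that answer is similar to the solution I already have: it uses a CTE and recursion. I am looking for something simpler. Is there perhaps something like a custom collation I could create, or a regular expression filter I could apply?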

Pau*_*ite 8

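To use regular expressions, you need a SQLCLR function. Solomon Rutzky has created a useful library of CLR functions called SQLsharp (SQL#). The free version includes several regular expression functions, including RegEx_Replace4k, which is used as follows: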

SELECT 
    M.mob,
    numeric_only = 
        SQL#.RegEx_Replace4k
        (
            M.mob,  -- Source
            N'\D',  -- Regular expression
            N'',     -- Replace matches with empty string
            -1,     -- Unlimited replacements
            1,      -- Start at character position
            NULL    -- Options (see documentation)
        )
FROM #MOB AS M;

This returns each original mob value alongside numeric_only, the same value with every character matched by \D (any non-digit) removed.

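A regular expression is somewhat overkill for this simple requirement, so writing your own CLR implementation that only removes non-digits would probably be faster. Nevertheless, I have found the library function above to be as fast as the best T-SQL implementations, if not faster (T-SQL string manipulation is rather slow).

For a T-SQL implementation, see "Splitting Strings Based on Patterns" by Dwain Camps.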


Jef*_*den 5

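The "digits only" function below was created by Eirikur Eiriksson. In terms of performance it blows the doors off most T-SQL-only solutions: 1 million conversions of random strings, varying from 36 to 72 characters in length, take just a bit over 15 seconds. Additional information about the testing can be found in the following thread on SQLServerCentral.com: http://www.sqlservercentral.com/Forums/Topic1585850-391-2.aspx#bm1629360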

CREATE FUNCTION dbo.DigitsOnlyEE
--Created by Eirikur Eiriksson (29 Oct 2014)
        (@pString VARCHAR(8000)) 
RETURNS TABLE WITH SCHEMABINDING AS RETURN
   WITH  E1(N)    AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
        ,Tally(N) AS (SELECT TOP (LEN(@pString)) (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) AS Num FROM E1 a,E1 b,E1 c,E1 d ORDER BY Num) 
 SELECT DigitsOnly = 
(
 SELECT SUBSTRING(@pString,N,1)
   FROM Tally 
  WHERE ((ASCII(SUBSTRING(@pString,N,1)) - 48) & 0x7FFF) < 10 
  ORDER BY N
    FOR XML PATH('')
)
;

It is an iTVF (inline table-valued function), so you have to use it in the FROM clause rather than in the SELECT list, as follows.

 SELECT ca.DigitsOnly
   FROM dbo.SomeTable
CROSS APPLY dbo.DigitsOnlyEE(SomeString) ca
;
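As a rough sketch of how this could be applied to the question's sample data, the first query below runs the function once per row. It assumes dbo.DigitsOnlyEE has been created as shown above and that the #MOB temp table from the question still exists in the session. The second query is only an illustration of the range test inside the function: subtracting 48 maps '0' through '9' to 0 through 9, and masking with 0x7FFF turns the negative offsets of characters below '0' into large positive numbers, so a single `< 10` comparison rejects characters on both sides of the digit range. The sample characters and the names M, ca, v, c, CharOffset, Masked and IsDigit are hypothetical.

```sql
-- Apply the iTVF to every row of the question's sample table.
SELECT M.mob, ca.DigitsOnly
FROM #MOB AS M
CROSS APPLY dbo.DigitsOnlyEE(M.mob) AS ca;

-- Illustration only: how the masked comparison classifies a few characters.
SELECT v.c,
       ASCII(v.c) - 48 AS CharOffset,             -- negative for characters below '0'
       (ASCII(v.c) - 48) & 0x7FFF AS Masked,      -- negative offsets become large positives
       CASE WHEN ((ASCII(v.c) - 48) & 0x7FFF) < 10
            THEN 1 ELSE 0 END AS IsDigit
FROM (VALUES ('/'), ('0'), ('9'), (':'), (' '), ('A')) AS v(c);
```

Note that the subquery using FOR XML PATH('') yields NULL rather than an empty string when the input contains no digits at all, so callers that need an empty string may want to wrap DigitsOnly in ISNULL.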

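I won't even load SQLSharp to test it, because it appears to install certificates and users, so I would probably never use it in production. Even if I were willing to, I couldn't do any performance testing, because the author has the following restriction in his EULA, which also prevents me from installing it. I don't agree with such restrictions, but I will respect them.

2.2. Restrictions. Except for the rights specifically enumerated in Section 2.1, neither Licensee nor its Affiliates and End Users shall have any other rights with respect to the Software. By way of illustration and not limitation, the foregoing license does not provide any right to:

2.2.8. Use the Software or Documentation for competitive analysis of the Software, for the development of a competing software product or service, or for any other purpose that is to the Licensor's commercial disadvantage.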

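  • What caught my attention was the missing ORDER BY in the concatenation part. I don't expect XQuery execution plans to be well behaved. Paul drew my attention to the ORDER BY on the TOP, which Itzik confirms in this blog post for the case where TOP is used in his query that generates a table of numbers. http://m.sqlmag.com/sql-server/virtual-auxiliary-table-numbers (2 upvotes)
  • Oddly enough, whenever I use a physical Tally table I actually do use ORDER BY, because it is "free" (as Paul put it) and does not show up in the execution plan. I use it there because I did see it go astray once (though I can't remember the circumstances). That said, I absolutely agree that a guarantee against future upgrade problems, flaky connections, and so on is worth it. I would still be interested to see whether anyone has actually run into an "out of order" problem, other than when the TOP was in the wrong place or when the ROW_NUMBER() limit was mistakenly placed in the WHERE clause. (2 upvotes)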

Sco*_*red 2

See if this works for you; it has done the trick for us.

CREATE FUNCTION [dbo].[RemoveAlphaCharacters] (@Temp NVARCHAR(1000))
RETURNS NVARCHAR(1000)
AS
BEGIN
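    -- Pattern that matches any single character that is not a digit
    DECLARE @KeepValues AS NVARCHAR(50)

    SET @KeepValues = N'%[^0-9]%'

    -- Remove the first non-digit character on each pass until none remain
    WHILE PatIndex(@KeepValues, @Temp) > 0
        SET @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, N'')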

    RETURN @Temp
END
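A minimal usage sketch, assuming the function above has been created in the dbo schema and the #MOB temp table from the question is still available in the session. Because this is a scalar function, it is called directly in the SELECT list rather than through CROSS APPLY. The second batch runs the same loop on a single hypothetical variable (@s), so the technique can be tried without creating any objects.

```sql
-- Call the scalar function once per row of the question's sample table.
SELECT M.mob,
       dbo.RemoveAlphaCharacters(M.mob) AS numeric_only
FROM #MOB AS M;

-- Same loop without the function: strip one non-digit per iteration.
DECLARE @s NVARCHAR(1000) = N'+61 012 345 678';

WHILE PATINDEX(N'%[^0-9]%', @s) > 0
    SET @s = STUFF(@s, PATINDEX(N'%[^0-9]%', @s), 1, N'');

SELECT @s AS numeric_only;  -- 61012345678
```

The loop rescans the string from the start after every removal, so the work grows with both the string length and the number of non-digit characters. That is usually acceptable for short values such as phone numbers, but it is worth measuring on larger data.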