CAST和IsNumeric

Mar*_*man 26 sql-server sql-server-2005

为什么以下查询会返回"将数据类型varchar转换为bigint时出错"?IsNumeric不能使CAST安全吗?我已经尝试了强制转换中的每个数值数据类型并获得相同的"错误转换..."错误.我不相信结果数字的大小是一个问题,因为溢出是一个不同的错误.

有趣的是,在管理工作室中,结果实际上会在错误返回之前显示在结果窗格中一瞬间.

SELECT CAST(myVarcharColumn AS bigint)  
FROM myTable  
WHERE IsNumeric(myVarcharColumn) = 1 AND myVarcharColumn IS NOT NULL  
GROUP BY myVarcharColumn
Run Code Online (Sandbox Code Playgroud)

有什么想法吗?

G M*_*ros 56

如果varchar值可以转换为任何数字类型,则IsNumeric返回1.这包括int,bigint,decimal,numeric,real和float.

科学记数法可能会导致您遇到问题.例如:

Declare @Temp Table(Data VarChar(20))

Insert Into @Temp Values(NULL)
Insert Into @Temp Values('1')
Insert Into @Temp Values('1e4')
Insert Into @Temp Values('Not a number')

Select Cast(Data as bigint)
From   @Temp
Where  IsNumeric(Data) = 1 And Data Is Not NULL
Run Code Online (Sandbox Code Playgroud)

有一个技巧可以用于IsNumeric,因此对于带有科学记数法的数字,它返回0.您可以应用类似的技巧来防止小数值.

IsNumeric(YourColumn +'e0')

IsNumeric(YourColumn +'.0e0')

试试看.

SELECT CAST(myVarcharColumn AS bigint)
FROM myTable
WHERE IsNumeric(myVarcharColumn + '.0e0') = 1 AND myVarcharColumn IS NOT NULL
GROUP BY myVarcharColumn
Run Code Online (Sandbox Code Playgroud)

  • 这仅处理 BigInt(这是问题所问的问题,但仅适用于这种数据类型)。警告:如果处理大于 BigInt 的数字,此代码将中断。对于空字符串,它还返回 0 而不是 NULL。此解决方案会剔除您可能还想转换的任何十进制值(如 1.0)。有关完整的解决方案,请参阅我的帖子:http://stackoverflow.com/a/21770230/555798 (2认同)
  • @MikeTeeVee 你说的都是对的。关于十进制数字......由于'.0e0'字符串,它们被删除。如果将其更改为“e0”,则十进制数将通过。这个解决方案的好处是,一旦你理解了“技巧”,它就会简单易行。此外,这可以很容易地转换为用户定义的函数。性能会非常好,因为它不需要访问表中的任何内容。 (2认同)

Mik*_*Vee 7

背景:

我使用第三方数据库,不断收到其他第三方供应商的新数据.
解析用于存储结果的可怕varchar字段是我的工作.
我们希望解析尽可能多的数据,此解决方案向您展示如何"清理"数据,以便不会忽略有效的条目.

  1. 一些结果是免费发短信的.
  2. 有些是枚举(是,否,蓝色,黑色等).
  3. 有些是整数.
  4. 其他人使用小数.
  5. 许多都是百分比,如果转换为整数,可能会在以后绊倒你.

如果我需要查询给定的小数范围(比如适用的-1.4到3.6),我的选项是有限的.
我更新了下面的查询,使用@GMastros建议附加'e0'.
谢谢@GMastros,这为我节省了额外的2行逻辑.

解:

--NOTE: I'd recommend you use this to convert your numbers and store them in a separate table (or field).
--      This way you may reuse them when when working with legacy/3rd-party systems, instead of running these calculations on the fly each time.
SELECT Result.Type, Result.Value, Parsed.CleanValue, Converted.Number[Number - Decimal(38,4)],
       (CASE WHEN Result.Value IN ('0', '1', 'True', 'False') THEN CAST(Result.Value as Bit) ELSE NULL END)[Bit],--Cannot convert 1.0 to Bit, it must be in Integer format already.
       (CASE WHEN Converted.Number BETWEEN 0 AND 255 THEN CAST(Converted.Number as TinyInt) ELSE NULL END)[TinyInt],
       (CASE WHEN Converted.Number BETWEEN -32768 AND 32767 AND Result.Value LIKE '%\%%' ESCAPE '\' THEN CAST(Converted.Number / 100.0 as Decimal(9,4)) ELSE NULL END)[Percent],
       (CASE WHEN Converted.Number BETWEEN -32768 AND 32767 THEN CAST(Converted.Number as SmallInt) ELSE NULL END)[SmallInt],
       (CASE WHEN Converted.Number BETWEEN -214748.3648 AND 214748.3647 THEN CAST(Converted.Number as SmallMoney) ELSE NULL END)[SmallMoney],
       (CASE WHEN Converted.Number BETWEEN -2147483648 AND 2147483647 THEN CAST(Converted.Number as Int) ELSE NULL END)[Int],
       (CASE WHEN Converted.Number BETWEEN -2147483648 AND 2147483647 THEN CAST(CAST(Converted.Number as Decimal(10)) as Int) ELSE NULL END)[RoundInt],--Round Up or Down instead of Truncate.
       (CASE WHEN Converted.Number BETWEEN -922337203685477.5808 AND 922337203685477.5807 THEN CAST(Converted.Number as Money) ELSE NULL END)[Money],
       (CASE WHEN Converted.Number BETWEEN -9223372036854775808 AND 9223372036854775807 THEN CAST(Converted.Number as BigInt) ELSE NULL END)[BigInt],
       (CASE WHEN Parsed.CleanValue IN ('1', 'True', 'Yes', 'Y', 'Positive', 'Normal')   THEN CAST(1 as Bit)
             WHEN Parsed.CleanValue IN ('0', 'False', 'No', 'N', 'Negative', 'Abnormal') THEN CAST(0 as Bit) ELSE NULL END)[Enum],
       --I couln't use just Parsed.CleanValue LIKE '%e%' here because that would match on "True" and "Negative", so I also had to match on only allowable characters. - 02/13/2014 - MCR.
       (CASE WHEN ISNUMERIC(Parsed.CleanValue) = 1 AND Parsed.CleanValue LIKE '%e%' THEN Parsed.CleanValue ELSE NULL END)[Exponent]
  FROM
  (
    VALUES ('Null', NULL), ('EmptyString', ''), ('Spaces', ' - 2 . 8 % '),--Tabs and spaces mess up IsNumeric().
           ('Bit', '0'), ('TinyInt', '123'), ('Int', '123456789'), ('BigInt', '1234567890123456'),
           --('VeryLong', '12345678901234567890.1234567890'),
           ('VeryBig', '-1234567890123456789012345678901234.5678'),
           ('TooBig',  '-12345678901234567890123456789012345678.'),--34 (38-4) is the Longest length of an Integer supported by this query.
           ('VeryLong', '-1.2345678901234567890123456789012345678'),
           ('TooLong', '-12345678901234567890.1234567890123456789'),--38 Digits is the Longest length of a Number supported by the Decimal data type.
           ('VeryLong', '000000000000000000000000000000000000001.0000000000000000000000000000000000000'),--Works because Casting ignores leading zeroes.
           ('TooLong', '.000000000000000000000000000000000000000'),--Exceeds the 38 Digit limit for all Decimal types after the decimal-point.
           --Dot(.), Plus(+), Minus(-), Comma(,), DollarSign($), BackSlash(\), Tab(0x09), and Letter-E(e) all yeild false-posotives with IsNumeric().
           ('Decimal', '.'), ('Decimal', '.0'), ('Decimal', '3.99'),
           ('Positive', '+'), ('Positive', '+20'),
           ('Negative', '-'), ('Negative', '-45'), ('Negative', '- 1.23'),
           ('Comma', ','), ('Comma', '1,000'),
           ('Money', '$'), ('Money', '$10'),
           ('Percent', '%'), ('Percent', '110%'),--IsNumeric will kick out Percent(%) signs.
           ('BkSlash', '\'), ('Tab', CHAR(0x09)),--I've actually seen tab characters in our data.
           ('Exponent', 'e0'), ('Exponent', '100e-999'),--No SQL-Server datatype could hold this number, though it is real.
           ('Enum', 'True'), ('Enum', 'Negative')
  ) AS Result(Type, Value)--O is for Observation.
  CROSS APPLY
  ( --This Step is Optional.  If you have Very Long numbers with tons of leading zeros, then this is useful.  Otherwise is overkill if all the numbers you want have 38 or less digits.
    --Casting of trailing zeros count towards the max 38 digits Decimal can handle, yet Cast ignores leading-zeros.  This also cleans up leading/trailing spaces. - 02/25/2014 - MCR.
    SELECT LTRIM(RTRIM(SUBSTRING(Result.Value, PATINDEX('%[^0]%', Result.Value + '.'), LEN(Result.Value))))[Value]
  ) AS Trimmed
  CROSS APPLY
  (
    SELECT --You will need to filter out other Non-Keyboard ASCII characters (before Space(0x20) and after Lower-Case-z(0x7A)) if you still want them to be Cast as Numbers. - 02/15/2014 - MCR.
           REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(Trimmed.Value,--LTRIM(RTRIM(Result.Value)),
           (CHAR(0x0D) + CHAR(0x0A)), ''),--Believe it or not, we have people that press carriage return after entering in the value.
           CHAR(0x09), ''),--Apparently, as people tab through controls on a page, some of them inadvertently entered Tab's for values.
           ' ', ''),--By replacing spaces for values (like '- 2' to work), you open the door to values like '00 12 3' - your choice.
           '$', ''), ',', ''), '+', ''), '%', ''), '/', '')[CleanValue]
  ) AS Parsed--P is for Parsed.
  CROSS APPLY
  ( --NOTE: I do not like my Cross-Applies to feed into each other.
    --      I'm paranoid it might affect performance, but you may move this into the select above if you like. - 02/13/2014 - MCR.
    SELECT (CASE WHEN ISNUMERIC(Parsed.CleanValue + 'e0') = 1--By concatenating 'e0', I do not need to check for: Parsed.CleanValue NOT LIKE '%e%' AND Parsed.CleanValue NOT IN ('.', '-')
                 --  If you never plan to work with big numbers, then could use Decimal(19,4) would be best as it only uses 9 storage bytes compared to the 17 bytes that 38 precision requires.
                 --  This might help with performance, especially when converting a lot of data.
                  AND CHARINDEX('.', REPLACE(Parsed.CleanValue, '-', '')) - 1    <= (38-4)--This is the Longest Integer supported by Decimal(38,4)).
                  AND LEN(REPLACE(REPLACE(Parsed.CleanValue, '-', ''), '.', '')) <= 38--When casting to a Decimal (of any Precision) you cannot exceed 38 Digits. - 02/13/2014 - MCR.
                 THEN CAST(Parsed.CleanValue as Decimal(38,4))--Scale of 4 used is the max that Money has.  This is the biggest number SQL Server can hold.
                 ELSE NULL END)[Number]
  ) AS Converted--C is for Converted.
Run Code Online (Sandbox Code Playgroud)

输出:

下面的屏幕截图已经过格式化和缩减以适应StackOverflow.
实际结果有更多列. MikeTeeVee的IsNumeric Casting

研究:

每个查询旁边都是结果.
有趣的是看到IsNumeric的缺点以及CASTing的局限性.
我展示了这一点,所以你可能会看到编写上述查询的背景研究.
了解每个设计决策非常重要(如果您考虑削减任何设计).

SELECT ISNUMERIC('')--0.  This is understandable, but your logic may want to default these to zero.
SELECT ISNUMERIC(' ')--0.  This is understandable, but your logic may want to default these to zero.
SELECT ISNUMERIC('%')--0.
SELECT ISNUMERIC('1%')--0.
SELECT ISNUMERIC('e')--0.
SELECT ISNUMERIC('  ')--1.  --Tab.
SELECT ISNUMERIC(CHAR(0x09))--1.  --Tab.
SELECT ISNUMERIC(',')--1.
SELECT ISNUMERIC('.')--1.
SELECT ISNUMERIC('-')--1.
SELECT ISNUMERIC('+')--1.
SELECT ISNUMERIC('$')--1.
SELECT ISNUMERIC('\')--1.  '
SELECT ISNUMERIC('e0')--1.
SELECT ISNUMERIC('100e-999')--1.  No SQL-Server datatype could hold this number, though it is real.
SELECT ISNUMERIC('3000000000')--1.  This is bigger than what an Int could hold, so code for these too.
SELECT ISNUMERIC('1234567890123456789012345678901234567890')--1.  Note: This is larger than what the biggest Decimal(38) can hold.
SELECT ISNUMERIC('- 1')--1.
SELECT ISNUMERIC('  1  ')--1.
SELECT ISNUMERIC('True')--0.
SELECT ISNUMERIC('1/2')--0.  No love for fractions.

SELECT CAST('e0'  as Int)--0.  Surpise!  Casting to Decimal errors, but for Int is gives us zero, which is wrong.
SELECT CAST('0e0'  as Int)--0.  Surpise!  Casting to Decimal errors, but for Int is gives us zero, which is wrong.
SELECT CAST(CHAR(0x09) as Decimal(12,2))--Error converting data type varchar to numeric.  --Tab.
SELECT CAST('   1' as Decimal(12,2))--Error converting data type varchar to numeric.  --Tab.
SELECT CAST(REPLACE('   1', CHAR(0x09), '') as Decimal(12,2))--Error converting data type varchar to numeric.  --Tab.
SELECT CAST(''  as Decimal(12,2))--Error converting data type varchar to numeric.
SELECT CAST(''  as Int)--0.  Surpise!  Casting to Decimal errors, but for Int is gives us zero, which is wrong.
SELECT CAST(',' as Decimal(12,2))--Error converting data type varchar to numeric.
SELECT CAST('.' as Decimal(12,2))--Error converting data type varchar to numeric.
SELECT CAST('-' as Decimal(12,2))--Arithmetic overflow error converting varchar to data type numeric.
SELECT CAST('+' as Decimal(12,2))--Arithmetic overflow error converting varchar to data type numeric.
SELECT CAST('$' as Decimal(12,2))--Error converting data type varchar to numeric.
SELECT CAST('$1' as Decimal(12,2))--Error converting data type varchar to numeric.
SELECT CAST('1,000' as Decimal(12,2))--Error converting data type varchar to numeric.
SELECT CAST('- 1'   as Decimal(12,2))--Error converting data type varchar to numeric.  (Due to spaces).
SELECT CAST('  1  ' as Decimal(12,2))--1.00  Leading and trailing spaces are okay.
SELECT CAST('1.' as Decimal(12,2))--1.00
SELECT CAST('.1' as Decimal(12,2))--0.10
SELECT CAST('-1' as Decimal(12,2))--1.00
SELECT CAST('+1' as Decimal(12,2))--1.00
SELECT CAST('True'  as Bit)--1
SELECT CAST('False' as Bit)--0
--Proof: The Casting to Decimal cannot exceed 38 Digits, even if the precision is well below 38.
SELECT CAST('1234.5678901234567890123456789012345678' as Decimal(8,4))--1234.5679
SELECT CAST('1234.56789012345678901234567890123456789' as Decimal(8,4))--Arithmetic overflow error converting varchar to data type numeric.

--Proof: Casting of trailing zeros count towards the max 38 digits Decimal can handle, yet it ignores leading-zeros.
SELECT CAST('.00000000000000000000000000000000000000' as Decimal(8,4))--0.0000  --38 Digits after the decimal point.
SELECT CAST('000.00000000000000000000000000000000000000' as Decimal(8,4))--0.0000  --38 Digits after the decimal point and 3 zeros before the decimal point.
SELECT CAST('.000000000000000000000000000000000000000' as Decimal(8,4))--Arithmetic overflow error converting varchar to data type numeric.  --39 Digits after the decimal point.
SELECT CAST('1.00000000000000000000000000000000000000' as Decimal(8,4))--Arithmetic overflow error converting varchar to data type numeric.  --38 Digits after the decimal point and 1 non-zero before the decimal point.
SELECT CAST('000000000000000000000000000000000000001.0000000000000000000000000000000000000' as Decimal(8,4))--1.0000

--Caveats: When casting to an Integer:
SELECT CAST('3.0' as Int)--Conversion failed when converting the varchar value '3.0' to data type int.
--NOTE: When converting from character data to Int, you may want to do a double-conversion like so (if you want to Round your results first):
SELECT CAST(CAST('3.5'  as Decimal(10))   as Int)--4.  Decimal(10) has no decimal precision, so it rounds it to 4 for us BEFORE converting to an Int.
SELECT CAST(CAST('3.5'  as Decimal(11,1)) as Int)--3.  Decimal (11,1) HAS decimal precision, so it stays 3.5 before converting to an Int, which then truncates it.
--These are the best ways to go if you simply want to Truncate or Round.
SELECT CAST(CAST('3.99' as Decimal(10)) as Int)--3.  Good Example of Rounding.
SELECT CAST(FLOOR('3.99') as Int)--3.  Good Example fo Truncating.
Run Code Online (Sandbox Code Playgroud)


Kev*_*ild 5

试试这个,看看你是否仍然收到错误...

SELECT CAST(CASE 
            WHEN IsNumeric(myVarcharColumn) = 0
                THEN 0
            ELSE myVarcharColumn
            END AS BIGINT)
FROM myTable
WHERE IsNumeric(myVarcharColumn) = 1
    AND myVarcharColumn IS NOT NULL
GROUP BY myVarcharColumn
Run Code Online (Sandbox Code Playgroud)


HLG*_*GEM 5

最好的解决方案是停止在 varchar 列中存储整数。很明显,存在一个数据问题,即数据可以解释为数字,但不能这样转换。您需要找到存在问题的记录并修复它们,如果数据是可以并且应该修复的。根据您存储的内容以及开始时它是 varchar 的原因,您可能需要修复查询而不是数据。但是,如果您首先找到破坏当前查询的记录,那么这样做也会更容易。

如何做到这一点是个问题。使用 charindex 在数据中搜索小数位以查看是否有小数(除了 .0 会转换)相对容易。您还可以查找任何包含 e 或 $ 或任何其他字符的记录,这些字符可以根据已经给出的来源作为数字进行解释。如果您没有大量记录,对数据进行快速视觉扫描可能会找到它,特别是如果您首先对该字段进行排序。

有时,当我一直在寻找破坏查询的坏数据时,我将数据放入临时表中,然后尝试分批处理(使用插值),直到找到它破坏的数据。从前 1000 条开始(不要忘记使用 order by,否则删除好的记录时不会得到相同的结果,如果有数百万条记录以较大的数字开头,则 1000 只是一个最佳猜测)。如果通过,删除这 1000 条记录并选择下一批。一旦失败,请选择较小的批次。一旦您找到一个可以轻松进行视觉扫描的数字,您就会发现问题所在。当我有数百万条记录和一个奇怪的错误时,我已经能够相当快地找到问题记录,没有任何查询我

  • 很多时候你不能对源数据的形状做很多事情。 (5认同)
  • @harvest316,这就是我们不导入未正确清理的数据的原因。我们当然不会使用错误的数据类型将其存储在我们的系统中。我们从数千个文件中导入数据,并且在我们自己的数据中没有这些问题,因为我们在将它们堆积到我们自己的数据库之前会寻找这些东西。 (2认同)
  • 有些情况下无法更改表结构。这也是仓储中的一种常见模式,即从源(可能使用 iffy 数据类型或作为平面文件)获取数据的初始 * 未更改 * 副本,然后在将数据从暂存区移动时清理数据。处理这些问题的查询非常有用。但即使脏数据是意外的并且是设计不佳的结果,看看@MikeTeeVee 的回应;如果您了解 ISNUMERIC 的工作原理,那么您应该可以通过查询找到违规行,而不是花时间手动观察数据。 (2认同)