Rounding issue in LOG and EXP functions

Pரத*_*ீப் 9 sql sql-server sql-server-2008 sql-server-2012

I'm trying to perform a cumulative multiplication. I'm trying two approaches to do it.

Sample data:

DECLARE @TEST TABLE
  (
     PAR_COLUMN INT,
     PERIOD     INT,
     VALUE      NUMERIC(22, 6)
  ) 
INSERT INTO @TEST VALUES 
(1,601,10 ),
(1,602,20 ),
(1,603,30 ),
(1,604,40 ),
(1,605,50 ),
(1,606,60 ),
(2,601,100),
(2,602,200),
(2,603,300),
(2,604,400),
(2,605,500),
(2,606,600)

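Note: the data in the VALUE column will never be integers; the values will always have a fractional part. I kept the sample values as integers only to show the approximation problem.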


Method 1: EXP + LOG + SUM() OVER (ORDER BY)

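In this method I use the EXP + LOG + SUM() OVER (ORDER BY) technique to compute the cumulative multiplication. With this method the values are not accurate; the results show some rounding and approximation problems.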

SELECT *,
       Exp(Sum(Log(Abs(NULLIF(VALUE, 0))))
             OVER(
               PARTITION BY PAR_COLUMN
               ORDER BY PERIOD)) AS CUM_MUL
FROM   @TEST;

Result:

PAR_COLUMN  PERIOD  VALUE       CUM_MUL
----------  ------  ---------   ----------------
1           601     10.000000   10
1           602     20.000000   200             -- 10 * 20 = 200(correct)
1           603     30.000000   6000.00000000001 -- 200 * 30 = 6000.000000000 (not 6000.00000000001) incorrect
1           604     40.000000   240000
1           605     50.000000   12000000
1           606     60.000000   720000000.000001  -- 12000000 * 60 = 720000000.000000 (not 720000000.000001) incorrect
2           601     100.000000  100
2           602     200.000000  20000
2           603     300.000000  5999999.99999999 -- 20000.000000 *300.000000 = 6000000.000000 (not 5999999.99999999) incorrect
2           604     400.000000  2399999999.99999  
2           605     500.000000  1199999999999.99
2           606     600.000000  719999999999998
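The identity behind Method 1 is that a product equals the exponential of the sum of the logarithms. As a rough illustration outside SQL Server, the Python sketch below computes the same running product both ways. Python's `math.log`/`math.exp` operate on IEEE 754 doubles like T-SQL's float `LOG`/`EXP`, but the exact trailing digits may differ from SQL Server's, so treat the printed values as illustrative only. The helper names are hypothetical.

```python
import math
from decimal import Decimal
from itertools import accumulate

def running_product_via_logs(values):
    """Running product as exp(running sum of ln v), entirely in float64.

    Assumes every value is positive; the SQL version uses ABS/NULLIF to
    cope with negatives and zeros, which this sketch leaves out.
    """
    logs = [math.log(float(v)) for v in values]
    return [math.exp(s) for s in accumulate(logs)]

def running_product_exact(values):
    """Reference result: ordinary decimal multiplication, no logarithms."""
    return list(accumulate(values, lambda acc, v: acc * v))

partition_2 = [Decimal(v) for v in (100, 200, 300, 400, 500, 600)]
approx = running_product_via_logs(partition_2)
exact = running_product_exact(partition_2)

for a, e in zip(approx, exact):
    # repr() prints the shortest string that round-trips the float,
    # so any noise in the trailing digits is visible
    print(f"{a!r:>22}  exact={e}")
```

The float results are close to the exact products, but nothing guarantees they land exactly on them.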

Method 2: Traditional multiplication (recursive CTE)

This method works perfectly, without any rounding or approximation problems.

;WITH CTE
     AS (SELECT TOP 1 WITH TIES PAR_COLUMN,
                                PERIOD,
                                VALUE,
                                CUM_MUL = VALUE
         FROM   @TEST
         ORDER  BY PERIOD
         UNION ALL
         SELECT T.PAR_COLUMN,
                T.PERIOD,
                T.VALUE,
                Cast(T.VALUE * C.CUM_MUL AS NUMERIC(22, 6))
         FROM   CTE C
                INNER JOIN @TEST T
                        ON C.PAR_COLUMN = T.PAR_COLUMN
                           AND T.PERIOD = C.PERIOD + 1)
SELECT *
FROM   CTE 
ORDER BY PAR_COLUMN,PERIOD

Result:

PAR_COLUMN  PERIOD  VALUE       CUM_MUL
----------  ------  ---------   ----------------
1           601     10.000000   10.000000
1           602     20.000000   200.000000
1           603     30.000000   6000.000000
1           604     40.000000   240000.000000
1           605     50.000000   12000000.000000
1           606     60.000000   720000000.000000
2           601     100.000000  100.000000
2           602     200.000000  20000.000000
2           603     300.000000  6000000.000000
2           604     400.000000  2400000000.000000
2           605     500.000000  1200000000000.000000
2           606     600.000000  720000000000000.000000
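Stripped of SQL, the recursive CTE works in three steps. It seeds the result with the rows at the smallest PERIOD (the `TOP 1 WITH TIES` anchor). It then repeatedly joins to the same partition at `PERIOD + 1`. At each step it multiplies exactly and rounds back to 6 decimals (the `CAST ... AS NUMERIC(22, 6)`).

A rough Python transcription follows. It assumes rows are `(par_column, period, value)` tuples holding `Decimal` values, a shape chosen only for this sketch. It also assumes half-up rounding on the cast.

```python
from decimal import Decimal, ROUND_HALF_UP

SCALE = Decimal("0.000001")  # numeric(22, 6): six digits after the decimal point

def running_product_recursive(rows):
    """Emulate the recursive CTE from Method 2 (hypothetical helper).

    rows: iterable of (par_column, period, value) tuples with Decimal values.
    Anchor: rows at the overall smallest PERIOD (TOP 1 WITH TIES ... ORDER BY PERIOD).
    Recursive step: join to the same partition at PERIOD + 1.
    """
    by_key = {(par, per): val for par, per, val in rows}
    first_period = min(per for _, per in by_key)
    frontier = [(par, per, val, val) for (par, per), val in by_key.items()
                if per == first_period]
    result = []
    while frontier:
        result.extend(frontier)
        next_frontier = []
        for par, per, _, cum in frontier:
            nxt = by_key.get((par, per + 1))
            if nxt is not None:
                # CAST(T.VALUE * C.CUM_MUL AS NUMERIC(22, 6)); half-up rounding is assumed
                cum_next = (nxt * cum).quantize(SCALE, rounding=ROUND_HALF_UP)
                next_frontier.append((par, per + 1, nxt, cum_next))
        frontier = next_frontier
    return sorted(result, key=lambda r: (r[0], r[1]))

data = [(1, 601 + i, Decimal(v)) for i, v in enumerate((10, 20, 30, 40, 50, 60))]
data += [(2, 601 + i, Decimal(v)) for i, v in enumerate((100, 200, 300, 400, 500, 600))]

for row in running_product_recursive(data):
    print(row)
```

Like the CTE, the chain stops at the first gap in PERIOD, so this only works when periods are consecutive within each partition.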

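Can anyone tell me why the values in Method 1 are inaccurate, and how to fix it? I tried changing the data type to float and increasing the scale of numeric, but it didn't help.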

I would really like to use Method 1, which is much faster than Method 2.

Edit: I now know the reason for the approximation. Can anyone find a way to fix it?

Vla*_*nov 7

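In pure T-SQL, LOG and EXP operate on the float type (8 bytes), which has only 15-17 significant digits. Even the 15th digit can become inaccurate if you sum large enough values. Your data is numeric(22,6), so 15 significant digits are not enough.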
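You can see the 15-17 digit limit outside SQL Server with Python's `float`. It is the same IEEE 754 8-byte double as T-SQL `float`; that correspondence is the assumption this sketch rests on. In the sketch, a value that fits `numeric(22,6)` does not survive the round trip through a double.

```python
import sys
from decimal import Decimal

# float64 guarantees 15 significant decimal digits; 17 are needed to
# round-trip an arbitrary double.
print(sys.float_info.dig)

# A value that fits numeric(22, 6): 16 integer digits + 6 fractional digits.
value = Decimal("1234567890123456.123456")
as_float = float(value)

# Decimal(float) shows the exact value the double actually holds:
# at this magnitude, adjacent doubles are 0.25 apart, so the fraction is lost.
print(Decimal(as_float))
print(Decimal(as_float) == value)
```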

POWER can return the numeric type, with potentially higher precision, but that is of little use to us, because both LOG and LOG10 can only return float anyway.

To demonstrate the problem, I'll change the type in the example to numeric(15,0) and use POWER instead of EXP:

DECLARE @TEST TABLE
  (
     PAR_COLUMN INT,
     PERIOD     INT,
     VALUE      NUMERIC(15, 0)
  );

INSERT INTO @TEST VALUES 
(1,601,10 ),
(1,602,20 ),
(1,603,30 ),
(1,604,40 ),
(1,605,50 ),
(1,606,60 ),
(2,601,100),
(2,602,200),
(2,603,300),
(2,604,400),
(2,605,500),
(2,606,600);

SELECT *,
    POWER(CAST(10 AS numeric(15,0)),
        Sum(LOG10(
            Abs(NULLIF(VALUE, 0))
            ))
        OVER(PARTITION BY PAR_COLUMN ORDER BY PERIOD)) AS Mul
FROM @TEST;

Result:

+------------+--------+-------+-----------------+
| PAR_COLUMN | PERIOD | VALUE |       Mul       |
+------------+--------+-------+-----------------+
|          1 |    601 |    10 |              10 |
|          1 |    602 |    20 |             200 |
|          1 |    603 |    30 |            6000 |
|          1 |    604 |    40 |          240000 |
|          1 |    605 |    50 |        12000000 |
|          1 |    606 |    60 |       720000000 |
|          2 |    601 |   100 |             100 |
|          2 |    602 |   200 |           20000 |
|          2 |    603 |   300 |         6000000 |
|          2 |    604 |   400 |      2400000000 |
|          2 |    605 |   500 |   1200000000000 |
|          2 |    606 |   600 | 720000000000001 |
+------------+--------+-------+-----------------+

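Precision is lost at every step here: calculating LOG loses precision, SUM loses precision, and EXP/POWER loses precision. With these built-in functions, I don't think you can do much about it.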


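So, the answer is: use CLR and the C# decimal type (not double), which supports higher precision (28-29 significant digits). Your original SQL type numeric(22,6) would fit into it, and you wouldn't need the LOG/EXP trick.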


Oops. I tried to make a CLR aggregate that calculates a product. It works in my tests, but only as a simple aggregate.

This works:

SELECT T.PAR_COLUMN, [dbo].[Product](T.VALUE) AS P
FROM @TEST AS T
GROUP BY T.PAR_COLUMN;

Even OVER (PARTITION BY) works:

SELECT *,
    [dbo].[Product](T.VALUE) 
    OVER (PARTITION BY PAR_COLUMN) AS P
FROM @TEST AS T;

However, a running product using OVER (PARTITION BY ... ORDER BY ...) doesn't work (checked with SQL Server 2014 Express 12.0.2000.8):

SELECT *,
    [dbo].[Product](T.VALUE) 
    OVER (PARTITION BY T.PAR_COLUMN ORDER BY T.PERIOD 
          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CUM_MUL
FROM @TEST AS T;

Incorrect syntax near the keyword 'ORDER'.

A search turned up this Connect item, which was closed as "Won't Fix", and this question.


C# code:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.IO;
using System.Collections.Generic;
using System.Text;

namespace RunningProduct
{
    [Serializable]
    [SqlUserDefinedAggregate(
        Format.UserDefined,
        MaxByteSize = 17,
        IsInvariantToNulls = true,
        IsInvariantToDuplicates = false,
        IsInvariantToOrder = true,
        IsNullIfEmpty = true)]
    public struct Product : IBinarySerialize
    {
        private bool m_bIsNull; // 1 byte storage
        private decimal m_Product; // 16 bytes storage

        public void Init()
        {
            this.m_bIsNull = true;
            this.m_Product = 1;
        }

        public void Accumulate(
            [SqlFacet(Precision = 22, Scale = 6)] SqlDecimal ParamValue)
        {
            if (ParamValue.IsNull) return;

            this.m_bIsNull = false;
            this.m_Product *= ParamValue.Value;
        }

        public void Merge(Product other)
        {
            SqlDecimal otherValue = other.Terminate();
            this.Accumulate(otherValue);
        }

        [return: SqlFacet(Precision = 22, Scale = 6)]
        public SqlDecimal Terminate()
        {
            if (m_bIsNull)
            {
                return SqlDecimal.Null;
            }
            else
            {
                return m_Product;
            }
        }

        public void Read(BinaryReader r)
        {
            this.m_bIsNull = r.ReadBoolean();
            this.m_Product = r.ReadDecimal();
        }

        public void Write(BinaryWriter w)
        {
            w.Write(this.m_bIsNull);
            w.Write(this.m_Product);
        }
    }
}
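To reason about the aggregate without a SQL Server instance, the Python class below mirrors the same Init/Accumulate/Merge/Terminate contract. It is only an illustration: Python's `Decimal` stands in for .NET `decimal`, and the `Read`/`Write` serialization is left out. It shows that splitting the rows into partial aggregates and merging them gives the same product, because multiplication is commutative and associative. Rows are also allowed to arrive in any order.

```python
from decimal import Decimal
from typing import Optional

class Product:
    """Python mirror of the C# Product aggregate's lifecycle (illustrative only)."""

    def __init__(self) -> None:
        self.init()

    def init(self) -> None:
        self.is_null = True          # no non-NULL input seen yet
        self.product = Decimal(1)    # multiplicative identity

    def accumulate(self, value: Optional[Decimal]) -> None:
        if value is None:            # NULLs are skipped, as in the C# Accumulate
            return
        self.is_null = False
        self.product *= value

    def merge(self, other: "Product") -> None:
        # Same idea as the C# Merge: fold the other partial result in as one value.
        self.accumulate(other.terminate())

    def terminate(self) -> Optional[Decimal]:
        return None if self.is_null else self.product

values = [Decimal(10), Decimal(20), None, Decimal(30), Decimal(40), Decimal(50), Decimal(60)]

whole = Product()
for v in values:
    whole.accumulate(v)

# Two partial aggregates, as a parallel plan might produce, then merged.
left, right = Product(), Product()
for v in values[:3]:
    left.accumulate(v)
for v in values[3:]:
    right.accumulate(v)
left.merge(right)

print(whole.terminate(), left.terminate())
```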

Installing the CLR assembly:

-- Turn advanced options on
EXEC sys.sp_configure @configname = 'show advanced options', @configvalue = 1 ;
GO
RECONFIGURE WITH OVERRIDE ;
GO
-- Enable CLR
EXEC sys.sp_configure @configname = 'clr enabled', @configvalue = 1 ;
GO
RECONFIGURE WITH OVERRIDE ;
GO

CREATE ASSEMBLY [RunningProduct]
AUTHORIZATION [dbo]
FROM 'C:\RunningProduct\RunningProduct.dll'
WITH PERMISSION_SET = SAFE;
GO

CREATE AGGREGATE [dbo].[Product](@ParamValue numeric(22,6))
RETURNS numeric(22,6)
EXTERNAL NAME [RunningProduct].[RunningProduct.Product];
GO

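This question discusses calculating a running SUM in great detail, and in his answer Paul White shows how to write a CLR function that calculates a running SUM efficiently. That would be a good starting point for writing a function that calculates a running product.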

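Note that he uses a different approach: instead of creating a custom aggregate function, Paul made a function that returns a table. The function reads the original data into memory and performs all the required calculations.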

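It may be easier to achieve the desired effect by implementing these calculations on the client, in the programming language of your choice: just read the whole table and calculate the running product on the client. Creating a CLR function makes sense if the running product calculated on the server is an intermediate step in a more complex calculation that aggregates the data further.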
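Below is a minimal sketch of that client-side, in-memory approach in Python. It assumes the rows have already been fetched as `(par_column, period, value)` tuples; the fetch itself, for example through a database driver, is omitted. The function name is hypothetical. It sorts once and keeps one running product per partition, using exact decimal arithmetic instead of LOG/EXP.

```python
from decimal import Decimal
from itertools import groupby
from operator import itemgetter

def running_products(rows):
    """Yield (par_column, period, value, cum_mul) in a single pass.

    rows: (par_column, period, value) tuples with Decimal values; they are
    sorted here so the result does not depend on the order they were read in.
    """
    ordered = sorted(rows, key=itemgetter(0, 1))
    for par_column, group in groupby(ordered, key=itemgetter(0)):
        cum = Decimal(1)                 # reset at each new partition
        for _, period, value in group:
            cum *= value                 # exact decimal multiplication
            yield par_column, period, value, cum

data = [(1, 601 + i, Decimal(v)) for i, v in enumerate((10, 20, 30, 40, 50, 60))]
data += [(2, 601 + i, Decimal(v)) for i, v in enumerate((100, 200, 300, 400, 500, 600))]

for row in running_products(reversed(data)):
    print(row)
```

Unlike the recursive CTE, this does not require PERIOD values to be consecutive, only sortable.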


Another idea that comes to mind:

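Find a third-party .NET math library that provides high-precision Log and Exp functions, and make CLR versions of these scalar functions. Then use the EXP + LOG + SUM() OVER (ORDER BY) approach, where SUM is the built-in T-SQL function that supports OVER (ORDER BY), and Exp and Log are custom CLR functions that return not float but a high-precision decimal.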
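Python's `decimal` module happens to provide arbitrary-precision `ln()` and `exp()`, so it can stand in for the hypothetical high-precision .NET library to check whether the idea holds up. This sketch keeps the log/sum/exp structure but evaluates it with 50 significant digits, an arbitrary choice for illustration. It then rounds the result to scale 6.

```python
from decimal import Decimal, localcontext
from itertools import accumulate

SIX_PLACES = Decimal("0.000001")

def running_product_high_precision(values, digits=50):
    """EXP(SUM(LOG(v))) evaluated with `digits` significant digits instead of float.

    Assumes all values are positive. The running SUM is ordinary addition;
    only LOG and EXP are swapped for high-precision versions.
    """
    with localcontext() as ctx:
        ctx.prec = digits
        logs = [v.ln() for v in values]      # uses the 50-digit context
        sums = list(accumulate(logs))        # Decimal addition, same context
        return [s.exp() for s in sums]

partition_2 = [Decimal(v) for v in (100, 200, 300, 400, 500, 600)]
rounded = [r.quantize(SIX_PLACES) for r in running_product_high_precision(partition_2)]
print(rounded)
```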

Note that high-precision calculations may be slow, and using CLR scalar functions in a query may also make it slower.


dan*_*era 2

You can round the result to the nearest multiple of a large known factor, such as the last value:

-- 720000000000000 must be a multiple of 600

select
   round( 719999999999998/600,  0 ) * 600

--result: 720000000000000
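The same snapping step can be sketched in Python with `Decimal`. The helper name is hypothetical. It is only valid when the true result is known to be an integer multiple of the factor. With fractional data that no longer holds, which is why the fuller query in this answer divides out the last few values and rounds to 6 decimals instead.

```python
from decimal import Decimal, ROUND_HALF_UP

def snap_to_multiple(approx, factor):
    """Round `approx` to the nearest integer multiple of `factor` (hypothetical helper).

    Only meaningful when the exact result is known to be such a multiple.
    """
    quotient = (Decimal(approx) / Decimal(factor)).quantize(Decimal(1), rounding=ROUND_HALF_UP)
    return quotient * Decimal(factor)

print(snap_to_multiple(719999999999998, 600))
print(snap_to_multiple(5999999.99999999, 300))
```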

Test it on SQLFiddle:

create TABLE T 
  (
     PAR_COLUMN INT,
     PERIOD     INT,
     VALUE      NUMERIC(22, 6)
  ) 
INSERT INTO T VALUES 
(1,601,10.1 ),    --<--- I put decimals just to test!
(1,602,20 ),
(1,603,30 ),
(1,604,40 ),
(1,605,50 ),
(1,606,60 ),
(2,601,100),
(2,602,200),
(2,603,300),
(2,604,400),
(2,605,500),
(2,606,600)

Query 1:

with T1 as (
SELECT *,
       Exp(Sum(Log(Abs(NULLIF(VALUE, 0))))
             OVER(
               PARTITION BY PAR_COLUMN
               ORDER BY PERIOD)) AS CUM_MUL,
       VALUE AS CUM_MAX1,
       LAG( VALUE , 1, 1.) 
             OVER(
               PARTITION BY PAR_COLUMN
               ORDER BY PERIOD ) AS CUM_MAX2,
       LAG( VALUE , 2, 1.) 
             OVER(
               PARTITION BY PAR_COLUMN
               ORDER BY PERIOD ) AS CUM_MAX3
FROM   T )
select PAR_COLUMN,  PERIOD,  VALUE, 
       ( round( ( CUM_MUL  / ( CUM_MAX1 * CUM_MAX2 * CUM_MAX3) ) ,6) 
         * 
         cast( ( 1000000 * CUM_MAX1 * CUM_MAX2 * CUM_MAX3) as bigint )
       ) / 1000000.
       as CUM_MUL
FROM T1

Results:

| PAR_COLUMN | PERIOD | VALUE |         CUM_MUL |
|------------|--------|-------|-----------------|
|          1 |    601 |  10.1 |            10.1 | --ok! because my data
|          1 |    602 |    20 |             202 |
|          1 |    603 |    30 |            6060 |
|          1 |    604 |    40 |          242400 |
|          1 |    605 |    50 |        12120000 |
|          1 |    606 |    60 |       727200000 |
|          2 |    601 |   100 |             100 |
|          2 |    602 |   200 |           20000 |
|          2 |    603 |   300 |         6000000 |
|          2 |    604 |   400 |      2400000000 |
|          2 |    605 |   500 |   1200000000000 |
|          2 |    606 |   600 | 720000000000000 |

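Note that I multiply by 1000000 so that the six decimal places survive the cast to bigint, and divide by 1000000. again at the end.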
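Here is a Python transcription of Query 1 for a single partition, to make its arithmetic easier to follow. It assumes the values are already ordered by `PERIOD`. `math.log`/`math.exp` play the role of T-SQL's float functions and `Decimal` plays the role of `numeric`.

The function works in three steps. It divides the float running product by the last three values. It rounds that ratio to 6 decimals. It then multiplies back by the exactly known, scaled product of those three values. The function name is hypothetical.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

SIX_PLACES = Decimal("0.000001")

def corrected_running_product(values):
    """Mirror of Query 1: float EXP(SUM(LOG)), then snap using the last three values.

    values: positive Decimals for one partition, already ordered by PERIOD.
    """
    out = []
    log_sum = 0.0
    for i, v in enumerate(values):
        log_sum += math.log(float(v))
        cum_mul = Decimal(math.exp(log_sum))            # the inexact float result
        lag1 = values[i - 1] if i >= 1 else Decimal(1)  # LAG(VALUE, 1, 1.)
        lag2 = values[i - 2] if i >= 2 else Decimal(1)  # LAG(VALUE, 2, 1.)
        tail = v * lag1 * lag2                          # CUM_MAX1 * CUM_MAX2 * CUM_MAX3
        ratio = (cum_mul / tail).quantize(SIX_PLACES, rounding=ROUND_HALF_UP)  # ROUND(..., 6)
        scaled = int(1000000 * tail)                    # CAST(1000000 * ... AS bigint)
        out.append(ratio * scaled / 1000000)
    return out

partition_1 = [Decimal("10.1"), Decimal(20), Decimal(30), Decimal(40), Decimal(50), Decimal(60)]
partition_2 = [Decimal(v) for v in (100, 200, 300, 400, 500, 600)]
print(corrected_running_product(partition_1))
print(corrected_running_product(partition_2))
```

The correction relies on the float ratio being accurate to 6 decimal places. As the running product grows, so does the float error, so this is a heuristic rather than a guarantee.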