How to calculate the difference between the field values of the first and last rows in each group

r.z*_*rei 6 sql-server aggregate t-sql group-by

I have a table with this structure:

+-------+------------------+
| Value |       Date       |
+-------+------------------+
|    10 | 10/10/2010 10:00 |
|    11 | 10/10/2010 10:15 |
|    15 | 10/10/2010 10:30 |
|    15 | 10/10/2010 10:45 |
|    17 | 10/10/2010 11:00 |
|    18 | 10/10/2010 11:15 |
|    22 | 10/10/2010 11:30 |
|    30 | 10/10/2010 11:45 |
+-------+------------------+

Currently I'm using GROUP BY to get min, max, and avg for an hourly report like this:

+-----+-----+-------+------------------+
| min | max |  avg  |       Date       |
+-----+-----+-------+------------------+
|  10 |  15 | 12.75 | 10/10/2010 10:00 |
|  17 |  30 | 21.75 | 10/10/2010 11:00 |
+-----+-----+-------+------------------+

How can I calculate the difference between the values of the last and first rows in each group, to produce something like this:

+-----+-----+-------+------+------------------+
| min | max |  avg  | diff |       Date       |
+-----+-----+-------+------+------------------+
|  10 |  15 | 12.75 |    5 | 10/10/2010 10:00 |
|  17 |  30 | 21.75 |   13 | 10/10/2010 11:00 |
+-----+-----+-------+------+------------------+

Thanks.

And*_*y M 13

You didn't show the query you use to get the results without diff. I assume it is something like this:

SELECT
  min  = MIN(Value),
  max  = MAX(Value),
  avg  = AVG(Value),  -- or, if Value is an int, like this, perhaps:
                      -- AVG(CAST(Value AS decimal(10,2)))
  Date = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
GROUP BY
  DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
;
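
To try the queries in this answer, a sample table is handy. The sketch below assumes the table is called atable (as in the queries here) with an int Value column and a datetime Date column; the column types are my assumption, since the question does not state them. The multi-row VALUES list needs SQL Server 2008 or later. The last statement shows what the grouping expression does: it counts the whole hours between date 0 (1900-01-01) and Date, then adds them back to date 0, which truncates each Date to the start of its hour.

```sql
-- Assumed schema: the question does not give the column types.
CREATE TABLE atable (
  Value int      NOT NULL,
  Date  datetime NOT NULL
);

-- Sample rows from the question.
INSERT INTO atable (Value, Date)
VALUES
  (10, '2010-10-10T10:00:00'),
  (11, '2010-10-10T10:15:00'),
  (15, '2010-10-10T10:30:00'),
  (15, '2010-10-10T10:45:00'),
  (17, '2010-10-10T11:00:00'),
  (18, '2010-10-10T11:15:00'),
  (22, '2010-10-10T11:30:00'),
  (30, '2010-10-10T11:45:00')
;

-- Show each Date next to the hour it is grouped under.
SELECT
  Date,
  GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
;
```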

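Also, you didn't explain what first and last mean. In this answer, first is assumed to mean the earliest row in the group (according to the Date value) and, likewise, last means the latest row in the group.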

One way to add diff could be this:

First, add two more aggregated columns, minDate and maxDate, to the original query:

SELECT
  min     = MIN(Value),
  max     = MAX(Value),
  avg     = AVG(Value),
  minDate = MIN(Date),
  maxDate = MAX(Date),
  Date    = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
GROUP BY
  DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
;

Next, join the aggregated result set back to the original table, once on minDate and once on maxDate, to access the corresponding Values:

SELECT
  g.min,
  g.max,
  g.avg,
  diff = last.Value - first.Value,
  g.Date
FROM (
  SELECT
    min     = MIN(Value),
    max     = MAX(Value),
    avg     = AVG(Value),
    minDate = MIN(Date),
    maxDate = MAX(Date),
    Date    = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
  FROM atable
  GROUP BY
    DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
) g
INNER JOIN atable first ON first.Date = g.minDate
INNER JOIN atable last  ON last .Date = g.maxDate
;

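Note that the above assumes the Date values (at least those that happen to be the first or the last within their hour) are unique; otherwise you will get more than one row for some hours in the output.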
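
One way to check that assumption before relying on the join is to look for hours whose earliest or latest Date occurs more than once. This is only a sketch against the same atable; the bounds CTE and the Occurrences column are names I made up for illustration. Any row it returns marks an hour that would be duplicated in the join-based output.

```sql
WITH bounds AS (
  -- Earliest and latest Date per hour, as in the query above.
  SELECT
    GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0),
    minDate   = MIN(Date),
    maxDate   = MAX(Date)
  FROM atable
  GROUP BY
    DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
)
SELECT
  b.GroupDate,
  t.Date,
  Occurrences = COUNT(*)
FROM bounds AS b
INNER JOIN atable AS t ON t.Date IN (b.minDate, b.maxDate)
GROUP BY
  b.GroupDate,
  t.Date
HAVING COUNT(*) > 1   -- a boundary Date shared by several rows
;
```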

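Alternatively, if you are on SQL Server 2005 or later, you can use the window aggregate functions MIN() OVER (...) and MAX() OVER (...) to find the Values corresponding to minDate or maxDate, and then aggregate all the results, much as you are probably doing now. Here is what I mean: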

WITH partitioned AS (
  SELECT
    Value,
    Date,
    GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
  FROM atable
)
, firstlast AS (
  SELECT
    Value,
    Date,
    GroupDate,
    FirstValue = CASE Date WHEN MIN(Date) OVER (PARTITION BY GroupDate) THEN Value END,
    LastValue  = CASE Date WHEN MAX(Date) OVER (PARTITION BY GroupDate) THEN Value END
  FROM partitioned
)
SELECT
  min  = MIN(Value),
  max  = MAX(Value),
  avg  = AVG(Value),  -- or, again, if Value is an int, cast it as a decimal or float
  diff = MAX(LastValue) - MIN(FirstValue),
  Date = GroupDate
FROM firstlast
GROUP BY
  GroupDate
;

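As you can see, the first common table expression (CTE) simply returns all the rows and adds a computed column, GroupDate, which is subsequently used for grouping/partitioning. So it essentially just assigns a name to the grouping expression, which is done for better readability and maintainability of the whole query, since the column is later referenced more than once. This is what the first CTE produces: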

+-------+------------------+------------------+
| Value |       Date       |    GroupDate     |
+-------+------------------+------------------+
|    10 | 10/10/2010 10:00 | 10/10/2010 10:00 |
|    11 | 10/10/2010 10:15 | 10/10/2010 10:00 |
|    15 | 10/10/2010 10:30 | 10/10/2010 10:00 |
|    15 | 10/10/2010 10:45 | 10/10/2010 10:00 |
|    17 | 10/10/2010 11:00 | 10/10/2010 11:00 |
|    18 | 10/10/2010 11:15 | 10/10/2010 11:00 |
|    22 | 10/10/2010 11:30 | 10/10/2010 11:00 |
|    30 | 10/10/2010 11:45 | 10/10/2010 11:00 |
+-------+------------------+------------------+

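The second CTE adds two more columns to the above result. It uses the window aggregate functions MIN() OVER ... and MAX() OVER ... to match against Date, and where a match occurs, the corresponding Value is returned in a separate column, FirstValue or LastValue: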

+-------+------------------+------------------+------------+-----------+
| Value |       Date       |    GroupDate     | FirstValue | LastValue |
+-------+------------------+------------------+------------+-----------+
|    10 | 10/10/2010 10:00 | 10/10/2010 10:00 |         10 |      NULL |
|    11 | 10/10/2010 10:15 | 10/10/2010 10:00 |       NULL |      NULL |
|    15 | 10/10/2010 10:30 | 10/10/2010 10:00 |       NULL |      NULL |
|    15 | 10/10/2010 10:45 | 10/10/2010 10:00 |       NULL |        15 |
|    17 | 10/10/2010 11:00 | 10/10/2010 11:00 |         17 |      NULL |
|    18 | 10/10/2010 11:15 | 10/10/2010 11:00 |       NULL |      NULL |
|    22 | 10/10/2010 11:30 | 10/10/2010 11:00 |       NULL |      NULL |
|    30 | 10/10/2010 11:45 | 10/10/2010 11:00 |       NULL |        30 |
+-------+------------------+------------------+------------+-----------+

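At this point, everything is ready for the final aggregation. The min, max, and avg columns are aggregated the same way as before, and diff can now be obtained easily by subtracting the aggregated FirstValue from the aggregated LastValue. As you can see from the above result set, you could use various functions to get FirstValue and LastValue for a group: MIN, MAX, SUM, AVG. Any of them would do, since there is only one value per group.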
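
The reason any of these aggregates works is that aggregate functions skip NULLs, so a column holding a single non-NULL value per group collapses to that value. A minimal illustration, using an inline VALUES list (SQL Server 2008 or later) that mimics the FirstValue column of the 10:00 group; the viaMin, viaMax, viaSum, and viaAvg names are only for illustration:

```sql
-- One non-NULL value and three NULLs, like FirstValue for the 10:00 group.
SELECT
  viaMin = MIN(FirstValue),
  viaMax = MAX(FirstValue),
  viaSum = SUM(FirstValue),
  viaAvg = AVG(FirstValue)
FROM (
  VALUES (10), (NULL), (NULL), (NULL)
) AS v (FirstValue)
;
```

All four columns come back as 10, because the NULL rows are ignored by each aggregate.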

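The main query, however, as you can see, specifically applies MAX() to LastValue and MIN() to FirstValue. That is deliberate. The second suggestion does not really need Date to be unique the way the first one does, but if minDate or maxDate happens to have more than one Value associated with it, FirstValue or LastValue will contain multiple values per group, like this: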

+-------+------------------+------------------+------------+-----------+
| Value |       Date       |    GroupDate     | FirstValue | LastValue |
+-------+------------------+------------------+------------+-----------+
|     9 | 10/10/2010 10:00 | 10/10/2010 10:00 |          9 |      NULL |
|    10 | 10/10/2010 10:00 | 10/10/2010 10:00 |         10 |      NULL |
|    11 | 10/10/2010 10:15 | 10/10/2010 10:00 |       NULL |      NULL |
|    15 | 10/10/2010 10:30 | 10/10/2010 10:00 |       NULL |      NULL |
|    15 | 10/10/2010 10:45 | 10/10/2010 10:00 |       NULL |        15 |
|    17 | 10/10/2010 11:00 | 10/10/2010 11:00 |         17 |      NULL |
|    18 | 10/10/2010 11:15 | 10/10/2010 11:00 |       NULL |      NULL |
|    22 | 10/10/2010 11:30 | 10/10/2010 11:00 |       NULL |      NULL |
|    30 | 10/10/2010 11:45 | 10/10/2010 11:00 |       NULL |        30 |
|    33 | 10/10/2010 11:45 | 10/10/2010 11:00 |       NULL |        33 |
+-------+------------------+------------------+------------+-----------+

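I think that in such cases it would be more natural to take the difference between the largest last value and the smallest first value. However, you should know better what rule to apply here, so just change the query accordingly.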
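
For instance, with the tied sample above, the rule in my query gives 15 - 9 = 6 for the 10:00 group and 33 - 17 = 16 for the 11:00 group. If you would rather take the narrowest difference, you could swap the aggregates. Here is a sketch with both variants side by side; the atable_ties CTE, the DateText column, and the diffWidest and diffNarrowest names are made up for illustration, and the inline VALUES list needs SQL Server 2008 or later:

```sql
WITH atable_ties AS (
  -- Hypothetical data: the tied rows from the table above.
  SELECT Value, Date = CAST(DateText AS datetime)
  FROM (VALUES
    ( 9, '2010-10-10T10:00:00'),
    (10, '2010-10-10T10:00:00'),
    (11, '2010-10-10T10:15:00'),
    (15, '2010-10-10T10:30:00'),
    (15, '2010-10-10T10:45:00'),
    (17, '2010-10-10T11:00:00'),
    (18, '2010-10-10T11:15:00'),
    (22, '2010-10-10T11:30:00'),
    (30, '2010-10-10T11:45:00'),
    (33, '2010-10-10T11:45:00')
  ) AS v (Value, DateText)
)
, partitioned AS (
  SELECT
    Value,
    Date,
    GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
  FROM atable_ties
)
, firstlast AS (
  SELECT
    GroupDate,
    FirstValue = CASE Date WHEN MIN(Date) OVER (PARTITION BY GroupDate) THEN Value END,
    LastValue  = CASE Date WHEN MAX(Date) OVER (PARTITION BY GroupDate) THEN Value END
  FROM partitioned
)
SELECT
  diffWidest    = MAX(LastValue) - MIN(FirstValue),  -- the rule used in this answer
  diffNarrowest = MIN(LastValue) - MAX(FirstValue),  -- an alternative rule
  Date          = GroupDate
FROM firstlast
GROUP BY
  GroupDate
;
```

With this data, diffWidest is 6 and 16, and diffNarrowest is 5 and 13, for the 10:00 and 11:00 groups respectively.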

You can test both solutions at SQL Fiddle.


Update

Starting with SQL Server 2012, you can also use the FIRST_VALUE and LAST_VALUE functions and substitute them for the CASE expressions in the firstlast CTE of my last query above, like this:

FirstValue = FIRST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
LastValue  = LAST_VALUE(Value)  OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

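In this case, it does not matter whether you later apply MIN or MAX to FirstValue and LastValue (in the main SELECT): each column will hold exactly the same value (the first or the last Value, respectively) across all rows of the same GroupDate group, so MIN() and MAX() will return the same result in each case.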
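
The explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame matters for LAST_VALUE. When a window has ORDER BY but no frame, SQL Server defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE only looks as far as the current row (and its ties) rather than to the end of the partition. A small sketch that puts the two side by side for the rows of the 10:00 group; the LastDefaultFrame and LastFullFrame names are illustrative:

```sql
SELECT
  Value,
  Date,
  -- Default frame: stops at the current row, so this just echoes Value here.
  LastDefaultFrame = LAST_VALUE(Value) OVER (ORDER BY Date ASC),
  -- Full frame: sees the whole partition, so this is the true last Value.
  LastFullFrame    = LAST_VALUE(Value) OVER (ORDER BY Date ASC
                                             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM (VALUES
  (10, CAST('2010-10-10T10:00:00' AS datetime)),
  (11, CAST('2010-10-10T10:15:00' AS datetime)),
  (15, CAST('2010-10-10T10:30:00' AS datetime)),
  (15, CAST('2010-10-10T10:45:00' AS datetime))
) AS v (Value, Date)
;
```

Because the Date values here are unique, LastDefaultFrame repeats each row's own Value (10, 11, 15, 15), while LastFullFrame is 15 on every row.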

In fact, you can get diff directly in the firstlast CTE and then, in the main query, either aggregate it with MIN/MAX or add it to GROUP BY and reference it without aggregation, like this:

WITH partitioned AS (
  SELECT
    Value,
    Date,
    GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
  FROM atable
)
, firstlast AS (
  SELECT
    Value,
    Date,
    GroupDate,
    diff = LAST_VALUE(Value)  OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
         - FIRST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
  FROM partitioned
)
SELECT
  min  = MIN(Value),
  max  = MAX(Value),
  avg  = AVG(Value),
  diff,
  Date = GroupDate
FROM firstlast
GROUP BY
  GroupDate,
  diff
;

Going further still, you can get min, max, and avg in firstlast as well, rather than in the main query, using the corresponding window functions:

min  = MIN(Value) OVER (PARTITION BY GroupDate),
max  = MAX(Value) OVER (PARTITION BY GroupDate),
avg  = AVG(Value) OVER (PARTITION BY GroupDate),

With these three additional columns and the previous change, the firstlast CTE would return a row set like this for your example:

+-------+------------------+------------------+-----+-----+-------+------+
| Value |       Date       |    GroupDate     | min | max |  avg  | diff |
+-------+------------------+------------------+-----+-----+-------+------+
|    10 | 10/10/2010 10:00 | 10/10/2010 10:00 |  10 |  15 | 12.75 |    5 |
|    11 | 10/10/2010 10:15 | 10/10/2010 10:00 |  10 |  15 | 12.75 |    5 |
|    15 | 10/10/2010 10:30 | 10/10/2010 10:00 |  10 |  15 | 12.75 |    5 |
|    15 | 10/10/2010 10:45 | 10/10/2010 10:00 |  10 |  15 | 12.75 |    5 |
|    17 | 10/10/2010 11:00 | 10/10/2010 11:00 |  17 |  30 | 21.75 |   13 |
|    18 | 10/10/2010 11:15 | 10/10/2010 11:00 |  17 |  30 | 21.75 |   13 |
|    22 | 10/10/2010 11:30 | 10/10/2010 11:00 |  17 |  30 | 21.75 |   13 |
|    30 | 10/10/2010 11:45 | 10/10/2010 11:00 |  17 |  30 | 21.75 |   13 |
+-------+------------------+------------------+-----+-----+-------+------+

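Note how GroupDate, min, max, avg, and diff (the columns you actually need for the final set) are simply repeated across all rows belonging to the same group. That means you can drop Value and Date, rename GroupDate to Date, rearrange the columns slightly, and apply DISTINCT to the result set, and with that you have eliminated the last SELECT: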

WITH partitioned AS (
  SELECT
    Value,
    Date,
    GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
  FROM
    atable
)
SELECT DISTINCT
  min  = MIN(Value) OVER (PARTITION BY GroupDate),
  max  = MAX(Value) OVER (PARTITION BY GroupDate),
  avg  = AVG(Value) OVER (PARTITION BY GroupDate),
  diff = LAST_VALUE(Value)  OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
       - FIRST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
  Date = GroupDate
FROM
  partitioned
;

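Finally, you can also move the GroupDate calculation into the same scope where min, max, etc. are calculated. You can use CROSS APPLY for that, which avoids the need to nest queries altogether; in other words, this way you can get rid of the partitioned CTE as well. The whole query would look like this: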

SELECT DISTINCT
  min  = MIN(t.Value) OVER (PARTITION BY x.GroupDate),
  max  = MAX(t.Value) OVER (PARTITION BY x.GroupDate),
  avg  = AVG(t.Value) OVER (PARTITION BY x.GroupDate),
  diff = LAST_VALUE(t.Value)  OVER (PARTITION BY x.GroupDate ORDER BY t.Date ASC
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
       - FIRST_VALUE(t.Value) OVER (PARTITION BY x.GroupDate ORDER BY t.Date ASC
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
  Date = x.GroupDate
FROM
  atable AS t
  CROSS APPLY (SELECT DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)) AS x (GroupDate)
;

And it returns the same result. You can test it at SQL Fiddle as well.