The SQL OVER() clause - when and why is it useful?

Wit*_*ors 163 mysql sql sql-server clause aggregate-functions

USE AdventureWorks2008R2;
GO
SELECT SalesOrderID, ProductID, OrderQty
    ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Total'
    ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Avg'
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Count'
    ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Min'
    ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Max'
FROM Sales.SalesOrderDetail 
WHERE SalesOrderID IN(43659,43664);

I read about that clause and I don't understand why I need it. What does the OVER function do? What does PARTITION BY do? Why can't I just write the query with GROUP BY SalesOrderID?

And*_*y M 138

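You can use GROUP BY SalesOrderID. The difference is that with GROUP BY you can only have aggregated values for the columns that are not included in GROUP BY.

In contrast, using windowed aggregate functions instead of GROUP BY, you can retrieve both aggregated and non-aggregated values. That is, you can retrieve the individual OrderQty values together with their sums, counts, averages etc. over the groups of rows that share the same SalesOrderID.

Here is a practical example of why windowed aggregates are useful. Suppose you need to calculate what percentage of its partition's total each value is. Without windowed aggregates you would first have to derive a list of aggregated values and then join it back to the original rowset, like this: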

SELECT
  orig.[Partition],
  orig.Value,
  orig.Value * 100.0 / agg.TotalValue AS ValuePercent
FROM OriginalRowset orig
  INNER JOIN (
    SELECT
      [Partition],
      SUM(Value) AS TotalValue
    FROM OriginalRowset
    GROUP BY [Partition]
  ) agg ON orig.[Partition] = agg.[Partition]

Now look at how you can do the same with a windowed aggregate:

SELECT
  [Partition],
  Value,
  Value * 100.0 / SUM(Value) OVER (PARTITION BY [Partition]) AS ValuePercent
FROM OriginalRowset orig

Much easier and cleaner, isn't it?
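
To check that the two queries above really agree, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (the first version with window functions); the sample rows are made up, and the Partition column is renamed to part because the table only exists for this illustration.

```python
import sqlite3

# Hypothetical stand-in for OriginalRowset; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OriginalRowset (part TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO OriginalRowset VALUES (?, ?)",
    [("A", 10), ("A", 30), ("B", 5), ("B", 15), ("B", 30)],
)

# Approach 1: aggregate in a derived table, then join back to the rows.
join_sql = """
SELECT orig.part, orig.value,
       orig.value * 100.0 / agg.total_value AS value_percent
FROM OriginalRowset AS orig
JOIN (SELECT part, SUM(value) AS total_value
      FROM OriginalRowset
      GROUP BY part) AS agg ON orig.part = agg.part
ORDER BY orig.part, orig.value
"""

# Approach 2: windowed aggregate, no join needed.
window_sql = """
SELECT part, value,
       value * 100.0 / SUM(value) OVER (PARTITION BY part) AS value_percent
FROM OriginalRowset
ORDER BY part, value
"""

join_rows = conn.execute(join_sql).fetchall()
window_rows = conn.execute(window_sql).fetchall()
for row in window_rows:
    print(row)
```

Both result lists contain the same rows; each value is shown with its share of the total for its own partition.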


gbn*_*gbn 67

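The OVER clause is powerful in that you can have aggregates over different ranges ("windowing"), whether you use GROUP BY or not.

Example: get the count per SalesOrderID and the count of all rows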

SELECT
    SalesOrderID, ProductID, OrderQty
    ,COUNT(OrderQty) AS 'Count'
    ,COUNT(*) OVER () AS 'CountAll'
FROM Sales.SalesOrderDetail 
WHERE
     SalesOrderID IN(43659,43664)
GROUP BY
     SalesOrderID, ProductID, OrderQty

Get different COUNTs, with no GROUP BY

SELECT
    SalesOrderID, ProductID, OrderQty
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'CountQtyPerOrder'
    ,COUNT(OrderQty) OVER(PARTITION BY ProductID) AS 'CountQtyPerProduct'
    ,COUNT(*) OVER () AS 'CountAllAgain'
FROM Sales.SalesOrderDetail 
WHERE
     SalesOrderID IN(43659,43664)
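
A small runnable sketch of the "different COUNTs, no GROUP BY" idea, assuming Python's built-in sqlite3 module with SQLite 3.25 or later. The SalesOrderDetail table and its five rows are invented for the illustration and are not the AdventureWorks data.

```python
import sqlite3

# Hypothetical miniature of Sales.SalesOrderDetail.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesOrderDetail (SalesOrderID INT, ProductID INT, OrderQty INT)"
)
conn.executemany(
    "INSERT INTO SalesOrderDetail VALUES (?, ?, ?)",
    [(1, 100, 2), (1, 101, 1), (1, 102, 4), (2, 100, 3), (2, 103, 1)],
)

# Three counts over three different windows in a single pass, no GROUP BY.
rows = conn.execute("""
SELECT SalesOrderID, ProductID, OrderQty,
       COUNT(OrderQty) OVER (PARTITION BY SalesOrderID) AS CountQtyPerOrder,
       COUNT(OrderQty) OVER (PARTITION BY ProductID)    AS CountQtyPerProduct,
       COUNT(*)        OVER ()                          AS CountAll
FROM SalesOrderDetail
ORDER BY SalesOrderID, ProductID
""").fetchall()

for row in rows:
    print(row)
```

Every detail row is kept, and each one carries the count for its order, the count for its product, and the count of all rows.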


Tom*_*m H 44

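If you only wanted to GROUP BY SalesOrderID, you would not be able to include the ProductID and OrderQty columns in the SELECT clause.

The PARTITION BY clause lets you break up your aggregate functions. One obvious and useful example is generating line numbers for the order lines of an order: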

SELECT
    O.order_id,
    O.order_date,
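    -- SQL Server requires ORDER BY inside OVER for ROW_NUMBER();
    -- (SELECT NULL) leaves the numbering order within an order arbitrary
    ROW_NUMBER() OVER(PARTITION BY O.order_id ORDER BY (SELECT NULL)) AS line_item_no,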
    OL.product_id
FROM
    Orders O
INNER JOIN Order_Lines OL ON OL.order_id = O.order_id

(My syntax might be slightly off)

You would then get back something like:

order_id    order_date    line_item_no    product_id
--------    ----------    ------------    ----------
    1       2011-05-02         1              5
    1       2011-05-02         2              4
    1       2011-05-02         3              7
    2       2011-05-12         1              8
    2       2011-05-12         2              1
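
The per-order numbering can be tried out with the sketch below, which assumes Python's built-in sqlite3 module and SQLite 3.25 or later. The Order_Lines rows mirror the product ids in the sample output; ordering the numbering by product_id is an assumption made here so that the result is deterministic.

```python
import sqlite3

# Hypothetical Order_Lines table holding only the columns the sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Order_Lines (order_id INT, product_id INT)")
conn.executemany(
    "INSERT INTO Order_Lines VALUES (?, ?)",
    [(1, 5), (1, 4), (1, 7), (2, 8), (2, 1)],
)

# ROW_NUMBER() restarts at 1 for every order_id partition.
rows = conn.execute("""
SELECT order_id,
       ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY product_id) AS line_item_no,
       product_id
FROM Order_Lines
ORDER BY order_id, line_item_no
""").fetchall()

for row in rows:
    print(row)
```

The numbering starts again at 1 whenever order_id changes, which is what PARTITION BY contributes here.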


San*_*ngh 41

Let me explain with an example, and you will be able to see how it works.

Assume you have the following table DIM_EQUIPMENT:

VIN         MAKE    MODEL   YEAR    COLOR
-----------------------------------------
1234ASDF    Ford    Taurus  2008    White
1234JKLM    Chevy   Truck   2005    Green
5678ASDF    Ford    Mustang 2008    Yellow

Run the SQL below

SELECT VIN,
  MAKE,
  MODEL,
  YEAR,
  COLOR ,
  COUNT(*) OVER (PARTITION BY YEAR) AS COUNT2
FROM DIM_EQUIPMENT

The result is as follows

VIN         MAKE    MODEL   YEAR    COLOR     COUNT2
----------------------------------------------------
1234JKLM    Chevy   Truck   2005    Green     1
5678ASDF    Ford    Mustang 2008    Yellow    2
1234ASDF    Ford    Taurus  2008    White     2

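See what happened?

You are able to count by YEAR without a GROUP BY and have the count matched to every row.

Another interesting way to get the same result is to use a WITH clause. WITH works as an inline view and can simplify a query, especially a complex one. That is not the case here, since I am only trying to show the usage.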

WITH EQ AS (
  SELECT YEAR AS YEAR2, COUNT(*) AS COUNT2
  FROM DIM_EQUIPMENT
  GROUP BY YEAR
)
SELECT VIN,
  MAKE,
  MODEL,
  YEAR,
  COLOR,
  COUNT2
FROM DIM_EQUIPMENT
  INNER JOIN EQ ON EQ.YEAR2 = DIM_EQUIPMENT.YEAR;
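
As a quick check that the window version and the WITH version return the same rows, here is a sketch using Python's built-in sqlite3 module (SQLite 3.25 or later is assumed). It loads the three DIM_EQUIPMENT rows shown above; the ORDER BY VIN is added only to make the two results comparable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DIM_EQUIPMENT (VIN TEXT, MAKE TEXT, MODEL TEXT, YEAR INT, COLOR TEXT)"
)
conn.executemany(
    "INSERT INTO DIM_EQUIPMENT VALUES (?, ?, ?, ?, ?)",
    [
        ("1234ASDF", "Ford", "Taurus", 2008, "White"),
        ("1234JKLM", "Chevy", "Truck", 2005, "Green"),
        ("5678ASDF", "Ford", "Mustang", 2008, "Yellow"),
    ],
)

# Windowed count: every row keeps its columns and gains the per-year count.
window_rows = conn.execute("""
SELECT VIN, MAKE, MODEL, YEAR, COLOR,
       COUNT(*) OVER (PARTITION BY YEAR) AS COUNT2
FROM DIM_EQUIPMENT
ORDER BY VIN
""").fetchall()

# Same result through a CTE that is grouped first and then joined back.
cte_rows = conn.execute("""
WITH EQ AS (
  SELECT YEAR AS YEAR2, COUNT(*) AS COUNT2
  FROM DIM_EQUIPMENT
  GROUP BY YEAR
)
SELECT VIN, MAKE, MODEL, YEAR, COLOR, COUNT2
FROM DIM_EQUIPMENT
  INNER JOIN EQ ON EQ.YEAR2 = DIM_EQUIPMENT.YEAR
ORDER BY VIN
""").fetchall()

for row in window_rows:
    print(row)
```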


map*_*aft 16

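The OVER clause, when combined with PARTITION BY, states that the preceding function call must be done analytically, by evaluating the rows returned by the query. Think of it as an inline GROUP BY statement.

OVER (PARTITION BY SalesOrderID) states that for the SUM, AVG, etc. function, the value is returned OVER a subset of the records returned by the query, and that subset is PARTITIONed BY the foreign key SalesOrderID.

So we SUM every OrderQty record for EACH UNIQUE SalesOrderID, and that column is named 'Total'.

This is usually a much more efficient approach than using multiple inline views to find the same information. You can put this query inside an inline view and then filter on Total.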

SELECT ...
FROM (your query) inlineview
WHERE Total < 200
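
The inline view is needed because a windowed aggregate cannot be referenced in the WHERE clause at the same query level. Below is a minimal sketch of the pattern, assuming Python's built-in sqlite3 module with SQLite 3.25 or later; the SalesOrderDetail rows and the threshold of 6 are made up for the illustration.

```python
import sqlite3

# Hypothetical miniature of Sales.SalesOrderDetail.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesOrderDetail (SalesOrderID INT, ProductID INT, OrderQty INT)"
)
conn.executemany(
    "INSERT INTO SalesOrderDetail VALUES (?, ?, ?)",
    [(1, 100, 2), (1, 101, 1), (1, 102, 4), (2, 100, 3), (2, 103, 1)],
)

# Compute Total in the inner query, filter on it in the outer query.
rows = conn.execute("""
SELECT SalesOrderID, ProductID, OrderQty, Total
FROM (
    SELECT SalesOrderID, ProductID, OrderQty,
           SUM(OrderQty) OVER (PARTITION BY SalesOrderID) AS Total
    FROM SalesOrderDetail
) AS inlineview
WHERE Total < 6
ORDER BY SalesOrderID, ProductID
""").fetchall()

for row in rows:
    print(row)
```

Only the detail rows of orders whose total quantity is below the threshold survive the outer filter.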


Pav*_*ulu 11

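In short: the OVER clause can be used to select non-aggregated values along with aggregated ones.

PARTITION BY, the inner ORDER BY, and ROWS or RANGE are all parts of the OVER() clause.

PARTITION BY is used to partition the data and then perform the window and aggregate functions on each partition. If we do not specify PARTITION BY, the entire result set is treated as a single partition.

The OVER clause can be used with ranking functions (RANK, ROW_NUMBER, DENSE_RANK, ...), aggregate functions (AVG, MAX, MIN, SUM, ...) and analytic functions (FIRST_VALUE, LAST_VALUE and a few others).

Let's look at the basic syntax of the OVER clause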

OVER (   
       [ <PARTITION BY clause> ]  
       [ <ORDER BY clause> ]   
       [ <ROW or RANGE clause> ]  
      )  

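PARTITION BY: used to partition the data and perform operations on groups of rows that share the same values.

ORDER BY: used to define the logical order of the data within a partition. When we do not specify PARTITION BY, the entire result set is treated as a single partition.

ROWS or RANGE: used to specify which rows of the partition are taken into account when the operation is performed.

Let's take an example:

Here is my data set: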

Id          Name                                               Gender     Salary
----------- -------------------------------------------------- ---------- -----------
1           Mark                                               Male       5000
2           John                                               Male       4500
3           Pavan                                              Male       5000
4           Pam                                                Female     5500
5           Sara                                               Female     4000
6           Aradhya                                            Female     3500
7           Tom                                                Male       5500
8           Mary                                               Female     5000
9           Ben                                                Male       6500
10          Jodi                                               Female     7000
11          Tom                                                Male       5500
12          Ron                                                Male       5000

So let me run through different scenarios and see how the data is affected, moving from the difficult syntax to the simple one.

Select *,SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6           Aradhya                                            Female     3500        3500
5           Sara                                               Female     4000        7500
2           John                                               Male       4500        12000
3           Pavan                                              Male       5000        32000
1           Mark                                               Male       5000        32000
8           Mary                                               Female     5000        32000
12          Ron                                                Male       5000        32000
11          Tom                                                Male       5500        48500
7           Tom                                                Male       5500        48500
4           Pam                                                Female     5500        48500
9           Ben                                                Male       6500        55000
10          Jodi                                               Female     7000        62000

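Just look at the sum_sal column. Here I order by salary and use RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. We are not using PARTITION BY, so the entire data set is treated as one partition, ordered by salary. The important part is UNBOUNDED PRECEDING AND CURRENT ROW: for each row, the sum is calculated from the first row up to the current row. But look at the row with salary 5000 and Name = 'Pavan': you might expect 17000 there, and 22000 for salary 5000 and Name = 'Mark'. Because we are using RANGE, rows with the same ORDER BY value are treated as one logical group; the operation is performed on the group as a whole and its result is assigned to every row in the group. That is why all rows with salary = 5000 show the same value: the engine goes up to salary = 5000 and Name = 'Ron', calculates the sum (32000), and assigns it to every row with salary = 5000.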

Select *,SUM(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees


Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6           Aradhya                                            Female     3500        3500
5           Sara                                               Female     4000        7500
2           John                                               Male       4500        12000
3           Pavan                                              Male       5000        17000
1           Mark                                               Male       5000        22000
8           Mary                                               Female     5000        27000
12          Ron                                                Male       5000        32000
11          Tom                                                Male       5500        37500
7           Tom                                                Male       5500        43000
4           Pam                                                Female     5500        48500
9           Ben                                                Male       6500        55000
10          Jodi                                               Female     7000        62000

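So with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, the difference is that rows with the same value are not grouped together. The SUM is calculated from the first row up to the current physical row, and rows with equal values get no special treatment, unlike with RANGE.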
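
The RANGE and ROWS results can be reproduced with the sketch below, which assumes Python's built-in sqlite3 module and SQLite 3.25 or later. It loads the twelve employees from the data set (the Gender column is left out because it is not needed), and it returns only the sum_sal values in ascending order, because which of several equal-salary rows receives which ROWS value is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "Mark", 5000), (2, "John", 4500), (3, "Pavan", 5000),
        (4, "Pam", 5500), (5, "Sara", 4000), (6, "Aradhya", 3500),
        (7, "Tom", 5500), (8, "Mary", 5000), (9, "Ben", 6500),
        (10, "Jodi", 7000), (11, "Tom", 5500), (12, "Ron", 5000),
    ],
)

def running_sums(frame_unit):
    """Return the sum_sal values in ascending order for a ROWS or RANGE frame."""
    sql = f"""
        SELECT SUM(salary) OVER (
                   ORDER BY salary
                   {frame_unit} BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS sum_sal
        FROM employees
        ORDER BY sum_sal
    """
    return [row[0] for row in conn.execute(sql)]

range_sums = running_sums("RANGE")  # equal salaries share one value
rows_sums = running_sums("ROWS")    # every row gets its own running total
print("RANGE:", range_sums)
print("ROWS: ", rows_sums)
```

With RANGE the four rows with salary 5000 all report 32000, while with ROWS the running total advances row by row through 17000, 22000, 27000 and 32000.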

Select *,SUM(salary) Over(order by salary) as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6           Aradhya                                            Female     3500        3500
5           Sara                                               Female     4000        7500
2           John                                               Male       4500        12000
3           Pavan                                              Male       5000        32000
1           Mark                                               Male       5000        32000
8           Mary                                               Female     5000        32000
12          Ron                                                Male       5000        32000
11          Tom                                                Male       5500        48500
7           Tom                                                Male       5500        48500
4           Pam                                                Female     5500        48500
9           Ben                                                Male       6500        55000
10          Jodi                                               Female     7000        62000

These results are the same as those of

Select *, SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees

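This is because OVER(ORDER BY salary) is just a shortcut for OVER(ORDER BY salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). So wherever we specify only ORDER BY without ROWS or RANGE, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is taken as the default.

Note: this applies only to functions that actually accept RANGE/ROWS. ROW_NUMBER and a few others do not accept RANGE/ROWS, so for them this does not come into the picture.

So far we have seen that an OVER clause with ORDER BY takes RANGE/ROWS, that the syntax looks like RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, and that it calculates from the first row up to the current row. But what if we want to calculate the value over the entire partition and have it on every row (that is, from the first row to the last row)? Here is the query for that: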
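
The default frame can be verified with a small sketch, assuming Python's built-in sqlite3 module and SQLite 3.25 or later (SQLite uses the same default frame when only ORDER BY is given). The five-row employees table is a made-up miniature, not the full data set.

```python
import sqlite3

# Hypothetical five-row table; ids 1 and 3 share the salary 5000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees (salary) VALUES (?)",
    [(s,) for s in (5000, 4500, 5000, 5500, 4000)],
)

def sums(over_clause):
    """Return (id, sum) pairs for SUM(salary) OVER (<over_clause>), by id."""
    sql = f"SELECT id, SUM(salary) OVER ({over_clause}) FROM employees ORDER BY id"
    return conn.execute(sql).fetchall()

implicit = sums("ORDER BY salary")
explicit_range = sums(
    "ORDER BY salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"
)
explicit_rows = sums(
    "ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"
)

print(implicit == explicit_range)  # ORDER BY alone behaves like the RANGE frame
print(implicit == explicit_rows)   # the ROWS frame splits the tied salaries
```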

Select *,sum(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1           Mark                                               Male       5000        62000
2           John                                               Male       4500        62000
3           Pavan                                              Male       5000        62000
4           Pam                                                Female     5500        62000
5           Sara                                               Female     4000        62000
6           Aradhya                                            Female     3500        62000
7           Tom                                                Male       5500        62000
8           Mary                                               Female     5000        62000
9           Ben                                                Male       6500        62000
10          Jodi                                               Female     7000        62000
11          Tom                                                Male       5500        62000
12          Ron                                                Male       5000        62000

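Instead of CURRENT ROW I specify UNBOUNDED FOLLOWING, which instructs the engine to calculate up to the last record of the partition for every row.

Now, what is OVER() with empty parentheses?

In terms of results, it is just a shortcut for OVER(ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

Here we are indirectly saying: treat the whole result set as a single partition, and calculate from the first record to the last record of that partition.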

Select *,Sum(salary) Over() as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1           Mark                                               Male       5000        62000
2           John                                               Male       4500        62000
3           Pavan                                              Male       5000        62000
4           Pam                                                Female     5500        62000
5           Sara                                               Female     4000        62000
6           Aradhya                                            Female     3500        62000
7           Tom                                                Male       5500        62000
8           Mary                                               Female     5000        62000
9           Ben                                                Male       6500        62000
10          Jodi                                               Female     7000        62000
11          Tom                                                Male       5500        62000
12          Ron                                                Male       5000        62000
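
To see the empty OVER() next to the explicit full frame, and next to a PARTITION BY for contrast, here is a sketch assuming Python's built-in sqlite3 module and SQLite 3.25 or later. It uses four of the employees from the data set; the column aliases are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, gender TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Mark", "Male", 5000), ("Pam", "Female", 5500),
        ("Sara", "Female", 4000), ("Ben", "Male", 6500),
    ],
)

rows = conn.execute("""
SELECT name,
       -- empty OVER(): the whole result set is one partition
       SUM(salary) OVER () AS all_total,
       -- explicit frame spanning the first to the last row
       SUM(salary) OVER (ORDER BY salary
                         ROWS BETWEEN UNBOUNDED PRECEDING
                                  AND UNBOUNDED FOLLOWING) AS full_frame_total,
       -- one total per gender partition
       SUM(salary) OVER (PARTITION BY gender) AS gender_total
FROM employees
ORDER BY name
""").fetchall()

for row in rows:
    print(row)
```

The first two totals are identical on every row, while the third changes with the partition the row belongs to.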

I made a video about this; have a look if you are interested. https://www.youtube.com/watch?v=CvVenuVUqto&t=1177s

Thanks, Pavan Kumar Aryasomayajulu http://xyzcoder.github.io


Els*_*han 6

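  • Also called the Query Partition clause.
  • Similar to the GROUP BY clause

    • breaks the data up into chunks (or partitions)
    • separated by partition boundaries
    • the function is performed within each partition
    • re-initialised when crossing a partition boundary

Syntax:
function (...) OVER (PARTITION BY col1, col3, ...)

  • Functions

    • Familiar functions such as COUNT(), SUM(), MIN(), MAX(), etc.
    • New functions as well (e.g. ROW_NUMBER(), RATIO_TO_REPORT(), etc.)


More info with examples: http://msdn.microsoft.com/en-us/library/ms189461.aspx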