Using ranking functions to find recurring events

Tom*_*żny 6 sql t-sql sql-server sql-server-2008 ranking-functions

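Please help me build the following query; I have been struggling with it for a while. Say I have a simple table that contains a month number and a flag indicating whether any event failed in that particular month (Success = 0 means a failure occurred).

The script below generates sample data: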

WITH DATA(Month, Success) AS
(
    SELECT  1, 0 UNION ALL
    SELECT  2, 0 UNION ALL
    SELECT  3, 0 UNION ALL
    SELECT  4, 1 UNION ALL
    SELECT  5, 1 UNION ALL
    SELECT  6, 0 UNION ALL
    SELECT  7, 0 UNION ALL
    SELECT  8, 1 UNION ALL
    SELECT  9, 0 UNION ALL
    SELECT 10, 1 UNION ALL
    SELECT 11, 0 UNION ALL
    SELECT 12, 1 UNION ALL
    SELECT 13, 0 UNION ALL
    SELECT 14, 1 UNION ALL
    SELECT 15, 0 UNION ALL
    SELECT 16, 1 UNION ALL
    SELECT 17, 0 UNION ALL
    SELECT 18, 0
)

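Given this definition of a "repeated failure":

When event failures occur in at least 4 months of any 6-month period, the last failed month of that period is a "repeated failure". My query should return the following output: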

Month   Success RepeatedFailure
1       0   
2       0   
3       0   
4       1   
5       1   
6       0       R1
7       0       R2
8       1   
9       0   
10      1   
11      0       R3
12      1   
13      0   
14      1   
15      0   
16      1   
17      0
18      0       R1

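where:

  • R1 - the 1st repeated failure, in month 6 (4 failures in the last 6 months).
  • R2 - the 2nd repeated failure, in month 7 (4 failures in the last 6 months).
  • R3 - the 3rd repeated failure, in month 11 (4 failures in the last 6 months).

R1 in month 18 is again a 1st repeated failure, because the numbering should restart from the beginning when a repeated failure occurs for the first time within the last 6 reporting periods.

Repeated failures are numbered consecutively because, depending on the number, I have to apply the appropriate multiplier:

  • 1st repeated failure - X2
  • 2nd repeated failure - X4
  • 3rd and any later repeated failure - X5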
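
The multiplier rule above could be written as a CASE expression. This is only a minimal sketch: RepeatedFailureNo is a hypothetical integer column standing in for the numeric part of R1, R2, R3, and the inline VALUES list is made-up input for illustration.

```sql
-- Hypothetical input: RepeatedFailureNo is the numeric part of R1, R2, R3, ...
SELECT
    n.RepeatedFailureNo,
    CASE
        WHEN n.RepeatedFailureNo = 1  THEN 2   -- 1st repeated failure: X2
        WHEN n.RepeatedFailureNo = 2  THEN 4   -- 2nd repeated failure: X4
        WHEN n.RepeatedFailureNo >= 3 THEN 5   -- 3rd and later: X5
    END AS Multiplier                          -- NULL if no branch matches
FROM (VALUES (1), (2), (3), (4)) AS n(RepeatedFailureNo);
```

With this input the Multiplier column comes out as 2, 4, 5, 5.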

Aak*_*shM 2

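I'm sure this can be improved, but it works. We essentially make two passes: the first identifies the repeated failures, and the second determines the ordinal of each repeated failure. Note that Intermediate2 could definitely be eliminated; I only separated it out for clarity. All of the code is one statement, with my explanation interleaved: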

;WITH DATA(Month, Success) AS
-- assuming your data  as defined (with my edit)
,Intermediate AS 
(
SELECT
    Month,
    Success,
    -- next column for illustration only
    (SELECT SUM(Success) 
     FROM DATA hist 
     WHERE curr.Month - hist.Month BETWEEN 0 AND 5) 
        AS SuccessesInLastSixMonths,
    -- next column for illustration only
    6 - (SELECT SUM(Success) 
     FROM DATA hist 
     WHERE curr.Month - hist.Month BETWEEN 0 AND 5) 
        AS FailuresInLastSixMonths,
    CASE WHEN 
            -- only a failed month can itself be a repeated failure
            curr.Success = 0
            AND (6 - (SELECT SUM(Success) 
                    FROM DATA hist 
                    WHERE curr.Month - hist.Month BETWEEN 0 AND 5)) 
            >= 4 
            THEN 1
            ELSE 0 
    END AS IsRepeatedFailure
FROM DATA curr 
-- No real data until month 6
WHERE curr.Month > 5
)

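At this point we have determined, for each month, whether it is a repeated failure, by counting the failures in the six months up to and including that month.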
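
As a worked check of that count, here is a separate standalone query (not part of the statement being built) over months 1 to 6 of the sample data. The inline VALUES list and the alias w are only for illustration.

```sql
-- Months 1..6 from the sample data; Success = 0 marks a failed month.
SELECT
    SUM(Success)     AS SuccessesInLastSixMonths,  -- 0+0+0+1+1+0 = 2
    6 - SUM(Success) AS FailuresInLastSixMonths    -- 6 - 2 = 4
FROM (VALUES (1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 0)) AS w(Month, Success);
```

Four failures meets the threshold of 4, and month 6 is itself a failure, which is why month 6 is the first repeated failure.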

,Intermediate2 AS
(
SELECT 
    Month,
    Success,
    IsRepeatedFailure,
    (SELECT SUM(IsRepeatedFailure) 
        FROM Intermediate hist 
        WHERE curr.Month - hist.Month BETWEEN 0 AND 5) 
        AS RepeatedFailuresInLastSixMonths
FROM Intermediate curr
)

Now we have counted the number of repeated failures in the six months up to and including the current month.

SELECT
    Month,
    Success,
    CASE IsRepeatedFailure 
        WHEN 1 THEN 'R' + CONVERT(varchar, RepeatedFailuresInLastSixMonths) 
        ELSE '' END
    AS RepeatedFailureText
FROM Intermediate2

So, if a month is a repeated failure, we can say what the ordinal number of that repeated failure is.

Result:

Month       Success     RepeatedFailureText
----------- ----------- -------------------------------
6           0           R1
7           0           R2
8           1           
9           0           
10          1           
11          0           R3
12          1           
13          0           
14          1           
15          0           
16          1           
17          0           
18          0           R1

(13 row(s) affected)

Performance considerations will depend on how much data you actually have.
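
If the correlated subqueries turn out to be slow on a larger table, one option that may be worth trying is a window frame instead. The sketch below makes several assumptions: it needs SQL Server 2012 or later (ROWS frames are not available on the SQL Server 2008 the question is tagged with), it assumes one row per month with no gaps, and the CTE names Flagged and Numbered are made up for illustration. Unlike the statement above, it keeps months 1 to 5 in the output.

```sql
-- Sketch only: requires SQL Server 2012+, assumes consecutive months.
WITH DATA(Month, Success) AS
(
    SELECT  1, 0 UNION ALL SELECT  2, 0 UNION ALL SELECT  3, 0 UNION ALL
    SELECT  4, 1 UNION ALL SELECT  5, 1 UNION ALL SELECT  6, 0 UNION ALL
    SELECT  7, 0 UNION ALL SELECT  8, 1 UNION ALL SELECT  9, 0 UNION ALL
    SELECT 10, 1 UNION ALL SELECT 11, 0 UNION ALL SELECT 12, 1 UNION ALL
    SELECT 13, 0 UNION ALL SELECT 14, 1 UNION ALL SELECT 15, 0 UNION ALL
    SELECT 16, 1 UNION ALL SELECT 17, 0 UNION ALL SELECT 18, 0
),
Flagged AS
(
    SELECT
        Month,
        Success,
        -- a failed month with at least 4 failures among the current and 5 preceding rows
        CASE
            WHEN Success = 0
             AND SUM(1 - Success) OVER (ORDER BY Month
                                        ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) >= 4
            THEN 1
            ELSE 0
        END AS IsRepeatedFailure
    FROM DATA
),
Numbered AS
(
    SELECT
        Month,
        Success,
        IsRepeatedFailure,
        -- repeated failures among the current and 5 preceding rows
        SUM(IsRepeatedFailure) OVER (ORDER BY Month
                                     ROWS BETWEEN 5 PRECEDING AND CURRENT ROW)
            AS RepeatedFailuresInLastSixMonths
    FROM Flagged
)
SELECT
    Month,
    Success,
    CASE IsRepeatedFailure
        WHEN 1 THEN 'R' + CONVERT(varchar(10), RepeatedFailuresInLastSixMonths)
        ELSE ''
    END AS RepeatedFailureText
FROM Numbered
ORDER BY Month;
```

On the sample data this should label months 6, 7, 11 and 18 as R1, R2, R3 and R1, the same labels as in the result above. Whether it is actually faster depends on the data and indexes, so it is worth comparing execution plans before switching.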