Get streak count and streak type from win-loss-tie data

jam*_*uss 15 sql-server t-sql sql-server-2012

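I made a SQL Fiddle for this question, in case that makes things easier for anyone.

I have a fantasy sports database of sorts, and I'm trying to work out how to derive "current streak" data: 'W2' if the team has won its last two matchups, 'L1' if it lost its most recent matchup after winning the one before, or 'T1' if it tied its most recent matchup.

Here is my basic schema: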

CREATE TABLE FantasyTeams (
  team_id BIGINT NOT NULL
)

CREATE TABLE FantasyMatches(
    match_id BIGINT NOT NULL,
    home_fantasy_team_id BIGINT NOT NULL,
    away_fantasy_team_id BIGINT NOT NULL,
    fantasy_season_id BIGINT NOT NULL,
    fantasy_league_id BIGINT NOT NULL,
    fantasy_week_id BIGINT NOT NULL,
    winning_team_id BIGINT NULL
)

A NULL value in the winning_team_id column indicates that the match was a tie.

Here is a sample DML statement with some sample data for 6 teams and 3 weeks of matchups:

INSERT INTO FantasyTeams
SELECT 1
UNION
SELECT 2
UNION
SELECT 3
UNION
SELECT 4
UNION
SELECT 5
UNION
SELECT 6

INSERT INTO FantasyMatches
SELECT 1, 2, 1, 2, 4, 44, 2
UNION
SELECT 2, 5, 4, 2, 4, 44, 5
UNION
SELECT 3, 6, 3, 2, 4, 44, 3
UNION
SELECT 4, 2, 4, 2, 4, 45, 2
UNION
SELECT 5, 3, 1, 2, 4, 45, 3
UNION
SELECT 6, 6, 5, 2, 4, 45, 6
UNION
SELECT 7, 2, 6, 2, 4, 46, 2
UNION
SELECT 8, 3, 5, 2, 4, 46, 3
UNION
SELECT 9, 4, 1, 2, 4, 46, NULL

GO

Here is an example of the desired output (based on the DML above), which I can't even begin to figure out how to derive:

| TEAM_ID | STREAK_TYPE | STREAK_COUNT |
|---------|------------|--------------|
|       1 |          T |            1 |
|       2 |          W |            3 |
|       3 |          W |            3 |
|       4 |          T |            1 |
|       5 |          L |            2 |
|       6 |          L |            1 |

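I've tried various approaches using subqueries and CTEs, but I can't put it all together. I'd like to avoid a cursor, since I may have a large dataset to run this against in the future. I feel there might be a way involving table variables that join this data to itself somehow, but I'm still working on it.

Additional info: there could be a varying number of teams (any even number between 6 and 10), and the total number of matchups will increase by 1 for each team every week. Any ideas on how I should do this?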

Mik*_*son 17

Since you are on SQL Server 2012, you can use a couple of the new window functions.

with C1 as
(
  select T.team_id,
         case
           when M.winning_team_id is null then 'T'
           when M.winning_team_id = T.team_id then 'W'
           else 'L'
         end as streak_type,
         M.match_id
  from FantasyMatches as M
    cross apply (values(M.home_fantasy_team_id),
                       (M.away_fantasy_team_id)) as T(team_id)
), C2 as
(
  select C1.team_id,
         C1.streak_type,
         C1.match_id,
         lag(C1.streak_type, 1, C1.streak_type) 
           over(partition by C1.team_id 
                order by C1.match_id desc) as lag_streak_type
  from C1
), C3 as
(
  select C2.team_id,
         C2.streak_type,
         sum(case when C2.lag_streak_type = C2.streak_type then 0 else 1 end) 
           over(partition by C2.team_id 
                order by C2.match_id desc rows unbounded preceding) as streak_sum
  from C2
)
select C3.team_id,
       C3.streak_type,
       count(*) as streak_count
from C3
where C3.streak_sum = 0
group by C3.team_id,
         C3.streak_type
order by C3.team_id;

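SQL Fiddle

C1 calculates the streak_type for each team and match.

C2 finds the previous streak_type, ordered by match_id desc.

C3 generates a running sum, streak_sum, ordered by match_id desc. It stays at 0 for as long as the streak_type is the same as the last value.

The main query counts the rows of each streak where streak_sum is 0.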
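To see how the lag and the running sum isolate the current streak, here is a minimal sketch that applies the C2 and C3 logic to a single team. The inline VALUES rows stand in for team 5's three matches from the question's sample data (an assumption made to keep the example self-contained), so no partition by clause is needed:

```sql
-- Hypothetical inline data: team 5's results (match 2 = W, match 6 = L, match 8 = L)
with R as
(
  select V.match_id, V.streak_type
  from (values (2, 'W'), (6, 'L'), (8, 'L')) as V(match_id, streak_type)
), L as
(
  select R.match_id,
         R.streak_type,
         -- The newest row has no predecessor, so it defaults to its own value
         lag(R.streak_type, 1, R.streak_type)
           over(order by R.match_id desc) as lag_streak_type
  from R
)
select L.match_id,
       L.streak_type,
       L.lag_streak_type,
       -- Adds 1 at every change of result type, so only the newest streak stays at 0
       sum(case when L.lag_streak_type = L.streak_type then 0 else 1 end)
         over(order by L.match_id desc rows unbounded preceding) as streak_sum
from L
order by L.match_id desc;

-- match_id  streak_type  lag_streak_type  streak_sum
-- 8         L            L                0
-- 6         L            L                0
-- 2         W            L                1
```

Two rows have streak_sum = 0 and both are of type 'L', which gives the 'L2' expected for team 5.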


Pau*_*ite 10

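One intuitive way to solve this problem is:

  1. Find the most recent result for each team
  2. Check the previous match and add one to the streak count if the result type matches
  3. Repeat step 2, but stop as soon as the first different result is encountered

As the table grows larger, this strategy may outperform the window function solution (which performs a full scan of the data), assuming the recursive strategy is implemented efficiently. The key to success is providing efficient indexes to locate rows quickly (using seeks) and to avoid sorts. The indexes needed are: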

-- New index #1
CREATE UNIQUE INDEX uq1 ON dbo.FantasyMatches 
    (home_fantasy_team_id, match_id) 
INCLUDE (winning_team_id);

-- New index #2
CREATE UNIQUE INDEX uq2 ON dbo.FantasyMatches 
    (away_fantasy_team_id, match_id) 
INCLUDE (winning_team_id);
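
As a rough illustration of why the key order matters, the sketch below looks up the most recent home match for one team. With an index keyed on (home_fantasy_team_id, match_id), a query of this shape can typically be answered by seeking to the team and reading the index backwards, with no sort. The variable @team_id and its value are hypothetical, and the query assumes the question's FantasyMatches table exists:

```sql
-- Hypothetical team to look up
DECLARE @team_id bigint = 5;

-- Most recent match played at home by that team
SELECT TOP (1)
    FM.match_id,
    FM.winning_team_id
FROM dbo.FantasyMatches AS FM
WHERE
    FM.home_fantasy_team_id = @team_id
ORDER BY
    FM.match_id DESC;
```

The second index plays the same role for matches played away.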

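To assist query optimization, I will use a temporary table to hold the rows identified as forming part of a current streak. If streaks are typically short (as is true for the teams I follow, sadly), this table should be quite small: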

-- Table to hold just the rows that form streaks
CREATE TABLE #StreakData
(
    team_id bigint NOT NULL,
    match_id bigint NOT NULL,
    streak_type char(1) NOT NULL,
    streak_length integer NOT NULL,
);

-- Temporary table unique clustered index
CREATE UNIQUE CLUSTERED INDEX cuq ON #StreakData (team_id, match_id);

My recursive query solution is as follows (SQL Fiddle here):

-- Solution query
WITH Streaks AS
(
    -- Anchor: most recent match for each team
    SELECT 
        FT.team_id, 
        CA.match_id, 
        CA.streak_type, 
        streak_length = 1
    FROM dbo.FantasyTeams AS FT
    CROSS APPLY
    (
        -- Most recent match
        SELECT
            T.match_id,
            T.streak_type
        FROM 
        (
            SELECT 
                FM.match_id, 
                streak_type =
                    CASE 
                        WHEN FM.winning_team_id = FM.home_fantasy_team_id
                            THEN CONVERT(char(1), 'W')
                        WHEN FM.winning_team_id IS NULL
                            THEN CONVERT(char(1), 'T')
                        ELSE CONVERT(char(1), 'L')
                    END
            FROM dbo.FantasyMatches AS FM
            WHERE 
                FT.team_id = FM.home_fantasy_team_id
            UNION ALL
            SELECT 
                FM.match_id, 
                streak_type =
                    CASE 
                        WHEN FM.winning_team_id = FM.away_fantasy_team_id
                            THEN CONVERT(char(1), 'W')
                        WHEN FM.winning_team_id IS NULL
                            THEN CONVERT(char(1), 'T')
                        ELSE CONVERT(char(1), 'L')
                    END
            FROM dbo.FantasyMatches AS FM
            WHERE
                FT.team_id = FM.away_fantasy_team_id
        ) AS T
        ORDER BY 
            T.match_id DESC
            OFFSET 0 ROWS 
            FETCH FIRST 1 ROW ONLY
    ) AS CA
    UNION ALL
    -- Recursive part: prior match with the same streak type
    SELECT 
        Streaks.team_id, 
        LastMatch.match_id, 
        Streaks.streak_type, 
        Streaks.streak_length + 1
    FROM Streaks
    CROSS APPLY
    (
        -- Most recent prior match
        SELECT 
            Numbered.match_id, 
            Numbered.winning_team_id, 
            Numbered.team_id
        FROM
        (
            -- Assign a row number
            SELECT
                PreviousMatches.match_id,
                PreviousMatches.winning_team_id,
                PreviousMatches.team_id, 
                rn = ROW_NUMBER() OVER (
                    ORDER BY PreviousMatches.match_id DESC)
            FROM
            (
                -- Prior match as home or away team
                SELECT 
                    FM.match_id, 
                    FM.winning_team_id, 
                    team_id = FM.home_fantasy_team_id
                FROM dbo.FantasyMatches AS FM
                WHERE 
                    FM.home_fantasy_team_id = Streaks.team_id
                    AND FM.match_id < Streaks.match_id
                UNION ALL
                SELECT 
                    FM.match_id, 
                    FM.winning_team_id, 
                    team_id = FM.away_fantasy_team_id
                FROM dbo.FantasyMatches AS FM
                WHERE 
                    FM.away_fantasy_team_id = Streaks.team_id
                    AND FM.match_id < Streaks.match_id
            ) AS PreviousMatches
        ) AS Numbered
        -- Most recent
        WHERE 
            Numbered.rn = 1
    ) AS LastMatch
    -- Check the streak type matches
    WHERE EXISTS
    (
        SELECT 
            Streaks.streak_type
        INTERSECT
        SELECT 
            CASE 
                WHEN LastMatch.winning_team_id IS NULL THEN 'T' 
                WHEN LastMatch.winning_team_id = LastMatch.team_id THEN 'W' 
                ELSE 'L' 
            END
    )
)
INSERT #StreakData
    (team_id, match_id, streak_type, streak_length)
SELECT
    team_id,
    match_id,
    streak_type,
    streak_length
FROM Streaks
OPTION (MAXRECURSION 0);

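The T-SQL text is quite long, but each section of the query corresponds closely to the broad process outline given at the start of this answer. The query is made longer by the need to use certain tricks to avoid sorts and to produce the effect of a TOP in the recursive part of the query (where TOP is normally not allowed).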
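
The row-number trick can be shown in isolation. SQL Server does not accept TOP in the recursive member of a CTE, so the recursive part instead numbers the candidate rows inside an APPLY and keeps only row 1. This is a minimal sketch of that pattern, not the full solution: the table variable @M and its three rows are hypothetical.

```sql
-- Hypothetical stand-in for one team's match ids
DECLARE @M TABLE (match_id integer PRIMARY KEY);
INSERT @M (match_id) VALUES (1), (4), (9);

WITH Walk AS
(
    -- Anchor: the most recent match
    SELECT match_id = MAX(M.match_id)
    FROM @M AS M
    UNION ALL
    -- Recursive part: the closest earlier match, found with
    -- ROW_NUMBER() = 1 in place of TOP (1)
    SELECT Prior.match_id
    FROM Walk AS W
    CROSS APPLY
    (
        SELECT Numbered.match_id
        FROM
        (
            SELECT
                M.match_id,
                rn = ROW_NUMBER() OVER (ORDER BY M.match_id DESC)
            FROM @M AS M
            WHERE M.match_id < W.match_id
        ) AS Numbered
        WHERE Numbered.rn = 1
    ) AS Prior
)
SELECT Walk.match_id
FROM Walk
ORDER BY Walk.match_id DESC;

-- Returns 9, 4, 1: one earlier match per recursion step
```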

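The execution plan is relatively small and simple compared with the query. In the screenshot below, I have shaded the anchor region yellow and the recursive part green:

[Screenshot: recursive execution plan]

With the streak rows captured in a temporary table, it is easy to get the summary results you need. (Using a temporary table also avoids a sort spill that might occur if the query below were combined with the main recursive query.)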

-- Basic results
SELECT
    SD.team_id,
    StreakType = MAX(SD.streak_type),
    StreakLength = MAX(SD.streak_length)
FROM #StreakData AS SD
GROUP BY 
    SD.team_id
ORDER BY
    SD.team_id;

[Screenshot: basic query execution plan]

The same query can be used as the basis for updating the FantasyTeams table:

-- Update team summary
WITH StreakData AS
(
    SELECT
        SD.team_id,
        StreakType = MAX(SD.streak_type),
        StreakLength = MAX(SD.streak_length)
    FROM #StreakData AS SD
    GROUP BY 
        SD.team_id
)
UPDATE FT
SET streak_type = SD.StreakType,
    streak_count = SD.StreakLength
FROM StreakData AS SD
JOIN dbo.FantasyTeams AS FT
    ON FT.team_id = SD.team_id;
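
Note that the UPDATE above writes to streak_type and streak_count columns, which the FantasyTeams definition in the question does not include. A minimal sketch of adding them first, assuming the data types mirror those of #StreakData and that NULL is acceptable for teams with no matches yet:

```sql
-- Assumed column definitions; adjust types and nullability as needed
ALTER TABLE dbo.FantasyTeams
ADD streak_type char(1) NULL,
    streak_count integer NULL;
```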

Or, if you prefer MERGE:

MERGE dbo.FantasyTeams AS FT
USING
(
    SELECT
        SD.team_id,
        StreakType = MAX(SD.streak_type),
        StreakLength = MAX(SD.streak_length)
    FROM #StreakData AS SD
    GROUP BY 
        SD.team_id
) AS StreakData
    ON StreakData.team_id = FT.team_id
WHEN MATCHED THEN UPDATE SET
    FT.streak_type = StreakData.StreakType,
    FT.streak_count = StreakData.StreakLength;

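Either approach produces an efficient execution plan (based on the known number of rows in the temporary table):

[Screenshot: update execution plan]

Finally, because the recursive method naturally includes the match_id in its processing, it is easy to add a list of the match_ids that form each streak to the output: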

SELECT
    S.team_id,
    streak_type = MAX(S.streak_type),
    match_id_list =
        STUFF(
        (
            SELECT ',' + CONVERT(varchar(11), S2.match_id)
            FROM #StreakData AS S2
            WHERE S2.team_id = S.team_id
            ORDER BY S2.match_id DESC
            FOR XML PATH ('')
        ), 1, 1, ''),
    streak_length = MAX(S.streak_length)
FROM #StreakData AS S
GROUP BY 
    S.team_id
ORDER BY
    S.team_id;

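Output:

[Screenshot: output including the match list]

Execution plan:

[Screenshot: match list execution plan]

  • Impressive! Is there a particular reason why the WHERE in your recursive part uses `EXISTS (... INTERSECT ...)` instead of just `Streaks.streak_type = CASE ...`? I know the former can be useful when you need to match NULLs as well as values on both sides, but in this case the right-hand side never produces NULLs, so... (2 upvotes)
  • @AndriyM Yes. The code is written very carefully, in a number of places and ways, to produce a plan with no sorts. When `CASE` is used, the optimizer is unable to use a merge concatenation (which preserves the union key order) and uses a concatenation plus a sort instead. (2 upvotes)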

Ser*_*ton 8

Another way to get the result is with a recursive CTE:

WITH TeamRes As (
SELECT FT.Team_ID
     , FM.match_id
     , Previous_Match = LAG(match_id, 1, 0) 
                        OVER (PARTITION BY FT.Team_ID ORDER BY FM.match_id)
     , Matches = Row_Number() 
                 OVER (PARTITION BY FT.Team_ID ORDER BY FM.match_id Desc)
     , Result = Case Coalesce(winning_team_id, -1)
                     When -1 Then 'T'
                     When FT.Team_ID Then 'W'
                     Else 'L'
                End 
FROM   FantasyMatches FM
       INNER JOIN FantasyTeams FT ON FT.Team_ID IN 
         (FM.home_fantasy_team_id, FM.away_fantasy_team_id)
), Streaks AS (
SELECT Team_ID, Result, 1 As Streak, Previous_Match
FROM   TeamRes
WHERE  Matches = 1
UNION ALL
SELECT tr.Team_ID, tr.Result, Streak + 1, tr.Previous_Match
FROM   TeamRes tr
       INNER JOIN Streaks s ON tr.Team_ID = s.Team_ID 
                           AND tr.Match_id = s.Previous_Match 
                           AND tr.Result = s.Result
)
Select Team_ID, Result, Max(Streak) Streak
From   Streaks
Group By Team_ID, Result
Order By Team_ID
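
To make the two helper columns easier to follow, here is a minimal sketch that computes Previous_Match and Matches for one team only. The inline VALUES rows stand in for team 5's three matches from the question's sample data (an assumption made to keep the example self-contained), so the PARTITION BY clause is dropped:

```sql
-- Hypothetical inline data: team 5's results
WITH TeamRes AS (
SELECT R.match_id
     , R.Result
       -- The match played just before this one (0 when there is none)
     , Previous_Match = LAG(R.match_id, 1, 0)
                        OVER (ORDER BY R.match_id)
       -- 1 marks the most recent match, where the recursion starts
     , Matches = ROW_NUMBER()
                 OVER (ORDER BY R.match_id DESC)
FROM   (VALUES (2, 'W'), (6, 'L'), (8, 'L')) AS R(match_id, Result)
)
SELECT match_id, Result, Previous_Match, Matches
FROM   TeamRes
ORDER BY match_id DESC;

-- match_id  Result  Previous_Match  Matches
-- 8         L       6               1
-- 6         L       2               2
-- 2         W       0               3
```

The recursion starts at match 8 (Matches = 1), follows Previous_Match to match 6 because the Result is still 'L', and stops at match 2 where the Result changes, giving a streak of 2.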

SQLFiddle demo