jam*_*uss 15 sql-server t-sql sql-server-2012
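In case it makes things easier for anyone, I have made a SQL Fiddle for this question.
I have a fantasy sports database of sorts, and I am trying to work out how to derive "current streak" data: 'W2' if the team has won its last two matchups, 'L1' if it lost its last matchup after winning the one before, or 'T1' if its most recent matchup was a tie.
Here is my basic schema: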
CREATE TABLE FantasyTeams (
team_id BIGINT NOT NULL
)
CREATE TABLE FantasyMatches(
match_id BIGINT NOT NULL,
home_fantasy_team_id BIGINT NOT NULL,
away_fantasy_team_id BIGINT NOT NULL,
fantasy_season_id BIGINT NOT NULL,
fantasy_league_id BIGINT NOT NULL,
fantasy_week_id BIGINT NOT NULL,
winning_team_id BIGINT NULL
)
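A value of NULL in the winning_team_id column indicates a tie for that match.
Here is a sample DML statement with some sample data for 6 teams and 3 weeks' worth of matchups: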
INSERT INTO FantasyTeams
SELECT 1
UNION
SELECT 2
UNION
SELECT 3
UNION
SELECT 4
UNION
SELECT 5
UNION
SELECT 6
INSERT INTO FantasyMatches
SELECT 1, 2, 1, 2, 4, 44, 2
UNION
SELECT 2, 5, 4, 2, 4, 44, 5
UNION
SELECT 3, 6, 3, 2, 4, 44, 3
UNION
SELECT 4, 2, 4, 2, 4, 45, 2
UNION
SELECT 5, 3, 1, 2, 4, 45, 3
UNION
SELECT 6, 6, 5, 2, 4, 45, 6
UNION
SELECT 7, 2, 6, 2, 4, 46, 2
UNION
SELECT 8, 3, 5, 2, 4, 46, 3
UNION
SELECT 9, 4, 1, 2, 4, 46, NULL
GO
Here is an example of the desired output (based on the DML above), which I cannot even begin to work out how to derive:
| TEAM_ID | STREAK_TYPE | STREAK_COUNT |
|---------|------------|--------------|
| 1 | T | 1 |
| 2 | W | 3 |
| 3 | W | 3 |
| 4 | T | 1 |
| 5 | L | 2 |
| 6 | L | 1 |
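I have tried various approaches using subqueries and CTEs, but I cannot put it all together. I would like to avoid a cursor, because I may have a large data set to run this against in the future. I feel there might be a way involving table variables that somehow join this data to itself, but I am still working on it.
Additional info: there could be a varying number of teams (any even number between 6 and 10), and the total number of matchups will increase by 1 for each team every week. Any ideas on how I should do this?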
Mik*_*son 17
Since you are on SQL Server 2012, you can use a couple of the new windowing functions.
with C1 as
(
select T.team_id,
case
when M.winning_team_id is null then 'T'
when M.winning_team_id = T.team_id then 'W'
else 'L'
end as streak_type,
M.match_id
from FantasyMatches as M
cross apply (values(M.home_fantasy_team_id),
(M.away_fantasy_team_id)) as T(team_id)
), C2 as
(
select C1.team_id,
C1.streak_type,
C1.match_id,
lag(C1.streak_type, 1, C1.streak_type)
over(partition by C1.team_id
order by C1.match_id desc) as lag_streak_type
from C1
), C3 as
(
select C2.team_id,
C2.streak_type,
sum(case when C2.lag_streak_type = C2.streak_type then 0 else 1 end)
over(partition by C2.team_id
order by C2.match_id desc rows unbounded preceding) as streak_sum
from C2
)
select C3.team_id,
C3.streak_type,
count(*) as streak_count
from C3
where C3.streak_sum = 0
group by C3.team_id,
C3.streak_type
order by C3.team_id;
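C1 calculates the streak_type for each team and match.
C2 finds the previous streak_type, ordered by match_id desc.
C3 generates a running sum, streak_sum, ordered by match_id desc, that stays at 0 for as long as the streak_type is the same as the last value.
The main query sums up the streaks where streak_sum is 0.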
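To see how the running sum isolates the current streak, it may help to look at the intermediate columns for a single team. The following is only a sketch: it inlines the question's sample matches in a CTE named M (a name introduced here so the query runs on its own) and shows the rows for team 5.

```sql
-- Sample matches from the question, inlined so the sketch runs on its own.
-- M is a hypothetical stand-in for the FantasyMatches table.
with M as
(
    select V.match_id, V.home_fantasy_team_id, V.away_fantasy_team_id, V.winning_team_id
    from (values
            (1, 2, 1, 2), (2, 5, 4, 5), (3, 6, 3, 3),
            (4, 2, 4, 2), (5, 3, 1, 3), (6, 6, 5, 6),
            (7, 2, 6, 2), (8, 3, 5, 3), (9, 4, 1, null)
         ) as V(match_id, home_fantasy_team_id, away_fantasy_team_id, winning_team_id)
), C1 as
(
    -- One row per team per match, with that team's result
    select T.team_id,
           case
             when M.winning_team_id is null then 'T'
             when M.winning_team_id = T.team_id then 'W'
             else 'L'
           end as streak_type,
           M.match_id
    from M
      cross apply (values(M.home_fantasy_team_id),
                         (M.away_fantasy_team_id)) as T(team_id)
), C2 as
(
    -- Result of the next more recent match (defaults to the row's own result)
    select C1.team_id, C1.streak_type, C1.match_id,
           lag(C1.streak_type, 1, C1.streak_type)
             over(partition by C1.team_id
                  order by C1.match_id desc) as lag_streak_type
    from C1
)
select C2.match_id, C2.streak_type, C2.lag_streak_type,
       sum(case when C2.lag_streak_type = C2.streak_type then 0 else 1 end)
         over(partition by C2.team_id
              order by C2.match_id desc rows unbounded preceding) as streak_sum
from C2
where C2.team_id = 5
order by C2.match_id desc;

-- match_id  streak_type  lag_streak_type  streak_sum
-- 8         L            L                0
-- 6         L            L                0
-- 2         W            L                1
```

Only the first two rows have streak_sum = 0, so counting them gives the L2 streak expected for team 5.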
Pau*_*ite 10
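An intuitive way to solve this problem is:
1. Find the most recent result for each team
2. Check the previous match and add one to the streak count if the result type matches
3. Repeat step 2, but stop as soon as the first different result is encountered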
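Assuming the recursive strategy is implemented efficiently, it may outperform the window function solution (which performs a full scan of the data) as the table grows. The key to success is providing efficient indexes to locate rows quickly (using seeks) and to avoid sorts. The indexes needed are: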
-- New index #1
CREATE UNIQUE INDEX uq1 ON dbo.FantasyMatches
(home_fantasy_team_id, match_id)
INCLUDE (winning_team_id);
-- New index #2
CREATE UNIQUE INDEX uq2 ON dbo.FantasyMatches
(away_fantasy_team_id, match_id)
INCLUDE (winning_team_id);
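As a rough illustration of the access pattern these indexes are meant to support, the sketch below looks up the most recent home match for a single team. The variable @team_id and its value are assumptions for the example. With index uq1 in place, one would expect the optimizer to be able to answer this with an ordered seek rather than a scan plus sort, but that is worth confirming against the actual execution plan.

```sql
-- Hypothetical lookup: most recent home match for one team.
-- Assumes the question's dbo.FantasyMatches table and index uq1 exist.
DECLARE @team_id bigint = 5;

SELECT TOP (1)
    FM.match_id,
    FM.winning_team_id      -- available from the index via INCLUDE
FROM dbo.FantasyMatches AS FM
WHERE
    FM.home_fantasy_team_id = @team_id
ORDER BY
    FM.match_id DESC;       -- matches the index key order, read backwards
```

The same shape of query against away_fantasy_team_id is what index uq2 is intended for.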
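To assist query optimization, I will use a temporary table to hold the rows identified as forming part of a current streak. If streaks are typically short (as is true for the teams I follow, sadly), this table should be quite small: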
-- Table to hold just the rows that form streaks
CREATE TABLE #StreakData
(
team_id bigint NOT NULL,
match_id bigint NOT NULL,
streak_type char(1) NOT NULL,
streak_length integer NOT NULL,
);
-- Temporary table unique clustered index
CREATE UNIQUE CLUSTERED INDEX cuq ON #StreakData (team_id, match_id);
My recursive query solution is as follows (SQL Fiddle here):
-- Solution query
WITH Streaks AS
(
-- Anchor: most recent match for each team
SELECT
FT.team_id,
CA.match_id,
CA.streak_type,
streak_length = 1
FROM dbo.FantasyTeams AS FT
CROSS APPLY
(
-- Most recent match
SELECT
T.match_id,
T.streak_type
FROM
(
SELECT
FM.match_id,
streak_type =
CASE
WHEN FM.winning_team_id = FM.home_fantasy_team_id
THEN CONVERT(char(1), 'W')
WHEN FM.winning_team_id IS NULL
THEN CONVERT(char(1), 'T')
ELSE CONVERT(char(1), 'L')
END
FROM dbo.FantasyMatches AS FM
WHERE
FT.team_id = FM.home_fantasy_team_id
UNION ALL
SELECT
FM.match_id,
streak_type =
CASE
WHEN FM.winning_team_id = FM.away_fantasy_team_id
THEN CONVERT(char(1), 'W')
WHEN FM.winning_team_id IS NULL
THEN CONVERT(char(1), 'T')
ELSE CONVERT(char(1), 'L')
END
FROM dbo.FantasyMatches AS FM
WHERE
FT.team_id = FM.away_fantasy_team_id
) AS T
ORDER BY
T.match_id DESC
OFFSET 0 ROWS
FETCH FIRST 1 ROW ONLY
) AS CA
UNION ALL
-- Recursive part: prior match with the same streak type
SELECT
Streaks.team_id,
LastMatch.match_id,
Streaks.streak_type,
Streaks.streak_length + 1
FROM Streaks
CROSS APPLY
(
-- Most recent prior match
SELECT
Numbered.match_id,
Numbered.winning_team_id,
Numbered.team_id
FROM
(
-- Assign a row number
SELECT
PreviousMatches.match_id,
PreviousMatches.winning_team_id,
PreviousMatches.team_id,
rn = ROW_NUMBER() OVER (
ORDER BY PreviousMatches.match_id DESC)
FROM
(
-- Prior match as home or away team
SELECT
FM.match_id,
FM.winning_team_id,
team_id = FM.home_fantasy_team_id
FROM dbo.FantasyMatches AS FM
WHERE
FM.home_fantasy_team_id = Streaks.team_id
AND FM.match_id < Streaks.match_id
UNION ALL
SELECT
FM.match_id,
FM.winning_team_id,
team_id = FM.away_fantasy_team_id
FROM dbo.FantasyMatches AS FM
WHERE
FM.away_fantasy_team_id = Streaks.team_id
AND FM.match_id < Streaks.match_id
) AS PreviousMatches
) AS Numbered
-- Most recent
WHERE
Numbered.rn = 1
) AS LastMatch
-- Check the streak type matches
WHERE EXISTS
(
SELECT
Streaks.streak_type
INTERSECT
SELECT
CASE
WHEN LastMatch.winning_team_id IS NULL THEN 'T'
WHEN LastMatch.winning_team_id = LastMatch.team_id THEN 'W'
ELSE 'L'
END
)
)
INSERT #StreakData
(team_id, match_id, streak_type, streak_length)
SELECT
team_id,
match_id,
streak_type,
streak_length
FROM Streaks
OPTION (MAXRECURSION 0);
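The T-SQL text is quite long, but each section of the query corresponds closely to the broad process outline given at the start of this answer. The query is made longer by the need to use certain tricks to avoid sorts and to produce the effect of a TOP in the recursive part of the query (where TOP is normally not allowed).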
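The TOP workaround can be shown in isolation. The sketch below is not part of the solution: it uses a hypothetical table variable @T and walks its values from highest to lowest, using ROW_NUMBER inside a CROSS APPLY to pick the single "next lower" row at each step, in place of the TOP (1) that SQL Server rejects in the recursive member.

```sql
-- Hypothetical data: a few integers to walk through in descending order.
DECLARE @T TABLE (n integer PRIMARY KEY);
INSERT @T (n) VALUES (1), (3), (4), (8);

WITH Walk AS
(
    -- Anchor: start at the highest value
    SELECT n = MAX(T.n)
    FROM @T AS T

    UNION ALL

    -- Recursive part: the next lower value, found with ROW_NUMBER
    -- because TOP is not allowed here
    SELECT Prev.n
    FROM Walk
    CROSS APPLY
    (
        SELECT Numbered.n
        FROM
        (
            SELECT
                T.n,
                rn = ROW_NUMBER() OVER (ORDER BY T.n DESC)
            FROM @T AS T
            WHERE T.n < Walk.n
        ) AS Numbered
        WHERE Numbered.rn = 1
    ) AS Prev
)
SELECT Walk.n
FROM Walk
ORDER BY Walk.n DESC
OPTION (MAXRECURSION 0);

-- Returns 8, 4, 3, 1 (one value per row)
```

The recursion stops on its own once no lower value exists, which mirrors how the streak query stops when no earlier match qualifies.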
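Compared with the query text, the execution plan is relatively small and simple. In the screenshot below, I have shaded the anchor region yellow and the recursive part green:
With the streak rows captured in a temporary table, it is easy to get the summary results you need. (Using a temporary table also avoids a sort spill that might occur if the query below were combined with the main recursive query.)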
-- Basic results
SELECT
SD.team_id,
StreakType = MAX(SD.streak_type),
StreakLength = MAX(SD.streak_length)
FROM #StreakData AS SD
GROUP BY
SD.team_id
ORDER BY
SD.team_id;
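The question describes the streak as a single value such as 'W2' or 'T1'. If that form is wanted, one possible variation of the basic results query, assuming the #StreakData table populated above, is to combine the two columns with CONCAT (available from SQL Server 2012). The column alias streak is a name chosen for this sketch.

```sql
-- Sketch: return the streak as one string per team, e.g. 'W3' or 'L2'.
-- Assumes #StreakData has been filled by the recursive query.
SELECT
    SD.team_id,
    streak = CONCAT(MAX(SD.streak_type), MAX(SD.streak_length))
FROM #StreakData AS SD
GROUP BY
    SD.team_id
ORDER BY
    SD.team_id;
```

CONCAT converts the integer length to a string implicitly, so no explicit CONVERT is needed.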
The basic results query can also be used as the basis for updating the FantasyTeams table:
-- Update team summary
WITH StreakData AS
(
SELECT
SD.team_id,
StreakType = MAX(SD.streak_type),
StreakLength = MAX(SD.streak_length)
FROM #StreakData AS SD
GROUP BY
SD.team_id
)
UPDATE FT
SET streak_type = SD.StreakType,
streak_count = SD.StreakLength
FROM StreakData AS SD
JOIN dbo.FantasyTeams AS FT
ON FT.team_id = SD.team_id;
Or, if you prefer MERGE:
MERGE dbo.FantasyTeams AS FT
USING
(
SELECT
SD.team_id,
StreakType = MAX(SD.streak_type),
StreakLength = MAX(SD.streak_length)
FROM #StreakData AS SD
GROUP BY
SD.team_id
) AS StreakData
ON StreakData.team_id = FT.team_id
WHEN MATCHED THEN UPDATE SET
FT.streak_type = StreakData.StreakType,
FT.streak_count = StreakData.StreakLength;
Both approaches produce an efficient execution plan (based on the known number of rows in the temporary table).
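Note that the UPDATE and MERGE statements write to streak_type and streak_count columns, while the FantasyTeams table in the question defines only team_id. A minimal sketch of adding them follows; the data types are assumptions chosen to match #StreakData, and the columns are nullable so that teams with no matches yet remain valid.

```sql
-- Assumed columns: not part of the schema shown in the question.
ALTER TABLE dbo.FantasyTeams
ADD streak_type  char(1) NULL,   -- 'W', 'L' or 'T'
    streak_count integer NULL;   -- length of the current streak
```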
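Finally, because the recursive method naturally includes the match_id in its processing, it is easy to add a list of the match_ids that form each streak to the output: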
SELECT
S.team_id,
streak_type = MAX(S.streak_type),
match_id_list =
STUFF(
(
SELECT ',' + CONVERT(varchar(11), S2.match_id)
FROM #StreakData AS S2
WHERE S2.team_id = S.team_id
ORDER BY S2.match_id DESC
FOR XML PATH ('')
), 1, 1, ''),
streak_length = MAX(S.streak_length)
FROM #StreakData AS S
GROUP BY
S.team_id
ORDER BY
S.team_id;
Output and execution plan: (images not available)
Another way to get the result is with a recursive CTE:
WITH TeamRes As (
SELECT FT.Team_ID
, FM.match_id
, Previous_Match = LAG(match_id, 1, 0)
OVER (PARTITION BY FT.Team_ID ORDER BY FM.match_id)
, Matches = Row_Number()
OVER (PARTITION BY FT.Team_ID ORDER BY FM.match_id Desc)
, Result = Case Coalesce(winning_team_id, -1)
When -1 Then 'T'
When FT.Team_ID Then 'W'
Else 'L'
End
FROM FantasyMatches FM
INNER JOIN FantasyTeams FT ON FT.Team_ID IN
(FM.home_fantasy_team_id, FM.away_fantasy_team_id)
), Streaks AS (
SELECT Team_ID, Result, 1 As Streak, Previous_Match
FROM TeamRes
WHERE Matches = 1
UNION ALL
SELECT tr.Team_ID, tr.Result, Streak + 1, tr.Previous_Match
FROM TeamRes tr
INNER JOIN Streaks s ON tr.Team_ID = s.Team_ID
AND tr.Match_id = s.Previous_Match
AND tr.Result = s.Result
)
Select Team_ID, Result, Max(Streak) Streak
From Streaks
Group By Team_ID, Result
Order By Team_ID