iam*_*ave 17 sql t-sql sql-server join date
Given the following dataset, paired with a dates table:
MembershipId | ValidFromDate | ValidToDate
==========================================
0001 | 1997-01-01 | 2006-05-09
0002 | 1997-01-01 | 2017-05-12
0003 | 2005-06-02 | 2009-02-07
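How many Memberships were open on any given day, or across a series of days?
This question was originally asked here, and this answer provided the necessary functionality: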
select d.[Date]
      ,count(m.MembershipID) as MembershipCount
from DIM.[Date] as d
    left join Memberships as m
        on(d.[Date] between m.ValidFromDateKey and m.ValidToDateKey)
where d.CalendarYear = 2016
group by d.[Date]
order by d.[Date];
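However, a commenter remarked that there are other approaches when the non-equijoin takes too long.
So, what would equijoin-only logic that replicates the output of the query above look like?
From the answers provided so far I have come up with the following, which outperforms the query above on the hardware I am working with, across 3.2 million Membership records: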
declare @s date = '20160101';
declare @e date = getdate();

with s as
(
    select d.[Date] as d
          ,count(s.MembershipID) as s
    from dbo.Dates as d
        join dbo.Memberships as s
            on d.[Date] = s.ValidFromDateKey
    group by d.[Date]
)
,e as
(
    select d.[Date] as d
          ,count(e.MembershipID) as e
    from dbo.Dates as d
        join dbo.Memberships as e
            on d.[Date] = e.ValidToDateKey
    group by d.[Date]
)
,c as
(
    select isnull(s.d,e.d) as d
          ,sum(isnull(s.s,0) - isnull(e.e,0)) over (order by isnull(s.d,e.d)) as c
    from s
        full join e
            on s.d = e.d
)
select d.[Date]
      ,c.c
from dbo.Dates as d
    left join c
        on d.[Date] = c.d
where d.[Date] between @s and @e
order by d.[Date]
;
Following on from this, to split the aggregate into its constituent groups per day I have the following, which also performs well:
declare @s date = '20160101';
declare @e date = getdate();

with s as
(
    select d.[Date] as d
          ,s.MembershipGrouping as g
          ,count(s.MembershipID) as s
    from dbo.Dates as d
        join dbo.Memberships as s
            on d.[Date] = s.ValidFromDateKey
    group by d.[Date]
            ,s.MembershipGrouping
)
,e as
(
    select d.[Date] as d
          ,e.MembershipGrouping as g
          ,count(e.MembershipID) as e
    from dbo.Dates as d
        join dbo.Memberships as e
            on d.[Date] = e.ValidToDateKey
    group by d.[Date]
            ,e.MembershipGrouping
)
,c as
(
    select isnull(s.d,e.d) as d
          ,isnull(s.g,e.g) as g
          ,sum(isnull(s.s,0) - isnull(e.e,0)) over (partition by isnull(s.g,e.g) order by isnull(s.d,e.d)) as c
    from s
        full join e
            on s.d = e.d
                and s.g = e.g
)
select d.[Date]
      ,c.g
      ,c.c
from dbo.Dates as d
    left join c
        on d.[Date] = c.d
where d.[Date] between @s and @e
order by d.[Date]
        ,c.g
;
Can anyone improve on the above?
Vla*_*nov 13
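If most of your membership validity intervals are longer than a few days, have a look at Martin Smith's answer. That approach is likely to be faster.
When you take the calendar table (DIM.[Date]) and left join it to Memberships, you may end up scanning the Memberships table once for each date in the range. Even an index on (ValidFromDate, ValidToDate) may not help much.
It is easy to turn this around: scan the Memberships table only once and, for each membership, use CROSS APPLY to find the dates on which it is valid.
Sample data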
DECLARE @T TABLE (MembershipId int, ValidFromDate date, ValidToDate date);
INSERT INTO @T VALUES
(1, '1997-01-01', '2006-05-09'),
(2, '1997-01-01', '2017-05-12'),
(3, '2005-06-02', '2009-02-07');
DECLARE @RangeFrom date = '2006-01-01';
DECLARE @RangeTo date = '2006-12-31';
Query 1
SELECT
    CA.dt
    ,COUNT(*) AS MembershipCount
FROM
    @T AS Memberships
    CROSS APPLY
    (
        SELECT dbo.Calendar.dt
        FROM dbo.Calendar
        WHERE
            dbo.Calendar.dt >= Memberships.ValidFromDate
            AND dbo.Calendar.dt <= Memberships.ValidToDate
            AND dbo.Calendar.dt >= @RangeFrom
            AND dbo.Calendar.dt <= @RangeTo
    ) AS CA
GROUP BY
    CA.dt
ORDER BY
    CA.dt
OPTION(RECOMPILE);
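OPTION(RECOMPILE) is not really needed. I include it in all queries when comparing execution plans, to be sure that I am getting the latest plan while experimenting with the queries.
When I looked at the plan for this query, I saw that the seek on the Calendar table used only ValidFromDate and ValidToDate; @RangeFrom and @RangeTo were pushed to the residual predicate. That is not ideal. The optimizer is not smart enough to calculate the maximum of two dates (ValidFromDate and @RangeFrom) and use that date as the starting point of the seek.
It is easy to help the optimizer:
Query 2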
SELECT
    CA.dt
    ,COUNT(*) AS MembershipCount
FROM
    @T AS Memberships
    CROSS APPLY
    (
        SELECT dbo.Calendar.dt
        FROM dbo.Calendar
        WHERE
            dbo.Calendar.dt >=
                CASE WHEN Memberships.ValidFromDate > @RangeFrom
                THEN Memberships.ValidFromDate
                ELSE @RangeFrom END
            AND dbo.Calendar.dt <=
                CASE WHEN Memberships.ValidToDate < @RangeTo
                THEN Memberships.ValidToDate
                ELSE @RangeTo END
    ) AS CA
GROUP BY
    CA.dt
ORDER BY
    CA.dt
OPTION(RECOMPILE)
;
In this query the seek is optimal and does not read dates that would be discarded later.
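To make the effect of the two CASE expressions concrete, the sketch below computes the seek boundaries they produce for each sample membership. It does not touch dbo.Calendar; the column aliases SeekFrom and SeekTo are illustrative names that do not appear in the queries above.

```sql
-- Illustrative sketch: the boundaries that Query 2 hands to the Calendar seek.
DECLARE @T TABLE (MembershipId int, ValidFromDate date, ValidToDate date);
INSERT INTO @T VALUES
    (1, '1997-01-01', '2006-05-09'),
    (2, '1997-01-01', '2017-05-12'),
    (3, '2005-06-02', '2009-02-07');

DECLARE @RangeFrom date = '2006-01-01';
DECLARE @RangeTo   date = '2006-12-31';

SELECT
    M.MembershipId
    -- later of the two start dates (SeekFrom is a hypothetical alias)
    ,CASE WHEN M.ValidFromDate > @RangeFrom
        THEN M.ValidFromDate
        ELSE @RangeFrom END AS SeekFrom
    -- earlier of the two end dates (SeekTo is a hypothetical alias)
    ,CASE WHEN M.ValidToDate < @RangeTo
        THEN M.ValidToDate
        ELSE @RangeTo END AS SeekTo
FROM @T AS M
ORDER BY M.MembershipId;
```

For the sample data, membership 1 is clamped to 2006-01-01 through 2006-05-09, and memberships 2 and 3 to 2006-01-01 through 2006-12-31. A membership that lies entirely outside the requested range ends up with SeekFrom later than SeekTo, so the seek returns no calendar rows for it.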
Finally, you may not need to scan the whole Memberships table. We need only those rows whose validity range intersects the given date range.
Query 3
SELECT
    CA.dt
    ,COUNT(*) AS MembershipCount
FROM
    @T AS Memberships
    CROSS APPLY
    (
        SELECT dbo.Calendar.dt
        FROM dbo.Calendar
        WHERE
            dbo.Calendar.dt >=
                CASE WHEN Memberships.ValidFromDate > @RangeFrom
                THEN Memberships.ValidFromDate
                ELSE @RangeFrom END
            AND dbo.Calendar.dt <=
                CASE WHEN Memberships.ValidToDate < @RangeTo
                THEN Memberships.ValidToDate
                ELSE @RangeTo END
    ) AS CA
WHERE
    Memberships.ValidToDate >= @RangeFrom
    AND Memberships.ValidFromDate <= @RangeTo
GROUP BY
    CA.dt
ORDER BY
    CA.dt
OPTION(RECOMPILE)
;
Two intervals [a1;a2] and [b1;b2] intersect when
a2 >= b1 and a1 <= b2
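As a quick check of the intersection rule, the sketch below applies it to the three sample memberships with a requested range of calendar year 2007. The range values and the Intersects alias are chosen here purely for illustration.

```sql
-- Illustrative sketch: flag which sample memberships intersect a requested range.
DECLARE @T TABLE (MembershipId int, ValidFromDate date, ValidToDate date);
INSERT INTO @T VALUES
    (1, '1997-01-01', '2006-05-09'),
    (2, '1997-01-01', '2017-05-12'),
    (3, '2005-06-02', '2009-02-07');

-- Hypothetical range, picked so that one membership falls outside it.
DECLARE @RangeFrom date = '2007-01-01';
DECLARE @RangeTo   date = '2007-12-31';

SELECT
    M.MembershipId
    ,M.ValidFromDate
    ,M.ValidToDate
    -- a2 >= b1 and a1 <= b2, with [a1;a2] the membership and [b1;b2] the range
    ,CASE WHEN M.ValidToDate >= @RangeFrom
           AND M.ValidFromDate <= @RangeTo
        THEN 1 ELSE 0 END AS Intersects
FROM @T AS M
ORDER BY M.MembershipId;
```

Membership 1 ended on 2006-05-09, before the range starts, so it is flagged 0; memberships 2 and 3 both span 2007 and are flagged 1.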
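These queries assume that the Calendar table has an index on dt.
You should experiment to see which indexes work better for the Memberships table. For the last query, if the table is fairly large, two separate indexes, one on ValidFromDate and one on ValidToDate, would most likely be better than a single index on (ValidFromDate, ValidToDate).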
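The indexing options mentioned above could be set up roughly as follows. This is a sketch under assumptions: the table definitions, constraint names and index names are all hypothetical, and it is meant for a scratch database rather than an existing schema.

```sql
-- Assumed minimal table shapes; adjust names and types to your own schema.
CREATE TABLE dbo.Calendar
(
    dt date NOT NULL
        CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED  -- provides the index on dt
);

CREATE TABLE dbo.Memberships
(
    MembershipId  int  NOT NULL
        CONSTRAINT PK_Memberships PRIMARY KEY CLUSTERED,
    ValidFromDate date NOT NULL,
    ValidToDate   date NOT NULL
);

-- Option A: two separate single-column indexes (hypothetical names).
CREATE NONCLUSTERED INDEX IX_Memberships_ValidFromDate
    ON dbo.Memberships (ValidFromDate);
CREATE NONCLUSTERED INDEX IX_Memberships_ValidToDate
    ON dbo.Memberships (ValidToDate);

-- Option B: one composite index (hypothetical name).
CREATE NONCLUSTERED INDEX IX_Memberships_ValidFrom_ValidTo
    ON dbo.Memberships (ValidFromDate, ValidToDate);
```

Creating both options side by side is only useful for comparing plans; after measuring, you would normally keep whichever the optimizer actually uses for your workload.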
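You should try the different queries and measure their performance on real hardware with real data. Performance may depend on the data distribution: how many memberships there are, what their validity dates are, how wide or narrow the given range is, and so on.
I recommend a great tool called SQL Sentry Plan Explorer for analysing and comparing execution plans. It is free. It shows a lot of useful statistics, such as execution time and number of reads for each query. The screenshots above are from this tool.
Assuming your date dimension contains every date that falls within any membership period, you can use something like the following.
The join is an equi join, so it can use a hash join or a merge join, not just nested loops (which would execute the inner sub-tree once for each outer row).
Assuming an index on (ValidToDate) include(ValidFromDate), or the reverse, this can use a single seek against Memberships and a single scan of the date dimension. The query below took less than a second for me to return the results for a year against a table with 3.2 million members and a general active membership of 1.4 million (script)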
DECLARE @StartDate DATE = '2016-01-01',
        @EndDate   DATE = '2016-12-31';

WITH MD
     AS (SELECT Date,
                SUM(Adj) AS MemberDelta
         FROM   Memberships
                CROSS APPLY (VALUES ( ValidFromDate, +1),
                                    --Membership count decremented day after the ValidToDate
                                    (DATEADD(DAY, 1, ValidToDate), -1) ) V(Date, Adj)
         WHERE
          --Members already expired before the time range of interest can be ignored
          ValidToDate >= @StartDate
          AND
          --Members whose membership starts after the time range of interest can be ignored
          ValidFromDate <= @EndDate
         GROUP  BY Date),
     MC
     AS (SELECT DD.DateKey,
                SUM(MemberDelta) OVER (ORDER BY DD.DateKey ROWS UNBOUNDED PRECEDING) AS CountOfNonIgnoredMembers
         FROM   DIM_DATE DD
                LEFT JOIN MD
                  ON MD.Date = DD.DateKey)
SELECT DateKey,
       CountOfNonIgnoredMembers AS MembershipCount
FROM   MC
WHERE  DateKey BETWEEN @StartDate AND @EndDate
ORDER  BY DateKey
Demo (using an extended period, as the calendar year 2016 isn't very interesting with the example data)
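To see how the +1/-1 deltas turn into a count, here is a minimal sketch that runs the same idea over the three sample memberships from the question, without a date dimension or a range filter. The names Deltas, EventDate and OpenMemberships are illustrative and not part of the query above.

```sql
-- Illustrative sketch: running total of membership deltas on the sample rows.
DECLARE @T TABLE (MembershipId int, ValidFromDate date, ValidToDate date);
INSERT INTO @T VALUES
    (1, '1997-01-01', '2006-05-09'),
    (2, '1997-01-01', '2017-05-12'),
    (3, '2005-06-02', '2009-02-07');

WITH Deltas AS
(
    SELECT V.EventDate,
           SUM(V.Adj) AS MemberDelta
    FROM   @T AS M
           -- one +1 row on the start date, one -1 row on the day after the end date
           CROSS APPLY (VALUES (M.ValidFromDate, +1),
                               (DATEADD(DAY, 1, M.ValidToDate), -1)) AS V(EventDate, Adj)
    GROUP  BY V.EventDate
)
SELECT EventDate,
       MemberDelta,
       -- cumulative sum of deltas = memberships open from this date onwards
       SUM(MemberDelta) OVER (ORDER BY EventDate ROWS UNBOUNDED PRECEDING) AS OpenMemberships
FROM   Deltas
ORDER  BY EventDate;
```

The result has five rows: 1997-01-01 (delta +2, running count 2), 2005-06-02 (+1, 3), 2006-05-10 (-1, 2), 2009-02-08 (-1, 1) and 2017-05-13 (-1, 0). Only dates on which something changes appear here; the left join from the date dimension in the full query is what fills in every day in between, and the running sum carries the count forward across them.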