dat*_*god 12 sql-server-2005 sql-server
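I have a report that shows event counts for the past 12 hours, grouped by hour. Sounds easy enough, but what I'm struggling with is how to include records that cover the gaps.
Here is an example table: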
CREATE TABLE Event
(
    EventTime datetime,
    EventType int
)
The data looks like this:
'2012-03-08 08:00:04', 1
'2012-03-08 09:10:00', 2
'2012-03-08 09:11:04', 2
'2012-03-08 09:10:09', 1
'2012-03-08 10:00:17', 4
'2012-03-08 11:00:04', 1
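I need to create a result set that has one record for every hour of the past 12 hours, regardless of whether there were events during that hour or not.
Assuming the current time is '2012-03-08 11:00:00', the report would show (roughly):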
Hour EventCount
---- ----------
23 0
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 1
9 3
10 1
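I came up with a solution that uses a table with one record for every hour of the day. I managed to get the results I was looking for using a UNION and some convoluted CASE logic in the WHERE clause, but I was hoping somebody had a more elegant solution.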
Lam*_*mak 20
For SQL Server 2005+ you can generate those 12 records very easily with a loop or a recursive CTE. Here is an example of the recursive CTE:
DECLARE @Date DATETIME
SELECT @Date = '20120308 11:00:00'
;WITH Dates AS
(
SELECT DATEPART(HOUR,DATEADD(HOUR,-1,@Date)) [Hour],
DATEADD(HOUR,-1,@Date) [Date], 1 Num
UNION ALL
SELECT DATEPART(HOUR,DATEADD(HOUR,-1,[Date])),
DATEADD(HOUR,-1,[Date]), Num+1
FROM Dates
WHERE Num <= 11
)
SELECT [Hour], [Date]
FROM Dates
Then you just need to join it with your events table.
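As a minimal sketch of that join, assuming the question's table exists as dbo.[Event] and that @Date falls exactly on an hour boundary (both are assumptions, not something the answer states), each generated row can be treated as a one-hour bucket and LEFT JOINed so that empty hours still come back with a count of 0:

```sql
DECLARE @Date DATETIME;
SELECT @Date = '20120308 11:00:00';

WITH Dates AS
(
    -- Anchor: the hour immediately before @Date
    SELECT DATEPART(HOUR, DATEADD(HOUR, -1, @Date)) [Hour],
           DATEADD(HOUR, -1, @Date) [Date], 1 Num
    UNION ALL
    -- Recursive step: keep stepping back one hour until 12 rows exist
    SELECT DATEPART(HOUR, DATEADD(HOUR, -1, [Date])),
           DATEADD(HOUR, -1, [Date]), Num + 1
    FROM Dates
    WHERE Num <= 11
)
SELECT d.[Hour],
       COUNT(e.EventTime) AS EventCount   -- counts only matched events, so gaps give 0
FROM Dates d
LEFT JOIN dbo.[Event] e                   -- assumed table name
       ON e.EventTime >= d.[Date]
      AND e.EventTime <  DATEADD(HOUR, 1, d.[Date])
GROUP BY d.[Hour], d.[Date]
ORDER BY d.[Date];
```

If @Date is not on an hour boundary, the buckets will be offset by the same number of minutes, so it would need to be truncated to the hour first.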
Hen*_*Lee 10
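Tally tables can be used for things like this, and they can be very efficient. Create the tally table below. I created it with only 24 rows for your example, but you can create it with as many rows as you need for other purposes.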
SELECT TOP 24
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
I assumed your table is called dbo.tblEvent; run the query below. I believe this is what you are looking for:
SELECT t.n, count(e.EventTime)
FROM dbo.Tally t
LEFT JOIN dbo.tblEvent e on t.n = datepart(hh, e.EventTime)
GROUP BY t.n
ORDER BY t.n
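The query above counts events by hour of day across the whole table. If the report should cover only the last 12 hours, as in the question, one possible variation is to turn each tally number into a one-hour bucket counted back from a reference time. This is only a sketch: it assumes the dbo.Tally table created above, the table name dbo.tblEvent, and a hypothetical @Now variable standing in for the current time.

```sql
DECLARE @Now DATETIME;            -- hypothetical reference time
SELECT @Now = '20120308 11:00:00';

SELECT [Hour]     = DATEPART(HOUR, DATEADD(HOUR, -t.N, @Now)),
       EventCount = COUNT(e.EventTime)      -- 0 when no event falls in the bucket
FROM dbo.Tally t
LEFT JOIN dbo.tblEvent e
       ON e.EventTime >= DATEADD(HOUR, -t.N, @Now)       -- bucket start
      AND e.EventTime <  DATEADD(HOUR, 1 - t.N, @Now)    -- bucket end (exclusive)
WHERE t.N BETWEEN 1 AND 12                               -- the last 12 hours only
GROUP BY t.N
ORDER BY t.N DESC;                                       -- oldest hour first
```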
I believe credit goes to the following links; I believe this is where I first came across this technique:
http://www.sqlservercentral.com/articles/T-SQL/62867/
http://www.sqlservercentral.com/articles/T-SQL/74118/
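First, my apologies for the delay in responding since my last comment.
The subject came up in the comments that, because of the low row count, using a recursive CTE (rCTE from here on) runs fast enough. While it may appear that way, nothing could be further from the truth.
Build the Tally Table and Tally Function
Before we can start testing, we need to build a physical Tally Table with the proper clustered index and an Itzik Ben-Gan style Tally function. We'll also do all of this in TempDB so that we don't accidentally drop anyone's goodies.
Here is the code to build the Tally Table and my current production version of Itzik's wonderful code.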
--===== Do this in a nice, safe place that everyone has
USE tempdb
;
--===== Create/Recreate a Physical Tally Table
IF OBJECT_ID('dbo.Tally','U') IS NOT NULL
DROP TABLE dbo.Tally
;
-- Note that the ISNULL makes a NOT NULL column
SELECT TOP 1000001
N = ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,0)
INTO dbo.Tally
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED (N)
;
--===== Create/Recreate a Tally Function
IF OBJECT_ID('dbo.fnTally','IF') IS NOT NULL
DROP FUNCTION dbo.fnTally
;
GO
CREATE FUNCTION [dbo].[fnTally]
/**********************************************************************************************************************
Purpose:
Return a column of BIGINTs from @ZeroOrOne up to and including @MaxN with a max value of 1 Trillion.
As a performance note, it takes about 00:02:10 (hh:mm:ss) to generate 1 Billion numbers to a throw-away variable.
Usage:
--===== Syntax example (Returns BIGINT)
SELECT t.N
FROM dbo.fnTally(@ZeroOrOne,@MaxN) t
;
Notes:
1. Based on Itzik Ben-Gan's cascading CTE (cCTE) method for creating a "readless" Tally Table source of BIGINTs.
Refer to the following URLs for how it works and introduction for how it replaces certain loops.
http://www.sqlservercentral.com/articles/T-SQL/62867/
http://sqlmag.com/sql-server/virtual-auxiliary-table-numbers
2. To start a sequence at 0, @ZeroOrOne must be 0 or NULL. Any other value that's convertible to the BIT data-type
will cause the sequence to start at 1.
3. If @ZeroOrOne = 1 and @MaxN = 0, no rows will be returned.
5. If @MaxN is negative or NULL, a "TOP" error will be returned.
6. @MaxN must be a positive number from >= the value of @ZeroOrOne up to and including 1 Billion. If a larger
number is used, the function will silently truncate after 1 Billion. If you actually need a sequence with
that many values, you should consider using a different tool. ;-)
7. There will be a substantial reduction in performance if "N" is sorted in descending order. If a descending
sort is required, use code similar to the following. Performance will decrease by about 27% but it's still
very fast especially compared with just doing a simple descending sort on "N", which is about 20 times slower.
If @ZeroOrOne is a 0, in this case, remove the "+1" from the code.
DECLARE @MaxN BIGINT;
SELECT @MaxN = 1000;
SELECT DescendingN = @MaxN-N+1
FROM dbo.fnTally(1,@MaxN);
8. There is no performance penalty for sorting "N" in ascending order because the output is explicitly sorted by
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
Revision History:
Rev 00 - Unknown - Jeff Moden
- Initial creation with error handling for @MaxN.
Rev 01 - 09 Feb 2013 - Jeff Moden
- Modified to start at 0 or 1.
Rev 02 - 16 May 2013 - Jeff Moden
- Removed error handling for @MaxN because of exceptional cases.
Rev 03 - 22 Apr 2015 - Jeff Moden
- Modify to handle 1 Trillion rows for experimental purposes.
**********************************************************************************************************************/
(@ZeroOrOne BIT, @MaxN BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN WITH
E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1) --10E1 or 10 rows
, E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d) --10E4 or 10 Thousand rows
,E12(N) AS (SELECT 1 FROM E4 a, E4 b, E4 c) --10E12 or 1 Trillion rows
SELECT N = 0 WHERE ISNULL(@ZeroOrOne,0)= 0 --Conditionally start at 0.
UNION ALL
SELECT TOP(@MaxN) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E12 -- Values from 1 to @MaxN
;
GO
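By the way... notice that it built a million-and-one row Tally Table and added a clustered index to it in about a second or so. Try THAT with an rCTE and see how long it takes! ;-)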
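For anyone who wants to try that comparison, a recursive CTE that counts from 0 to 1,000,000 might look like the sketch below. The temp table name #rCTETally is hypothetical, and OPTION (MAXRECURSION 0) is needed because the default recursion limit is 100 levels. No timing is claimed here; run it next to the Tally Table build and compare for yourself.

```sql
--===== Hypothetical comparison: generate the same 1,000,001 numbers with an rCTE
IF OBJECT_ID('tempdb..#rCTETally','U') IS NOT NULL
        DROP TABLE #rCTETally
;
WITH Counter AS
(
 SELECT N = 0                 -- anchor row
  UNION ALL
 SELECT N + 1                 -- one recursion level per generated row
   FROM Counter
  WHERE N < 1000000
)
 SELECT N
   INTO #rCTETally
   FROM Counter
 OPTION (MAXRECURSION 0)      -- lift the default limit of 100 levels
;
```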
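Build Some Test Data
We also need some test data. Yes, I agree that all of the functions we're going to test, including the rCTE, run in a millisecond or less for just 12 rows, but that's the trap a lot of people fall into. We'll talk more about that trap later but, for now, let's simulate calling each function 40,000 times, which is about how many times certain functions in my shop get called in an 8-hour day. Just imagine how many times such functions might be called in a large online retail business.
So, here is the code to build 40,000 rows with random dates, each having a row number just for tracking purposes. I didn't take the time to round the times to whole hours because it doesn't matter here.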
--===== Do this in a nice, safe place that everyone has
USE tempdb
;
--===== Create/Recreate a Test Date table
IF OBJECT_ID('dbo.TestDate','U') IS NOT NULL
DROP TABLE dbo.TestDate
;
DECLARE @StartDate DATETIME
,@EndDate DATETIME
,@Rows INT
;
SELECT @StartDate = '2010' --Inclusive
,@EndDate = '2020' --Exclusive
,@Rows = 40000 --Enough to simulate an 8 hour day where I work
;
SELECT RowNum = IDENTITY(INT,1,1)
,SomeDateTime = RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,@StartDate,@EndDate)+@StartDate
INTO dbo.TestDate
FROM dbo.fnTally(1,@Rows)
;
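Build Some Functions to Do the 12-Row Hour Thing
Next, I converted the rCTE code to a function and created 3 other functions. They've all been created as high-performance iTVFs (Inline Table Valued Functions). You can always tell because an iTVF never has a BEGIN in it like a scalar function or an mTVF (Multi-statement Table Valued Function) does.
Here is the code to build those 4 functions... I named them after the method they use rather than what they do, just to make them easier to identify.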
--===== CREATE THE iTVFs
--===== Do this in a nice, safe place that everyone has
USE tempdb
;
-----------------------------------------------------------------------------------------
IF OBJECT_ID('dbo.OriginalrCTE','IF') IS NOT NULL
DROP FUNCTION dbo.OriginalrCTE
;
GO
CREATE FUNCTION dbo.OriginalrCTE
(@Date DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH Dates AS
(
SELECT DATEPART(HOUR,DATEADD(HOUR,-1,@Date)) [Hour],
DATEADD(HOUR,-1,@Date) [Date], 1 Num
UNION ALL
SELECT DATEPART(HOUR,DATEADD(HOUR,-1,[Date])),
DATEADD(HOUR,-1,[Date]), Num+1
FROM Dates
WHERE Num <= 11
)
SELECT [Hour], [Date]
FROM Dates
GO
-----------------------------------------------------------------------------------------
IF OBJECT_ID('dbo.MicroTally','IF') IS NOT NULL
DROP FUNCTION dbo.MicroTally
;
GO
CREATE FUNCTION dbo.MicroTally
(@Date DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT [Hour] = DATEPART(HOUR,DATEADD(HOUR,t.N,@Date))
,[DATE] = DATEADD(HOUR,t.N,@Date)
FROM (VALUES (-1),(-2),(-3),(-4),(-5),(-6),(-7),(-8),(-9),(-10),(-11),(-12))t(N)
;
GO
-----------------------------------------------------------------------------------------
IF OBJECT_ID('dbo.PhysicalTally','IF') IS NOT NULL
DROP FUNCTION dbo.PhysicalTally
;
GO
CREATE FUNCTION dbo.PhysicalTally
(@Date DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT [Hour] = DATEPART(HOUR,DATEADD(HOUR,-t.N,@Date))
,[DATE] = DATEADD(HOUR,-t.N,@Date)
FROM dbo.Tally t
WHERE N BETWEEN 1 AND 12
;
GO
-----------------------------------------------------------------------------------------
IF OBJECT_ID('dbo.TallyFunction','IF') IS NOT NULL
DROP FUNCTION dbo.TallyFunction
;
GO
CREATE FUNCTION dbo.TallyFunction
(@Date DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT [Hour] = DATEPART(HOUR,DATEADD(HOUR,-t.N,@Date))
,[DATE] = DATEADD(HOUR,-t.N,@Date)
FROM dbo.fnTally(1,12) t
;
GO
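For contrast with the iTVFs above, the following sketch shows the same 12-row logic written as an mTVF. The name dbo.PhysicalTally_mTVF is hypothetical and the function was not part of the test; it is here only to show the declared table variable and the BEGIN...END body that give an mTVF away.

```sql
--===== Hypothetical mTVF version, for comparison only (not part of the test)
USE tempdb
;
IF OBJECT_ID('dbo.PhysicalTally_mTVF','TF') IS NOT NULL
        DROP FUNCTION dbo.PhysicalTally_mTVF
;
GO
CREATE FUNCTION dbo.PhysicalTally_mTVF
        (@Date DATETIME)
RETURNS @Result TABLE ([Hour] INT, [Date] DATETIME)   -- declared return table
AS
BEGIN                                                 -- an iTVF never has this
        INSERT INTO @Result ([Hour], [Date])
        SELECT DATEPART(HOUR,DATEADD(HOUR,-t.N,@Date))
              ,DATEADD(HOUR,-t.N,@Date)
          FROM dbo.Tally t
         WHERE t.N BETWEEN 1 AND 12
        ;
        RETURN;
END
;
GO
```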
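Build the Test Harness to Test the Functions
Last but not least, we need a test harness. I do a baseline check and then test each function in an identical manner.
Here is the code for the test harness...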
PRINT '--========== Baseline Select =================================';
DECLARE @Hour INT, @Date DATETIME
;
SET STATISTICS TIME,IO ON;
SELECT @Hour = RowNum
,@Date = SomeDateTime
FROM dbo.TestDate
CROSS APPLY dbo.fnTally(1,12);
SET STATISTICS TIME,IO OFF;
GO
PRINT '--========== Orginal Recursive CTE ===========================';
DECLARE @Hour INT, @Date DATETIME
;
SET STATISTICS TIME,IO ON;
SELECT @Hour = fn.[Hour]
,@Date = fn.[Date]
FROM dbo.TestDate td
CROSS APPLY dbo.OriginalrCTE(td.SomeDateTime) fn;
SET STATISTICS TIME,IO OFF;
GO
PRINT '--========== Dedicated Micro-Tally Table =====================';
DECLARE @Hour INT, @Date DATETIME
;
SET STATISTICS TIME,IO ON;
SELECT @Hour = fn.[Hour]
,@Date = fn.[Date]
FROM dbo.TestDate td
CROSS APPLY dbo.MicroTally(td.SomeDateTime) fn;
SET STATISTICS TIME,IO OFF;
GO
PRINT'--========== Physical Tally Table =============================';
DECLARE @Hour INT, @Date DATETIME
;
SET STATISTICS TIME,IO ON;
SELECT @Hour = fn.[Hour]
,@Date = fn.[Date]
FROM dbo.TestDate td
CROSS APPLY dbo.PhysicalTally(td.SomeDateTime) fn;
SET STATISTICS TIME,IO OFF;
GO
PRINT'--========== Tally Function ===================================';
DECLARE @Hour INT, @Date DATETIME
;
SET STATISTICS TIME,IO ON;
SELECT @Hour = fn.[Hour]
,@Date = fn.[Date]
FROM dbo.TestDate td
CROSS APPLY dbo.TallyFunction(td.SomeDateTime) fn;
SET STATISTICS TIME,IO OFF;
GO
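One thing to notice in the test harness above is that I shunt all output into "throwaway" variables. That's to keep the performance measurements as pure as possible, without any output to disk or screen skewing the results.
A Word of Caution on SET STATISTICS
Also, a word of caution for would-be testers... you MUST NOT use SET STATISTICS when testing either scalar or mTVF functions. It can only be safely used on iTVF functions like the ones in this test. SET STATISTICS has been proven to make scalar functions run hundreds of times slower than they actually do without it. Yeah, I'm trying to tilt at another windmill, but that would be a whole article-length post and I don't have the time for that. I have an article on SQLServerCentral.com that talks all about it, but there's no sense in posting the link here because someone will get all bent out of shape about it.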
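One common way to time code without SET STATISTICS is to capture the start time in a variable and compute the difference afterwards. The sketch below applies that pattern to one of the iTVFs from this test purely for illustration; the variable name @StartTime is an assumption, and the same wrapper could be put around a scalar or mTVF call.

```sql
DECLARE @Hour INT, @Date DATETIME, @StartTime DATETIME
;
 SELECT @StartTime = GETDATE()       -- capture the start time
;
 SELECT @Hour = fn.[Hour]            -- throwaway variables, as in the harness
       ,@Date = fn.[Date]
   FROM dbo.TestDate td
  CROSS APPLY dbo.TallyFunction(td.SomeDateTime) fn
;
 SELECT DurationMs = DATEDIFF(ms, @StartTime, GETDATE())   -- elapsed milliseconds
;
```

DATETIME is only accurate to a few milliseconds, so this is suited to runs that take noticeably longer than that.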
Test Results
So, here are the test results when I ran the test harness on my little i5 laptop with 6GB of RAM.
--========== Baseline Select =================================
Table 'Worktable'. Scan count 1, logical reads 82309, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestDate'. Scan count 1, logical reads 105, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 203 ms, elapsed time = 206 ms.
--========== Orginal Recursive CTE ===========================
Table 'Worktable'. Scan count 40001, logical reads 2960000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestDate'. Scan count 1, logical reads 105, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4258 ms, elapsed time = 4415 ms.
--========== Dedicated Micro-Tally Table =====================
Table 'Worktable'. Scan count 1, logical reads 81989, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestDate'. Scan count 1, logical reads 105, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 234 ms, elapsed time = 235 ms.
--========== Physical Tally Table =============================
Table 'Worktable'. Scan count 1, logical reads 81989, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestDate'. Scan count 1, logical reads 105, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Tally'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 250 ms, elapsed time = 252 ms.
--========== Tally Function ===================================
Table 'Worktable'. Scan count 1, logical reads 81989, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestDate'. Scan count 1, logical reads 105, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 250 ms, elapsed time = 253 ms.
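The "BASELINE SELECT", which only selects data (each row created 12 times to simulate the same volume of return), came in at right about 1/5th of a second. Everything else came in at about a quarter of a second. Well, everything except that bloody rCTE function. It took 4 and 1/4 seconds, or 16 times longer (1,600% slower).
And look at the logical reads (memory IO)... The rCTE consumed a whopping 2,960,000 (almost 3 MILLION reads) whereas the other functions only consumed about 82,100. That means the rCTE consumed roughly 36 times the memory IO of any of the other functions (about 35 times more).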
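The ratios can be rechecked directly from the figures in the test output. This is only arithmetic on the reported numbers, taking 250 ms as a representative CPU time and 82,100 as the approximate read count of the non-rCTE functions; the column names are made up for the example.

```sql
 SELECT CpuRatio  = 4258.0 / 250         -- rCTE CPU ms vs. a typical iTVF CPU ms
       ,ReadRatio = 2960000.0 / 82100    -- rCTE logical reads vs. a typical iTVF's reads
;
```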
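Parting Thoughts
Let's summarize. The rCTE method for doing this "small" 12-row thing used 16 TIMES (1,600%) more CPU (and duration) and about 35 TIMES (3,500%) more memory IO than any of the other functions.
Heh... I know what you're thinking. "Big deal! It's just one function."
Yeah, agreed, but how many other functions do you have? How many other places outside of functions do you have? And do you have any of those that work with more than just 12 rows each run? And is there any chance that someone in a lurch for a method might copy that rCTE code for something much bigger?
OK, time to be blunt. It makes absolutely no sense for people to justify performance-challenged code just because of supposedly limited row counts or usage. Except for when you buy an MPP box for perhaps millions of dollars (not to mention the expense of rewriting code to get it to work on such a machine), you can't buy a machine that runs your code 16 times faster (SSDs won't do it either... all of this stuff was in high-speed memory when we tested it). Performance is in the code. Good performance is in good code.
Can you imagine if all of your code ran "just" 16 times faster?
Never justify bad or performance-challenged code on low row counts or even low usage. If you do, you might have to borrow one of the windmills I was accused of tilting at to keep your CPUs and disks cool enough. ;-)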
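A Word on the Word "Tally"
Yeah... I agree. Semantically speaking, the Tally Table contains numbers, not "tallies". In my original article on the subject (it wasn't the original article on the technique, but it was my first on it), I called it "Tally" not because of what it contains, but because of what it does... it's used to "count" instead of looping, and to "Tally" something is to "Count" something. ;-) Call it what you will... Numbers Table, Tally Table, Sequence Table, whatever. I don't care. For me, "Tally" is more meaningful and, being a good lazy DBA, it contains only 5 letters (2 are identical) instead of 7, and it's easier to say for most folks. It's also "singular", which follows my naming convention for tables. ;-) It's also what the article that contained a page from a book from the 60's called it. I'll always refer to it as a "Tally Table" and you'll still know what I or someone else means. I also avoid Hungarian Notation like the plague but called the function "fnTally" so that I could say "Well, if you used the eff-en Tally Function I showed you, you wouldn't have a performance problem" without it actually being an HR violation. ;-)
What I'm more concerned about is people learning to use it properly instead of resorting to things like performance-challenged rCTEs and other forms of Hidden RBAR.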