Mat*_*don 6 normalization sql-server ssis sql-server-2014
我有一个源表,它看起来基本上是这样的:
实际的表有类似于 500 个编号的列(并没有真正计算它们 - 有各种各样的编号为 1-7 的字段,然后是另一个编号为 1-25,乘以 7 的字段)每个工作日(不,这不是我的设计) ,目前大约有 38,600 行(每周增加)。
我有一个 SSIS 包,它试图标准化这些数据......目前看起来像这样:

每个“源”从同一个源表中选择一组编号的列,UNION ALL 组件将 7 个源合并为一个,从而产生大约 258,900 行。
工作流的其余部分添加一些计算列,查找代理键(例如EmployeeCode用于查找EmployeeId,然后计算日期并用于查找 a TimeId),然后“修改”的行得到更新和“新的”被插入到规范化表中;未更改的行最终无处可去。
有没有更好的方法(例如减轻内存压力)来规范化源数据?
Han*_*non 11
没有完整的表定义,很难提供完美的答案。但是,为了在有限的重现中显示差异,使用非常少量的数据,我创建了以下测试台:
IF OBJECT_ID('tempdb..#src') IS NOT NULL
DROP TABLE #src;
CREATE TABLE #src
(
EmployeeCode INT NOT NULL
, WeekStartDate DATE NOT NULL
, HoursDay1 INT NOT NULL
, HoursDay2 INT NOT NULL
, HoursDay3 INT NOT NULL
, HoursDay4 INT NOT NULL
, HoursDay5 INT NOT NULL
, HoursDay6 INT NOT NULL
, HoursDay7 INT NOT NULL
, Widget1Day1 INT NOT NULL
, Widget1Day2 INT NOT NULL
, Widget1Day3 INT NOT NULL
, Widget1Day4 INT NOT NULL
, Widget1Day5 INT NOT NULL
, Widget1Day6 INT NOT NULL
, Widget1Day7 INT NOT NULL
, Widget2Day1 INT NOT NULL
, Widget2Day2 INT NOT NULL
, Widget2Day3 INT NOT NULL
, Widget2Day4 INT NOT NULL
, Widget2Day5 INT NOT NULL
, Widget2Day6 INT NOT NULL
, Widget2Day7 INT NOT NULL
, PRIMARY KEY CLUSTERED (WeekStartDate, EmployeeCode)
);
GO
INSERT INTO #src (EmployeeCode, WeekStartDate
, HoursDay1, HoursDay2, HoursDay3, HoursDay4, HoursDay5, HoursDay6, HoursDay7
, Widget1Day1, Widget1Day2, Widget1Day3, Widget1Day4, Widget1Day5, Widget1Day6, Widget1Day7
, Widget2Day1, Widget2Day2, Widget2Day3, Widget2Day4, Widget2Day5, Widget2Day6, Widget2Day7
)
VALUES (1, '2016-09-18'
, 0, 8, 8, 8, 8, 8, 0
, ABS(CHECKSUM(NEWID())), ABS(CHECKSUM(NEWID())), ABS(CHECKSUM(NEWID())), ABS(CHECKSUM(NEWID())), ABS(CHECKSUM(NEWID())), ABS(CHECKSUM(NEWID())), ABS(CHECKSUM(NEWID()))
, ABS(CHECKSUM(NEWID())), ABS(CHECKSUM(NEWID())), ABS(CHECKSUM(NEWID())), ABS(CHECKSUM(NEWID())), ABS(CHECKSUM(NEWID())), ABS(CHECKSUM(NEWID())), ABS(CHECKSUM(NEWID()))
);
Run Code Online (Sandbox Code Playgroud)
下面我们比较两个查询;第一个使用CROSS APPLY我在SQLServerScience.com 上详细介绍的方法,第二个使用该UNION ALL方法。
SELECT s.WeekStartDate
, s.EmployeeCode
, ItemsByDay.DayOfWeekName
, ItemsByDay.HoursWorked
, ItemsByDay.Widget1
, ItemsByDay.Widget2
FROM #src s
CROSS APPLY (VALUES ('Sunday', HoursDay1, Widget1Day1, Widget2Day1)
, ('Monday', HoursDay2, Widget1Day2, Widget2Day2)
, ('Tuesday', HoursDay3, Widget1Day3, Widget2Day3)
, ('Wednesday', HoursDay4, Widget1Day4, Widget2Day4)
, ('Thursday', HoursDay5, Widget1Day5, Widget2Day5)
, ('Friday', HoursDay6, Widget1Day6, Widget2Day6)
, ('Saturday', HoursDay7, Widget1Day7, Widget2Day7)
) ItemsByDay(DayOfWeekName, HoursWorked, Widget1, Widget2);
SELECT s.EmployeeCode
, s.WeekStartDate
, 'Sunday'
, s.HoursDay1
, s.Widget1Day1
, s.Widget2Day1
FROM #src s
UNION ALL
SELECT s.EmployeeCode
, s.WeekStartDate
, 'Monday'
, s.HoursDay2
, s.Widget1Day2
, s.Widget2Day2
FROM #src s
UNION ALL
SELECT s.EmployeeCode
, s.WeekStartDate
, 'Tuesday'
, s.HoursDay3
, s.Widget1Day3
, s.Widget2Day3
FROM #src s
UNION ALL
SELECT s.EmployeeCode
, s.WeekStartDate
, 'Wednesday'
, s.HoursDay4
, s.Widget1Day4
, s.Widget2Day4
FROM #src s
UNION ALL
SELECT s.EmployeeCode
, s.WeekStartDate
, 'Thursday'
, s.HoursDay5
, s.Widget1Day5
, s.Widget2Day5
FROM #src s
UNION ALL
SELECT s.EmployeeCode
, s.WeekStartDate
, 'Friday'
, s.HoursDay6
, s.Widget1Day6
, s.Widget2Day6
FROM #src s
UNION ALL
SELECT s.EmployeeCode
, s.WeekStartDate
, 'Saturday'
, s.HoursDay7
, s.Widget1Day7
, s.Widget2Day7
FROM #src s;
Run Code Online (Sandbox Code Playgroud)
首先要注意的CROSS APPLY是, 更容易查看。这已经让我很高兴了。
让我们检查两个变体的执行计划:
该UNION ALL变体扫描源表 7 次,而CROSS APPLY使用单表扫描。通过使用交叉申请,我们就是#Winning。
让我们添加更多数据:
/* create a table with 2 years worth of week start dates */
IF OBJECT_ID('tempdb..#Weeks') IS NULL
BEGIN
CREATE TABLE #Weeks
(
WeekStart DATE NOT NULL
PRIMARY KEY CLUSTERED
);
INSERT INTO #Weeks (WeekStart)
SELECT TOP(104) DATEADD(DAY, (ROW_NUMBER() OVER (ORDER BY o1.name) - 1) * 7, '2016-01-03')
FROM sys.objects o1
CROSS JOIN sys.objects o2;
END
/* remove the single row from the source table we inserted above */
TRUNCATE TABLE #src;
/* insert a load of rows into the #src table */
INSERT INTO #src (EmployeeCode, WeekStartDate, HoursDay1, HoursDay2, HoursDay3, HoursDay4, HoursDay5, HoursDay6, HoursDay7)
SELECT ABS(CHECKSUM(NEWID()))
, w.WeekStart
, 0, 8, 8, 8, 8, 8, 0
FROM #Weeks w
CROSS JOIN sys.objects o1;
Run Code Online (Sandbox Code Playgroud)
在我的系统上,上面的代码生成了大约 85,000 行。这两个查询的计划现在是:
SQL Sentry Plan Explorer显示以下摘要信息,这些信息非常宝贵:
这表示 CPU 被 更密集地使用CROSS APPLY,但是UNION ALL变体使用的 I/O 是 7 倍。
| 归档时间: |
|
| 查看次数: |
166 次 |
| 最近记录: |