I have a SQL Server database containing two tables - Acks and Logs.
The two tables are logically related, but not in a relational-database sense. Basically, every incoming message is saved in the Log table, and if our server acknowledges it, that acknowledgement is stored in the Ack table.
We store roughly 5 million acknowledgements and 3 million logs per day. I'm trying to partition both tables on daily boundaries so that we can easily drop old partitions and improve query performance.
I haven't done table partitioning before, so I've been reading some online tutorials, but I'm stuck on one thing: every tutorial I've followed seems to add the filegroups and the boundaries by hand.
I'd like SQL Server to do this daily somehow, and that is where my question lies. I need it to create the filegroup for the next day at, say, 22:00 every day. Then at midnight, inserts should start filling the new day's partition.
Can anyone point me in the right direction on how to achieve this? A comprehensive tutorial, or just some good old-fashioned advice, would be fine too.
My second question: can I somehow apply the same partition function to two different tables? They both have a datetime2 column I want to partition on, and the same rules would apply.
How does that fit in with my filegroups? Do I need one filegroup per day? Does each table get its own file within that filegroup, or are both tables stored in the same file in the filegroup?
Do I have to create an .mdf and an .ldf for each filegroup, or is there still just one log file for the whole database?
Since SQL Server 2008 SP2 and SQL Server 2008 R2 SP1, 15,000 partitions are supported, so honestly you don't really need to do this dynamically at all. Instead of a complicated daily process (dynamically adding filegroups and boundaries) that could fail, just pre-create the partitions from now until 2020 - you'll be well within the limit and nicely future-proofed.
You can assign all partitions to one filegroup (ok, not necessarily a great pattern), or round-robin across a limited number of them. In other words, there is no technical need for one filegroup per day, for example:
-- Assign all to one filegroup; ok not necessarily great
CREATE PARTITION SCHEME ps_test AS PARTITION pf_test ALL TO ( [FG1] )
-- Or round-robin
CREATE PARTITION SCHEME ps_test AS PARTITION pf_test ALL TO ( [FG1], [FG2], [FG3], [FG4], [FG5], [FG6], [FG7], [FG1] ... etc )
Obviously use Excel or some tool to generate the scripts for you - there's no need to type them out :)
Use the DMV sys.partition_range_values and the metadata function $PARTITION to work out which data lives where. Create a daily job that switches out and truncates your oldest partition. I think that is lower risk than a daily add.
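For example, a quick sketch (reusing the pf_test function and dbo.yourLog table from the demo further down) of how to see which partition a given date maps to and how many rows each partition currently holds:
-- Which partition does a given date land in? (pf_test takes a DATE)
SELECT $PARTITION.pf_test( '20140615' ) AS partitionNumber
-- Row counts per partition for the partitioned table (heap or clustered index only)
SELECT p.partition_number, p.rows
FROM sys.partitions p
WHERE p.object_id = OBJECT_ID('dbo.yourLog')
AND p.index_id IN ( 0, 1 )
ORDER BY p.partition_number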
Warning!! Do read the whitepaper carefully, because this support needs to be enabled and there are some issues with the approach (for example, creating and rebuilding non-aligned indexes on a table with more than 1,000 partitions is not supported). If you're feeling risk-averse, the standard limit of 1,000 partitions still lets you pre-allocate a little under three years.
Since you really want to partition by DATE rather than DATETIME2, consider a computed column. I'd want to performance-test that first, though.
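A minimal sketch of that idea (hypothetical table name; the demo below uses the same pattern on dbo.yourLog) - persist the DATE part of the DATETIME2 column and partition on it:
CREATE TABLE dbo.exampleLog (
logId INT IDENTITY NOT NULL,
someDate DATETIME2 NOT NULL,
-- Persisted, deterministic computed column used as the partitioning key
partitionDate AS CAST( someDate AS DATE ) PERSISTED,
CONSTRAINT pk_exampleLog PRIMARY KEY ( logId, partitionDate )
) ON ps_test( partitionDate )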
There is also a tool on CodePlex (SQL Server Partition Management) that might be worth a look, although I haven't used it.
To answer your other questions: the database should have only one log file. Add the additional data files as .ndf, not .mdf. To use the same partition scheme (not function) for both tables, just create them on the same scheme and they will split their data across the files under the filegroups, e.g.
CREATE TABLE dbo.yourTable ( ...
CONSTRAINT PK_yourTable PRIMARY KEY ( rowId, someDate )
) ON ps_test(someDate)
OK, this is going to be a long answer, but I've put together a demo of how something like this can work. It reminded me that the beauty of partition switching is that, as a metadata-only operation, it is effectively instant. To be clear, this is a demo showing the principle and some example "how-to" code; it is not production quality. Work through it and make sure you understand it before running it, and only in a development or test environment. You'll need about 200MB of space.
------------------------------------------------------------------------------------------------
-- Setup START
-- Demo runs on my laptop in < 1 minute (ok on SSD)
-- You'll need 200MB space
------------------------------------------------------------------------------------------------
USE master
GO
IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' )
ALTER DATABASE tooManyPartitionsTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' )
DROP DATABASE tooManyPartitionsTest
GO
CREATE DATABASE tooManyPartitionsTest
GO
ALTER DATABASE tooManyPartitionsTest SET RECOVERY SIMPLE
GO
-- Add 7 filegroups with 4 files each (28 files in total)
DECLARE @fg INT = 0, @f INT = 0, @sql NVARCHAR(MAX)
WHILE @fg < 7
BEGIN
SET @fg += 1
SET @sql = 'ALTER DATABASE tooManyPartitionsTest ADD FILEGROUP tooManyPartitionsTestFg' + CAST( @fg AS VARCHAR(5) )
-- Add the filegroup
PRINT @sql
EXEC(@sql)
-- Initialise
SET @f = 0
WHILE @f < 4
BEGIN
SET @f += 1
--!!WARNING!! DON'T USE THESE SETTINGS IN PRODUCTION. 3MB starting size and 1MB filegrowth are just for demo - would be extremely painful for live data
SET @sql = 'ALTER DATABASE tooManyPartitionsTest ADD FILE ( NAME = N''tooManyPartitionsTestFile@f_@fg'', FILENAME = N''s:\temp\tooManyPartitionsTestFile@f_@fg.ndf'', SIZE = 3MB, FILEGROWTH = 1MB ) TO FILEGROUP [tooManyPartitionsTestFg@fg]'
SET @sql = REPLACE ( @sql, '@fg', @fg )
SET @sql = REPLACE ( @sql, '@f', @f )
-- Add the file
PRINT @sql
EXEC(@sql)
END
END
GO
USE tooManyPartitionsTest
GO
SELECT * FROM sys.filegroups
SELECT * FROM sys.database_files
GO
-- Generate partition function with daily boundaries from 30 Apr 2014 to 31 Dec 2016 (~3 years' worth; earlier data falls into partition 1)
DECLARE @bigString NVARCHAR(MAX) = ''
;WITH cte AS (
SELECT CAST( '30 Apr 2014' AS DATE ) testDate
UNION ALL
SELECT DATEADD( day, 1, testDate )
FROM cte
WHERE testDate < '31 Dec 2016'
)
SELECT @bigString += ',' + QUOTENAME( CONVERT ( VARCHAR, testDate, 106 ), '''' )
FROM cte
OPTION ( MAXRECURSION 1100 )
SELECT @bigString = 'CREATE PARTITION FUNCTION pf_test (DATE) AS RANGE RIGHT FOR VALUES ( ' + STUFF( @bigString, 1, 1, '' ) + ' )'
SELECT @bigString bs
-- Create the partition function
PRINT @bigString
EXEC ( @bigString )
GO
/*
-- Look at the boundaries
SELECT *
FROM sys.partition_range_values
WHERE function_id = ( SELECT function_id FROM sys.partition_functions WHERE name = 'pf_test' )
GO
*/
DECLARE @bigString NVARCHAR(MAX) = ''
;WITH cte AS (
SELECT ROW_NUMBER() OVER( ORDER BY boundary_id ) rn
FROM sys.partition_range_values
WHERE function_id = ( SELECT function_id FROM sys.partition_functions WHERE name = 'pf_test' )
UNION ALL
SELECT 1 -- additional row required for fg
)
SELECT @bigString += ',' + '[tooManyPartitionsTestFg' + CAST( ( rn % 7 ) + 1 AS VARCHAR(5) ) + ']'
FROM cte
OPTION ( MAXRECURSION 1100 )
SELECT @bigString = 'CREATE PARTITION SCHEME ps_test AS PARTITION pf_test TO ( ' + STUFF( @bigString, 1, 1, '' ) + ' )'
PRINT @bigString
EXEC ( @bigString )
GO
IF OBJECT_ID('dbo.yourLog') IS NULL
CREATE TABLE dbo.yourLog (
logId INT IDENTITY,
someDate DATETIME2 NOT NULL,
someData UNIQUEIDENTIFIER DEFAULT NEWID(),
dateAdded DATETIME DEFAULT GETDATE(),
addedBy VARCHAR(30) DEFAULT SUSER_NAME(),
-- Computed column for partitioning?
partitionDate AS CAST( someDate AS DATE ) PERSISTED,
CONSTRAINT pk_yourLog PRIMARY KEY ( logId, partitionDate ) -- << !!TODO try other way round
) ON [ps_test]( partitionDate )
GO
IF OBJECT_ID('dbo.yourAcks') IS NULL
CREATE TABLE dbo.yourAcks (
ackId INT IDENTITY(100000,1),
logId INT NOT NULL,
partitionDate DATE NOT NULL,
CONSTRAINT pk_yourAcks PRIMARY KEY ( ackId, logId, partitionDate )
) ON [ps_test]( partitionDate )
GO
IF OBJECT_ID('dbo.yourLogSwitch') IS NULL
CREATE TABLE dbo.yourLogSwitch (
logId INT IDENTITY,
someDate DATETIME2 NOT NULL,
someData UNIQUEIDENTIFIER DEFAULT NEWID(),
dateAdded DATETIME DEFAULT GETDATE(),
addedBy VARCHAR(30) DEFAULT SUSER_NAME(),
-- Computed column for partitioning?
partitionDate AS CAST( someDate AS DATE ) PERSISTED,
CONSTRAINT pk_yourLogSwitch PRIMARY KEY ( logId, partitionDate )
) ON [ps_test]( partitionDate )
GO
-- Setup END
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
-- Data START
------------------------------------------------------------------------------------------------
-- OK load up data for Jan 2014 to today.
DECLARE @startDate DATETIME = '1 Jan 2014', @rand INT
WHILE @startDate < GETDATE()
BEGIN
-- Add between 1 and 10,000 rows to dbo.yourLog for today
SET @rand = RAND() * 10000
;WITH cte AS (
SELECT TOP 10000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
FROM master.sys.columns c1
CROSS JOIN master.sys.columns c2
CROSS JOIN master.sys.columns c3
)
INSERT INTO dbo.yourLog (someDate)
SELECT TOP(@rand) DATEADD( second, rn % 30000, @startDate )
FROM cte
-- Add most of the Acks
INSERT INTO dbo.yourAcks ( logId, partitionDate )
SELECT TOP 70 PERCENT logId, partitionDate
FROM dbo.yourLog
WHERE partitionDate = @startDate
SET @startDate = DATEADD( day, 1, @startDate )
CHECKPOINT
END
GO
-- Have a look at the data we've loaded
SELECT 'before yourLog' s, COUNT(*) records, MIN(someDate) minDate, MAX(someDate) maxDate FROM dbo.yourLog
SELECT 'before yourAcks' s, COUNT(*) records, MIN(partitionDate) minDate, MAX(partitionDate) maxDate FROM dbo.yourAcks
-- You'll see how pre-May data is initially clumped together
SELECT 'before $partition' s, $PARTITION.pf_test( partitionDate ) p, MIN(partitionDate) xMinDate, MAX(partitionDate) xMaxDate, COUNT(*) AS records
FROM dbo.yourLog WITH(NOLOCK)
GROUP BY $PARTITION.pf_test( partitionDate )
ORDER BY xMinDate
GO
-- Data END
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
-- Maintenance START
------------------------------------------------------------------------------------------------
-- Oh man, we're behind with our switching and truncation.
-- Create a job that sweeps up. Do we get blocking?
-- ALTER TABLE dbo.yourLog SWITCH PARTITION 1 TO dbo.yourLogSwitch PARTITION 1
-- TRUNCATE TABLE dbo.yourLogSwitch
-- Let's pretend we only want to maintain up to 30 days ago
DECLARE @testDate DATE
SET @testDate = DATEADD( day, -30, GETDATE() )
-- Create local fast_forward ( forward-only, read-only ) cursor
DECLARE partitions_cursor CURSOR FAST_FORWARD LOCAL FOR
SELECT boundary_id, CAST( value AS DATE )
FROM sys.partition_range_values
WHERE function_id = ( SELECT function_id FROM sys.partition_functions WHERE name = 'pf_test' )
AND value < @testDate
-- Cursor variables
DECLARE @boundary_id INT, @value DATE, @sql NVARCHAR(MAX)
OPEN partitions_cursor
FETCH NEXT FROM partitions_cursor INTO @boundary_id, @value
WHILE @@fetch_status = 0
BEGIN
-- Switch out and truncate old partition
SET @sql = 'ALTER TABLE dbo.yourLog SWITCH PARTITION ' + CAST( @boundary_id AS VARCHAR(5) ) + ' TO dbo.yourLogSwitch PARTITION ' + CAST( @boundary_id AS VARCHAR(5) )
PRINT @sql
EXEC(@sql)
-- You could move the data elsewhere from here or just empty it out
TRUNCATE TABLE dbo.yourLogSwitch
--!!TODO yourAcks table
FETCH NEXT FROM partitions_cursor INTO @boundary_id, @value
END
CLOSE partitions_cursor
DEALLOCATE partitions_cursor
GO
-- Maintenance END
------------------------------------------------------------------------------------------------
-- Have a look at the data we've maintained
SELECT 'after yourLog' s, COUNT(*) records, MIN(someDate) minDate, MAX(someDate) maxDate FROM dbo.yourLog
SELECT 'after yourAcks' s, COUNT(*) records, MIN(partitionDate) minDate, MAX(partitionDate) maxDate FROM dbo.yourAcks
-- You'll see how pre-May data is initially clumped together
SELECT 'after $partition' s, $PARTITION.pf_test( partitionDate ) p, MIN(partitionDate) xMinDate, MAX(partitionDate) xMaxDate, COUNT(*) AS records
FROM dbo.yourLog WITH(NOLOCK)
GROUP BY $PARTITION.pf_test( partitionDate )
ORDER BY xMinDate
-- Remember, date must always be part of query now to get partition elimination
SELECT *
FROM dbo.yourLog
WHERE partitionDate = '1 August 2014'
GO
-- Cleanup
USE master
GO
IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' )
ALTER DATABASE tooManyPartitionsTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' )
DROP DATABASE tooManyPartitionsTest
GO
To add a new partition, use SPLIT RANGE. Suppose you have the following partitions:
CREATE PARTITION FUNCTION pfTest(int) AS RANGE LEFT FOR VALUES (10, 20, 30);
CREATE PARTITION SCHEME psTest AS PARTITION pfTest TO ([GRP1], [GRP2], [GRP3]);
...you can add a new partition by "splitting" the last range (everything above 30) into (31 to 40) and (above 40). The syntax looks like this:
ALTER PARTITION FUNCTION pfTest() SPLIT RANGE (40);
ALTER PARTITION SCHEME psTest NEXT USED [GRP4];
Other than generating dynamic SQL and running it on a schedule (e.g. in a SQL Server Agent job), I don't know of any other way to automate this.
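A rough sketch of what such a scheduled script might look like - this assumes a DATE-typed version of the function and scheme (here called pfTestDate/psTestDate, which are made-up names) and a filegroup [GRP4] with free space; it is not production code:
-- Add tomorrow's boundary via dynamic SQL (run daily from a SQL Server Agent job)
DECLARE @newBoundary DATE = DATEADD( day, 1, CAST( GETDATE() AS DATE ) )
DECLARE @sql NVARCHAR(MAX)
SET @sql = N'ALTER PARTITION SCHEME psTestDate NEXT USED [GRP4]; '
         + N'ALTER PARTITION FUNCTION pfTestDate() SPLIT RANGE ( '
         + QUOTENAME( CONVERT( CHAR(8), @newBoundary, 112 ), '''' ) + N' );'
PRINT @sql
EXEC sp_executesql @sql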
A partition function can be applied to any number of tables. Each table is placed on a partition scheme, which in turn is attached to a partition function.
--- Create the partition function:
CREATE PARTITION FUNCTION fn_part_left(int)
AS RANGE LEFT FOR VALUES (100, 110, 120, 130);
--- Create the partition scheme:
CREATE PARTITION SCHEME ps_part_left AS
PARTITION fn_part_left TO
([GROUP_A], [GROUP_B], [GROUP_C], [GROUP_A], [GROUP_B]);
--- Create the table
CREATE TABLE myTable (
someColumn int NOT NULL,
....
) ON [ps_part_left](someColumn);
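And for your second question - a second table can simply be created on the same scheme, so both tables share the same partition function (a sketch with a made-up table name):
--- A second table on the same scheme; both tables are then partitioned identically
CREATE TABLE myOtherTable (
someColumn int NOT NULL,
otherData varchar(50) NULL
) ON [ps_part_left](someColumn);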
I used "int" as the data type in the example, but datetime2 works just as well.
You can place multiple partitions in the same filegroup if you want. You'll need to do some planning around how the load is spread across the partitions so that you don't put all of the I/O load on a single filegroup.
A filegroup can contain one or more data files (.mdf or .ndf).
A database can have one or more .ldf files.
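For example, adding a new filegroup with a couple of secondary data files might look like this (made-up database name, file names and paths; the database keeps its single .mdf and .ldf):
--- Add a filegroup and two secondary (.ndf) data files to it
ALTER DATABASE myDatabase ADD FILEGROUP [GROUP_D];
ALTER DATABASE myDatabase ADD FILE
( NAME = N'group_d_file1', FILENAME = N'D:\data\group_d_file1.ndf', SIZE = 512MB, FILEGROWTH = 256MB ),
( NAME = N'group_d_file2', FILENAME = N'D:\data\group_d_file2.ndf', SIZE = 512MB, FILEGROWTH = 256MB )
TO FILEGROUP [GROUP_D];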