Dynamic table partitioning every day

Zap*_*ica 4 sql-server filegroups partitioning

I have a SQL Server database that contains two tables - Acks and Logs.

The two tables are logically related, but not in a relational-database way. Basically, every incoming message is saved in the Log table, and if our server acknowledges it, the acknowledgement is stored in the Ack table.

We store roughly 5 million acks and 3 million logs per day. I'm trying to partition both tables on daily boundaries so that we can easily drop old partitions from the tables and improve query performance.

I haven't done table partitioning before, so I've been reading some online tutorials, but I'm stuck on one thing. Every tutorial I've followed seems to add the filegroups and the boundaries manually.

I'd like SQL Server to do this every day somehow, and that's where my problem is. I need it to create the new filegroup for the next day at, say, 22:00 every day. Then at midnight, inserts should start filling up the new day's partition.

Can anyone point me in the right direction on how to achieve this? A comprehensive tutorial or some good old advice would be fine too.

My second question: can I somehow apply the same partition function to two different tables?

They both have a datetime2 column that I want to partition on, and the same rules would apply to both.

And how does that fit with my filegroups? Do I need one filegroup per day? Does each table get its own file in that filegroup, or are both tables stored in the same file inside the filegroup?

Do I have to create an .mdf and .ldf for each filegroup? Or is there still just one log file for the whole database?

wBo*_*Bob 6

As of SQL Server 2008 SP2 and SQL Server 2008 R2 SP1, 15,000 partitions are supported, so honestly you don't really need to do this dynamically at all. Instead of a complicated daily process (dynamically adding filegroups and boundaries) that could fail, simply pre-create the partitions from now until, say, 2020; you'll be well within the limits and nicely future-proofed.

You can assign all the partitions to a single filegroup (OK, not necessarily a great pattern) or round-robin them across a limited number. In other words, there is no technical requirement to have one filegroup per day, e.g.

-- Assign all to one filegroup; ok not necessarily great
CREATE PARTITION SCHEME ps_test AS PARTITION pf_test ALL TO ( [FG1] )

-- Or round-robin
CREATE PARTITION SCHEME ps_test AS PARTITION pf_test ALL TO ( [FG1], [FG2], [FG3], [FG4], [FG5], [FG6], [FG7], [FG1] ... etc )

Obviously use Excel or some tool to generate the scripts for you - no need to type them out :)
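For instance, something along these lines would spit out the daily boundary list for you (pf_daily is just a placeholder name; the full demo further down builds the real function in exactly the same way):

-- Minimal sketch: generate a CREATE PARTITION FUNCTION script with daily boundaries.
-- pf_daily and the date range are illustrative only.
DECLARE @boundaries NVARCHAR(MAX) = '';

;WITH dates AS (
    SELECT CAST( '1 Jan 2015' AS DATE ) d
    UNION ALL
    SELECT DATEADD( day, 1, d ) FROM dates WHERE d < '31 Dec 2019'
)
SELECT @boundaries += ',' + QUOTENAME( CONVERT( VARCHAR, d, 106 ), '''' )
FROM dates
OPTION ( MAXRECURSION 2000 );

SELECT 'CREATE PARTITION FUNCTION pf_daily (DATE) AS RANGE RIGHT FOR VALUES ( '
     + STUFF( @boundaries, 1, 1, '' ) + ' )' AS createFunctionScript;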

Use the DMV sys.partition_range_values and the metadata function $PARTITION to work out which data is where. Create a daily job to switch out and truncate your oldest partition. I think that is lower risk than a daily add.
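For example, a couple of quick checks along these lines (assuming the pf_test function created in the demo below) show the boundaries that exist and which partition a given date would land in:

-- Sketch: list existing boundaries of the (assumed) pf_test function...
SELECT prv.boundary_id, prv.value
FROM sys.partition_range_values prv
JOIN sys.partition_functions pf ON pf.function_id = prv.function_id
WHERE pf.name = 'pf_test'
ORDER BY prv.boundary_id;

-- ...and check which partition today's date maps to
SELECT $PARTITION.pf_test( CAST( GETDATE() AS DATE ) ) AS todaysPartition;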

Warning!! Please read the whitepaper carefully, because this needs to be enabled and there are some issues with the approach (e.g. creating and rebuilding non-aligned indexes on a table with more than 1,000 partitions is not supported). If you're feeling risk-averse, the standard limit of 1,000 partitions still lets you pre-allocate just under 3 years' worth.

Since you really want to partition by DATE rather than DATETIME2, consider a computed column. I would want to performance-test that first, though.
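Something like this cut-down sketch is what I mean (the table name is made up; the demo tables below use exactly this pattern, on the ps_test scheme created there):

-- Sketch of the computed-column approach: persist the DATE portion of the
-- DATETIME2 column and partition on that.
CREATE TABLE dbo.exampleLog (
    logId         INT IDENTITY,
    someDate      DATETIME2 NOT NULL,
    partitionDate AS CAST( someDate AS DATE ) PERSISTED,
    CONSTRAINT pk_exampleLog PRIMARY KEY ( logId, partitionDate )
) ON ps_test( partitionDate );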

There is also a tool on CodePlex (SQL Server Partition Management) that might be worth a look, although I haven't used it.

To answer your other questions: the database should have only one log file. Add additional data files as .ndf, not .mdf. To put the same partition scheme (not function) on both tables, simply create them on the same scheme and they will split the data across the files beneath the filegroups, e.g.

CREATE TABLE dbo.yourTable (    ...

    CONSTRAINT PK_yourTable PRIMARY KEY ( rowId, someDate )

    ) ON ps_test(someDate)

OK, this is going to be a long answer, but I've put together a demo of how something like this could work. It reminded me that the beauty of partition switching is that it's practically instant, being a metadata-only operation. To be clear, this is a demo that shows the principle and some example "how-to" code; it is not production quality. Work through it and make sure you understand it before running it in a dev or test environment. You'll need about 200MB of space.

------------------------------------------------------------------------------------------------
-- Setup START
-- Demo runs on my laptop in < 1 minute (ok on SSD)
-- You'll need 200MB space
------------------------------------------------------------------------------------------------

USE master
GO

IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' )
    ALTER DATABASE tooManyPartitionsTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' )
    DROP DATABASE tooManyPartitionsTest 
GO
CREATE DATABASE tooManyPartitionsTest
GO

ALTER DATABASE tooManyPartitionsTest SET RECOVERY SIMPLE
GO

-- Add 7 filegroups with 4 files each
DECLARE @fg INT = 0, @f INT = 0, @sql NVARCHAR(MAX)

WHILE @fg < 7
BEGIN

    SET @fg += 1
    SET @sql = 'ALTER DATABASE tooManyPartitionsTest ADD FILEGROUP tooManyPartitionsTestFg' + CAST( @fg AS VARCHAR(5) )

    -- Add the filegroup
    PRINT @sql
    EXEC(@sql)


    -- Initialise
    SET @f = 0

    WHILE @f < 4
    BEGIN

        SET @f += 1
        --!!WARNING!! DON'T USE THESE SETTINGS IN PRODUCTION.  3MB starting size and 1MB filegrowth are just for demo - would be extremely painful for live data
        SET @sql = 'ALTER DATABASE tooManyPartitionsTest ADD FILE ( NAME = N''tooManyPartitionsTestFile@f_@fg'', FILENAME = N''s:\temp\tooManyPartitionsTestFile@f_@fg.ndf'', SIZE = 3MB, FILEGROWTH = 1MB ) TO FILEGROUP [tooManyPartitionsTestFg@fg]'
        SET @sql = REPLACE ( @sql, '@fg', @fg )
        SET @sql = REPLACE ( @sql, '@f', @f )

        -- Add the file
        PRINT @sql
        EXEC(@sql)

    END

END
GO


USE tooManyPartitionsTest
GO

SELECT * FROM sys.filegroups
SELECT * FROM sys.database_files 
GO

-- Generate a partition function with ~3 years' worth of daily boundaries (30 Apr 2014 to 31 Dec 2016).
-- With RANGE RIGHT, data earlier than the first boundary falls into partition 1.
DECLARE @bigString NVARCHAR(MAX) = ''

;WITH cte AS (
SELECT CAST( '30 Apr 2014' AS DATE ) testDate
UNION ALL
SELECT DATEADD( day, 1, testDate )
FROM cte
WHERE testDate < '31 Dec 2016'
)
SELECT @bigString += ',' + QUOTENAME( CONVERT ( VARCHAR, testDate, 106 ), '''' )
FROM cte
OPTION ( MAXRECURSION 1100 )

SELECT @bigString = 'CREATE PARTITION FUNCTION pf_test (DATE) AS RANGE RIGHT FOR VALUES ( ' + STUFF( @bigString, 1, 1, '' ) + ' )'
SELECT @bigString bs

-- Create the partition function
PRINT @bigString
EXEC ( @bigString )
GO

/*
-- Look at the boundaries
SELECT *
FROM sys.partition_range_values
WHERE function_id = ( SELECT function_id FROM sys.partition_functions WHERE name = 'pf_test' )
GO
*/

DECLARE @bigString NVARCHAR(MAX) = ''

;WITH cte AS (
SELECT ROW_NUMBER() OVER( ORDER BY boundary_id ) rn
FROM sys.partition_range_values
WHERE function_id = ( SELECT function_id FROM sys.partition_functions WHERE name = 'pf_test' )
UNION ALL 
SELECT 1    -- additional row required for fg
)
SELECT @bigString += ',' + '[tooManyPartitionsTestFg' + CAST( ( rn % 7 ) + 1 AS VARCHAR(5) ) + ']'
FROM cte
OPTION ( MAXRECURSION 1100 )

SELECT @bigString = 'CREATE PARTITION SCHEME ps_test AS PARTITION pf_test TO ( ' + STUFF( @bigString, 1, 1, '' ) + ' )'
PRINT @bigString
EXEC ( @bigString )
GO




IF OBJECT_ID('dbo.yourLog') IS NULL
CREATE TABLE dbo.yourLog ( 
    logId       INT IDENTITY,
    someDate    DATETIME2 NOT NULL,
    someData    UNIQUEIDENTIFIER DEFAULT NEWID(),
    dateAdded   DATETIME DEFAULT GETDATE(), 
    addedBy     VARCHAR(30) DEFAULT SUSER_NAME(), 

    -- Computed column for partitioning?
    partitionDate AS CAST( someDate AS DATE ) PERSISTED,

    CONSTRAINT pk_yourLog PRIMARY KEY ( logId, partitionDate )  -- << !!TODO try other way round

    ) ON [ps_test]( partitionDate )
GO


IF OBJECT_ID('dbo.yourAcks') IS NULL
CREATE TABLE dbo.yourAcks ( 
    ackId           INT IDENTITY(100000,1),
    logId           INT NOT NULL,
    partitionDate   DATE NOT NULL

    CONSTRAINT pk_yourAcks PRIMARY KEY ( ackId, logId, partitionDate )  

    ) ON [ps_test]( partitionDate )
GO



IF OBJECT_ID('dbo.yourLogSwitch') IS NULL
CREATE TABLE dbo.yourLogSwitch ( 
    logId       INT IDENTITY,
    someDate    DATETIME2 NOT NULL,
    someData    UNIQUEIDENTIFIER DEFAULT NEWID(),
    dateAdded   DATETIME DEFAULT GETDATE(), 
    addedBy     VARCHAR(30) DEFAULT SUSER_NAME(), 

    -- Computed column for partitioning?
    partitionDate AS CAST( someDate AS DATE ) PERSISTED,

    CONSTRAINT pk_yourLogSwitch PRIMARY KEY ( logId, partitionDate )

    ) ON [ps_test]( partitionDate )
GO
-- Setup END
------------------------------------------------------------------------------------------------



------------------------------------------------------------------------------------------------
-- Data START
------------------------------------------------------------------------------------------------

-- OK load up data for Jan 2014 to today.
DECLARE @startDate DATETIME = '1 Jan 2014', @rand INT 

WHILE @startDate < GETDATE()
BEGIN

    -- Add between 1 and 10,000 rows to dbo.yourLog for today
    SET @rand = RAND() * 10000

    ;WITH cte AS (
    SELECT TOP 10000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
    FROM master.sys.columns c1
        CROSS JOIN master.sys.columns c2
        CROSS JOIN master.sys.columns c3
    )
    INSERT INTO dbo.yourLog (someDate)
    SELECT TOP(@rand) DATEADD( second, rn % 30000, @startDate )
    FROM cte

    -- Add most of the Acks
    INSERT INTO dbo.yourAcks ( logId, partitionDate )
    SELECT TOP 70 PERCENT logId, partitionDate
    FROM dbo.yourLog
    WHERE partitionDate = @startDate

    SET @startDate = DATEADD( day, 1, @startDate )

    CHECKPOINT

END
GO

-- Have a look at the data we've loaded
SELECT 'before yourLog' s, COUNT(*) records, MIN(someDate) minDate, MAX(someDate) maxDate FROM dbo.yourLog 
SELECT 'before yourAcks' s, COUNT(*) records, MIN(partitionDate) minDate, MAX(partitionDate) maxDate FROM dbo.yourAcks

-- You'll see how pre-May data is initially clumped together
SELECT 'before $partition' s, $PARTITION.pf_test( partitionDate ) p, MIN(partitionDate) xMinDate, MAX(partitionDate) xMaxDate, COUNT(*) AS records
FROM dbo.yourLog WITH(NOLOCK) 
GROUP BY $PARTITION.pf_test( partitionDate ) 
ORDER BY xMinDate
GO

-- Data END
------------------------------------------------------------------------------------------------


------------------------------------------------------------------------------------------------
-- Maintenance START
------------------------------------------------------------------------------------------------

-- Oh man, we're behind with our switching and truncation.
-- Create a job that sweeps up.  Do we get blocking?

-- ALTER TABLE dbo.yourLog SWITCH PARTITION 1 TO dbo.yourLogSwitch PARTITION 1
-- TRUNCATE TABLE dbo.yourLogSwitch

-- Let's pretend we only want to maintain up to 30 days ago
DECLARE @testDate DATE
SET @testDate = DATEADD( day, -30, GETDATE() )

-- Create local fast_forward ( forward-only, read-only ) cursor 
DECLARE partitions_cursor CURSOR FAST_FORWARD LOCAL FOR 
SELECT boundary_id, CAST( value AS DATE )
FROM sys.partition_range_values
WHERE function_id = ( SELECT function_id FROM sys.partition_functions WHERE name = 'pf_test' )
  AND value < @testDate

-- Cursor variables
DECLARE @boundary_id INT, @value DATE, @sql NVARCHAR(MAX)

OPEN partitions_cursor

FETCH NEXT FROM partitions_cursor INTO @boundary_id, @value
WHILE @@fetch_status = 0
BEGIN

    -- Switch out and truncate old partition
    SET @sql = 'ALTER TABLE dbo.yourLog SWITCH PARTITION ' + CAST( @boundary_id AS VARCHAR(5) ) + ' TO dbo.yourLogSwitch PARTITION ' + CAST( @boundary_id AS VARCHAR(5) )

    PRINT @sql
    EXEC(@sql)

    -- You could move the data elsewhere from here or just empty it out
    TRUNCATE TABLE dbo.yourLogSwitch

    --!!TODO yourAcks table

    FETCH NEXT FROM partitions_cursor INTO @boundary_id, @value
END

CLOSE partitions_cursor
DEALLOCATE partitions_cursor
GO

-- Maintenance END
------------------------------------------------------------------------------------------------



-- Have a look at the data we've maintained
SELECT 'after yourLog' s, COUNT(*) records, MIN(someDate) minDate, MAX(someDate) maxDate FROM dbo.yourLog 
SELECT 'after yourAcks' s, COUNT(*) records, MIN(partitionDate) minDate, MAX(partitionDate) maxDate FROM dbo.yourAcks

-- You'll see how pre-May data is initially clumped together
SELECT 'after $partition' s, $PARTITION.pf_test( partitionDate ) p, MIN(partitionDate) xMinDate, MAX(partitionDate) xMaxDate, COUNT(*) AS records
FROM dbo.yourLog WITH(NOLOCK) 
GROUP BY $PARTITION.pf_test( partitionDate ) 
ORDER BY xMinDate



-- Remember, date must always be part of query now to get partition elimination
SELECT *
FROM dbo.yourLog
WHERE partitionDate = '1 August 2014'
GO


-- Cleanup
USE master
GO

IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' )
    ALTER DATABASE tooManyPartitionsTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' )
    DROP DATABASE tooManyPartitionsTest 
GO


Dan*_*her 3

To add new partitions, use SPLIT RANGE. Suppose you have the following partitions:

CREATE PARTITION FUNCTION pfTest(int) AS RANGE LEFT FOR VALUES (10, 20, 30);
CREATE PARTITION SCHEME psTest AS PARTITION pfTest TO ([GRP1], [GRP2], [GRP3], [GRP4]);

..you can add a new partition by "splitting" the last range (everything above 30) into (31 to 40) and (everything above 40). The syntax looks like this:

-- The scheme must have a NEXT USED filegroup designated before the function is split
ALTER PARTITION SCHEME psTest NEXT USED [GRP4];
ALTER PARTITION FUNCTION pfTest() SPLIT RANGE (40);

I don't know of any way to automate this other than generating dynamic SQL and running it on a schedule (for instance in a SQL Server Agent job).
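If you do go down that route, the job step might look something like the sketch below. The names pf_daily, ps_daily and [FG_DAILY] are placeholders, and this is only an outline of the idea, not tested production code:

-- Hedged sketch for a daily Agent job step: add tomorrow's boundary if it doesn't exist yet.
DECLARE @nextBoundary DATE = DATEADD( day, 1, CAST( GETDATE() AS DATE ) );
DECLARE @sql NVARCHAR(MAX);

IF NOT EXISTS (
    SELECT 1
    FROM sys.partition_range_values prv
    JOIN sys.partition_functions pf ON pf.function_id = prv.function_id
    WHERE pf.name = 'pf_daily'
      AND CAST( prv.value AS DATE ) = @nextBoundary )
BEGIN
    -- Designate the receiving filegroup, then split in the new boundary
    SET @sql = N'ALTER PARTITION SCHEME ps_daily NEXT USED [FG_DAILY]; '
             + N'ALTER PARTITION FUNCTION pf_daily() SPLIT RANGE ( '''
             + CONVERT( VARCHAR(10), @nextBoundary, 120 ) + N''' );';
    PRINT @sql;
    EXEC ( @sql );
END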

A partition function can be applied to any number of tables. Each table is placed on a partition scheme, which in turn is connected to a partition function.

--- Create the partition function:
CREATE PARTITION FUNCTION fn_part_left(int)
AS RANGE LEFT FOR VALUES (100, 110, 120, 130);

--- Create the partition scheme:
CREATE PARTITION SCHEME ps_part_left AS
PARTITION fn_part_left TO
    ([GROUP_A], [GROUP_B], [GROUP_C], [GROUP_A], [GROUP_B]);

--- Create the table
CREATE TABLE myTable (
    someColumn int NOT NULL,
    ....
) ON [ps_part_left](someColumn);

I've used int as the datatype in the examples, but datetime2 will work just as well.

You can put multiple partitions in the same filegroup if you want. This is where you'll have to do some planning on how to distribute the load across the different partitions, so you don't end up putting all the I/O load on a single filegroup.
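As a starting point for that planning, a query along these lines (using the ps_part_left scheme from above) shows which filegroup each partition ends up on:

-- Sketch: map partition numbers of a scheme to their filegroups
SELECT dds.destination_id AS partition_number,
       fg.name            AS filegroup_name
FROM sys.partition_schemes ps
JOIN sys.destination_data_spaces dds ON dds.partition_scheme_id = ps.data_space_id
JOIN sys.filegroups fg ON fg.data_space_id = dds.data_space_id
WHERE ps.name = 'ps_part_left'
ORDER BY dds.destination_id;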

A filegroup can contain one or more data files (the primary .mdf or secondary .ndf files).

A database can have one or more .ldf log files.