Changing the data type of a primary key column in a partitioned table

All*_*hty 8 sql-server ddl sql-server-2012

I currently have the good fortune of the following situation:

  • We have a production database that is partitioned and uses primary keys of type int
  • Multiple (~20) tables, each containing > 100,000,000 rows, 2 of which are approaching 2,000,000,000 rows
  • These tables all have a primary key column of type int that increments by one for each row
  • These tables are partitioned using a partition function of type int with RIGHT(0), so effectively all data sits in the one and only partition

Since some tables are approaching the int32 maximum, we have to change those column types to bigint. This is where the problems start:

  • The type of a primary key column cannot be altered (which I could work around by dropping the primary key, altering the type, and recreating the primary key); however, even after dropping the primary key constraint I still cannot alter the column type, and get the following exception:

    {"The object 'XXX_XXX' is dependent on column 'ABCD'. ALTER TABLE ALTER COLUMN ABCD failed because one or more objects access this column."}
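
For context, the attempted sequence looks roughly like the following sketch (the object and column names are the placeholders from the error message; the constraint name is hypothetical):

-- Sketch of the attempted change, using the placeholder names from the error above
ALTER TABLE dbo.XXX_XXX DROP CONSTRAINT PK_XXX_XXX;        -- dropping the primary key succeeds
ALTER TABLE dbo.XXX_XXX ALTER COLUMN ABCD BIGINT NOT NULL; -- still fails with the dependency error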

As far as I can see there is a schema-bound dependency of the partition scheme (and function) on the table preventing the datatype modification. There are no foreign keys referencing these primary keys.

I'm not seeing a solution to updating the column type on this database, let alone a production database. We can afford some maintenance downtime, say +- 60 minutes, but no longer. What are my options?

Additional information

  • I'm starting to believe my only option is to create new tables with the correct structure and pump the existing data into them...

  • The tables are already partitioned on their primary key values (currently only 1 partition (RIGHT(0)); future partitions should become available for new values (RIGHT(x)), with the boundary x yet to be determined)

  • These are data-logging tables: inserts only, no updates or deletes

  • DML is always running against these tables, except during the maintenance windows, which occur every 6-8 weeks and last at most 90 minutes

  • The indexes on the primary keys are all clustered

  • There are no nonclustered indexes on these tables

Joh*_*ner 6

Before I provide a solution, I want to confirm your assumption:

As far as I can see there is a schema-bound dependency of the partition scheme (and function) on the table preventing the datatype modification.

You are correct; the fact that your partition function is defined for a datatype of INT is preventing you from altering the data type of your column. Sadly, you're stuck with the current table because, to my knowledge, there's no way to change the datatype of a partition function without dropping and recreating it... which you can't do while tables depend on it. So basically this is a chicken-and-egg scenario.
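
If you want to verify that dependency yourself, a catalog query along these lines (a sketch I'm adding for illustration) lists every table bound to a partition scheme, together with the parameter type of the underlying function:

-- Sketch: tables bound to partition schemes, plus each function's parameter type
SELECT t.name  AS table_name,
       ps.name AS partition_scheme,
       pf.name AS partition_function,
       tp.name AS function_parameter_type
FROM sys.indexes i
    JOIN sys.partition_schemes ps ON i.data_space_id = ps.data_space_id
    JOIN sys.partition_functions pf ON ps.function_id = pf.function_id
    JOIN sys.partition_parameters pp ON pf.function_id = pp.function_id
    JOIN sys.types tp ON pp.system_type_id = tp.system_type_id
        AND pp.user_type_id = tp.user_type_id
    JOIN sys.tables t ON i.object_id = t.object_id
WHERE i.index_id IN (0, 1); -- heap or clustered index only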

I'm not seeing a solution to updating the column type on this database, let alone a production database. We can afford some maintenance downtime, say +- 60 minutes, but no longer. What are my options?

One possible solution to your issue is to take advantage of Partitioned Views. This functionality has been around forever and is often overlooked, but a Partitioned View will provide you with a way to sidestep your INT data limit by adding new data to a different table where the ID column is a BIGINT datatype. This new table should also have a better partition function/scheme underlying it and hopefully will be maintained a little better going forward. Any queries that are pointing to the old table would then be pointed to the Partitioned View and you'll not run into a data limit issue.

Let me explain with an example. The following is a trivial recreation of your current partitioned table, consisting of one partition:

-- Recreate your very frustrating partition function
CREATE PARTITION FUNCTION dear_god_why_this_logic (INT)
AS RANGE RIGHT FOR VALUES (0);

-- Scheme mapping all data to the PRIMARY filegroup
CREATE PARTITION SCHEME dear_god_why_this_scheme
AS PARTITION dear_god_why_this_logic ALL TO ([PRIMARY]);

-- Create Partitioned Table
CREATE TABLE dbo.TestPartitioned
(
    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    VAL CHAR(1)
) ON dear_god_why_this_scheme (ID);

--Populate Table with Data (the triple cross join generates ~440k rows)
INSERT INTO dbo.TestPartitioned WITH (TABLOCK) (VAL)
SELECT CHAR(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 128)
FROM master.sys.configurations t1
    CROSS JOIN master.sys.configurations t2
    CROSS JOIN master.sys.configurations t3;

Let's say for argument's sake that I'm getting close to the INT limit (even though I'm obviously not). Because I don't want to run out of valid INT values in the ID column, I'm going to create a similar table, but use BIGINT for the ID column instead. This table will be defined as follows:

CREATE PARTITION FUNCTION lets_maintain_this_going_forward (BIGINT)
AS RANGE RIGHT FOR VALUES (0, 500000, 1000000, 1500000, 2000000);

-- Scheme to store data on PartitionedData Filegroup
CREATE PARTITION SCHEME lets_maintain_this_going_forward_scheme
AS PARTITION lets_maintain_this_going_forward ALL TO ([PartitionedData]);

-- Table for New Data going forward with new ID datatype of BIGINT
CREATE TABLE dbo.TestPartitioned_BIGINT
(
    ID BIGINT IDENTITY(500000,1) PRIMARY KEY CLUSTERED,
    VAL CHAR(1)
) ON lets_maintain_this_going_forward_scheme (ID);

-- Add check constraint to be used by Partitioned View
ALTER TABLE dbo.TestPartitioned_BIGINT ADD CONSTRAINT CK_ID_BIGINT CHECK (ID > CAST(499999 AS BIGINT));

A few notes here:

Table Partitioning on the New Table

This new table is also going to be partitioned based on your requirements. It's a good idea to think about future maintenance needs, so create some new filegroups, define a better partition alignment strategy, etc. My example is keeping it simple, so I'm throwing all of my partitions into one filegroup. Don't do that in production; instead follow Table Partitioning Best Practices, courtesy of Brent Ozar et al.
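
If you do create dedicated filegroups, a minimal sketch looks like this (the database name, file path, and size are placeholders; adjust them to your storage layout):

-- Sketch: add a dedicated filegroup and file for the new partitions
ALTER DATABASE MyDatabase ADD FILEGROUP PartitionedData;

ALTER DATABASE MyDatabase ADD FILE
(
    NAME = 'PartitionedData_1',
    FILENAME = 'D:\SQLData\PartitionedData_1.ndf',
    SIZE = 10GB
)
TO FILEGROUP PartitionedData;

-- A scheme can then map ranges to specific filegroups instead of ALL TO one:
-- CREATE PARTITION SCHEME ... AS PARTITION ... TO (FG_1, FG_2, [PRIMARY]);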

Check Constraint

Because I want to take advantage of Partitioned Views, I need to add a CHECK CONSTRAINT to this new table. I know that my insert statement generated ~440k records, so to be safe, I'm going to start my IDENTITY seed at 500k and create a CONSTRAINT defining this as well. The constraint will be used by the optimizer when evaluating which tables can be eliminated when the eventual Partitioned View is called.
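
A quick way to pick a safe seed (a sketch; not part of the original steps) is to check the current high-water mark first:

-- Sketch: confirm the current maximum ID before choosing the new IDENTITY seed
SELECT MAX(ID) AS current_max_id FROM dbo.TestPartitioned;
SELECT IDENT_CURRENT('dbo.TestPartitioned') AS last_identity_value;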

Now to Mash the Two Tables Together

Partitioned Views don't do particularly well when the partitioning columns of the underlying tables have mixed datatypes. To get around this, we have to persist the current ID column of your existing table as a BIGINT value. We're going to do that by adding a Persisted Computed Column, as follows:

-- Append Calculated Columns of New Datatype to Old Table
-- WARNING: This will take a while against a large data set and will likely lock the table
ALTER TABLE dbo.TestPartitioned ADD ID_BIG AS (CAST(ID AS BIGINT)) PERSISTED
GO

-- Add Constraints on Calculated Column
-- WARNING: This will likely lock the table
ALTER TABLE dbo.TestPartitioned ADD CONSTRAINT CK_ID_TestPartitioned_BIGINT CHECK(ID_BIG <= CAST(499999 AS BIGINT));
GO

-- Create a new Nonclustered index on ID_BIG to act as new "pkey"
CREATE NONCLUSTERED INDEX IX_TestPartitioned__BIG_ID__VAL ON dbo.TestPartitioned (ID_BIG) INCLUDE (VAL);

I've also added another CHECK CONSTRAINT on the old table (to aid with Partition Elimination from our eventual Partitioned View) and a new Non-Clustered Index that is going to act like a primary key index (because lookups originating from the Partitioned View will occur against ID_BIG instead of ID).
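
One sanity check worth running here (my addition, not part of the original steps): partition elimination against a Partitioned View only works when the CHECK constraints are trusted, which you can confirm from the catalog views:

-- Sketch: elimination relies on trusted CHECK constraints; is_not_trusted should be 0
SELECT OBJECT_NAME(parent_object_id) AS table_name,
       name AS constraint_name,
       is_not_trusted
FROM sys.check_constraints
WHERE parent_object_id IN (OBJECT_ID('dbo.TestPartitioned'),
                           OBJECT_ID('dbo.TestPartitioned_BIGINT'));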

With the new Computed Column and Check Constraint in place, I can finally define the Partitioned View, as follows:

-- Build a Partitioned View on Top of the old and new tables
CREATE VIEW dbo.vw_TableAll
WITH SCHEMABINDING
AS
SELECT ID_BIG AS ID, VAL FROM dbo.TestPartitioned
UNION ALL
SELECT ID, VAL FROM dbo.TestPartitioned_BIGINT
GO

Running a quick query against this view will confirm we have partition elimination working (as you don't see any lookups occurring against the new table we've created):

SELECT *
FROM dbo.vw_TableAll
WHERE ID < CAST(500 AS BIGINT);

Execution Plan:

(execution plan screenshot)

At this stage, you'll need to make a few changes to your application:

  1. Stop inserting new records into the old table, and instead insert them into the new table. We're not allowed to insert records into the Partitioned View because we're using IDENTITY values in the underlying tables. Partitioned Views don't allow for this, so you have to insert records directly into the new table in this scenario.
  2. Adjust any SELECT queries (that point to the old table) to point to the Partitioned View. Alternatively, you can rename the old table (e.g. TableName_Old) and create the view under the old table's name (e.g. TableName); how fancy you want to get here is up to you (a sketch of the rename route follows below).
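
If you go the rename route, the swap looks roughly like this sketch (using the example names from this answer; note that SCHEMABINDING prevents renaming a referenced table, so the view has to be dropped and recreated, and the rest of this answer sticks with the original names):

-- Sketch: swap the view in under the old table name
DROP VIEW dbo.vw_TableAll;
GO

EXEC sp_rename 'dbo.TestPartitioned', 'TestPartitioned_Old';
GO

CREATE VIEW dbo.TestPartitioned
WITH SCHEMABINDING
AS
SELECT ID_BIG AS ID, VAL FROM dbo.TestPartitioned_Old
UNION ALL
SELECT ID, VAL FROM dbo.TestPartitioned_BIGINT;
GO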

New Records to the New Table

At this point, new records should be getting inserted into the new table. I'll simulate this by running the following:

--Populate Table with Data
INSERT INTO dbo.TestPartitioned_BIGINT WITH (TABLOCK) (VAL)
SELECT CHAR(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 128)
FROM master.sys.configurations t1
    CROSS JOIN master.sys.configurations t2
    CROSS JOIN master.sys.configurations t3;
GO

Again, my Identity Seed is configured so I won't have any ID conflicts between the two tables. The CHECK CONSTRAINTS on both tables should also enforce this. Let's confirm that Partition Elimination is still occurring:

SELECT *
FROM dbo.vw_TableAll
WHERE ID > CAST(300000 AS BIGINT)
    AND ID < CAST(300500 AS BIGINT);

SELECT *
FROM dbo.vw_TableAll
WHERE ID > CAST(500000 AS BIGINT)
    AND ID < CAST(500500 AS BIGINT);

Execution Plans:

(execution plan screenshots)

Do take note that most queries will likely span both tables. While we won't be able to take advantage of partition elimination in those cases, the query plans should remain as optimal as possible, with the added benefit that you don't have to rewrite your underlying queries (if you decided to name the view the same as your old table).
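
For instance, a query with no range predicate on ID (a trivial sketch) will read both underlying tables through the view:

-- Sketch: without a predicate on ID, both underlying tables are read
SELECT COUNT(*) AS total_rows
FROM dbo.vw_TableAll;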

What to do now?

Well, this is wholly dependent upon you. If the current records are never going to disappear and you're happy with the performance of the Partitioned View, pour a celebratory beer because you're done. Huzzah!

If you want to consolidate the old table into the new table, you're going to have to pick maintenance windows within which to do the next series of operations. Basically, you're going to drop the constraints on the tables (which will break the partition elimination component of the Partitioned View), copy your most recent records over from the old table to the new table, purge these records from the old table, and then update constraints (so the Partitioned View is able to take advantage of partition elimination once again). Because of the volume of data you have in the existing tables, you may have to go through a few rounds of this process to consolidate everything. Those steps are summarized as follows:

-- During a maintenance window, transfer old records to new table if you so choose
-- Drop Check Constraint while transferring over records
ALTER TABLE dbo.TestPartitioned_BIGINT DROP CONSTRAINT CK_ID_BIGINT;

-- Retain Identity Values
SET IDENTITY_INSERT dbo.TestPartitioned_BIGINT ON

-- Copy records into the new table
INSERT INTO dbo.TestPartitioned_BIGINT (ID, VAL)
SELECT ID_BIG, VAL
FROM dbo.TestPartitioned
WHERE ID > 300000

SET IDENTITY_INSERT dbo.TestPartitioned_BIGINT OFF

-- Recreate the Check Constraint once the transfer is complete, with the boundary moved to the new ID value
-- NOTE: until the purge below finishes, both tables contain IDs above the boundary, so the view will briefly return duplicates
ALTER TABLE dbo.TestPartitioned_BIGINT ADD CONSTRAINT CK_ID_BIGINT CHECK (ID > CAST(300000 AS BIGINT));
GO

-- Purge records from original table
DELETE
FROM dbo.TestPartitioned
WHERE ID > 300000

-- Recreate Check Constraint on original table, ensuring the Check Constraint is defined on the proper ID value
ALTER TABLE dbo.TestPartitioned DROP CONSTRAINT CK_ID_TestPartitioned_BIGINT;
GO
ALTER TABLE dbo.TestPartitioned ADD CONSTRAINT CK_ID_TestPartitioned_BIGINT CHECK(ID_BIG <= CAST(300000 AS BIGINT));
GO
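
After each round, a quick sanity check along these lines (a sketch using the example names above) confirms the boundary moved as expected:

-- Sketch: verify the row distribution around the new boundary after each round
SELECT COUNT(*) AS old_table_rows FROM dbo.TestPartitioned;        -- should now only hold ID <= 300000
SELECT COUNT(*) AS new_table_rows FROM dbo.TestPartitioned_BIGINT; -- transferred rows plus everything new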

If possible, test this out in a non-production environment. I don't condone testing in production. Hopefully this answer helps, and if you have any questions, feel free to post a comment and I'll do my best to get back to you quickly.