Storing NULL versus '' in a varchar column

MrV*_*mes 7 sql-server-2005 sql-server

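I realize this may get flagged as a duplicate, but I am asking specifically about SQL Server 2005.

I have read conflicting advice on the internet, so I am asking here. Specifically in SQL Server 2005, does a NULL in a varchar column take up the same space as an empty string?

I built a "holding" table on a different drive and populated it with data from the source table. Wherever a field was blank, I used nullif([field],'') to insert NULL in place of the blank.

I then built a new table with exactly the same structure as the holding table, but instead of replacing blanks with NULL I inserted the blanks as they were. So far it appears to be taking up more space (I have not finished populating it, so I cannot yet be sure that it really is using more space).

So before I populate it any further and end up with a table that is bigger than I expected, am I better off inserting NULLs or blanks?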
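A minimal sketch of what that nullif substitution does, using a hypothetical table variable (@src) instead of the real source table. It sticks to syntax that SQL Server 2005 accepts (one row per INSERT ... VALUES). Because SQL Server ignores trailing spaces when comparing strings with =, a value made only of spaces is also turned into NULL.

```sql
-- Hypothetical stand-in for the source table; not the real schema.
DECLARE @src TABLE (serial_number varchar(15) NULL);

INSERT @src (serial_number) VALUES ('');        -- empty string
INSERT @src (serial_number) VALUES ('   ');     -- spaces only
INSERT @src (serial_number) VALUES ('ABC123');  -- real value

-- NULLIF(a, b) returns NULL when a = b, otherwise it returns a.
SELECT serial_number,
       NULLIF(serial_number, '') AS cleaned   -- NULL, NULL, 'ABC123'
FROM @src;
```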

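Edit:

After migrating the data from the holding table to the new table, the new table is about 4 GB larger.

Table size differences

There are only two small differences in the table designs. First, the 'serial_number' field is char(15) in the holding table and varchar(15) in the destination table (the maximum length of a serial number is 14 and there are a lot of NULLs, about 30 million if I remember correctly). Second, the holding table's clustered index has an extra column, Program_Name.

Holding table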

USE [Temp_holding_EWS]
GO
/****** Object:  Table [dbo].[AmtoteAccountActivity_holding]    
 Script Date: 02/17/2017 20:41:32 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[AmtoteAccountActivity_holding](
    [_Date] [char](8) NULL,[Community] [varchar](10) NULL,
    [AccountNumber] [varchar](50) NULL,
    [Branch] [varchar](10) NULL,
    [Window] [varchar](3) NULL,
    [Time] [char](8) NULL,[Balance_Forward] [varchar](10) NULL,
    [Transaction_Type] [varchar](10) NULL,
    [Program_Name] [varchar](10) NULL,
    [Race] [varchar](10) NULL,[Pool_Type] [varchar](10) NULL,
    [Amount] [money] NULL,[Runners] [varchar](60) NULL,
    [Total_Bet_Amount] [varchar](10) NULL,
    [Debit_Amount] [varchar](10) NULL,
    [Credit_Amount] [varchar](10) NULL,
    [Tx_Date] [char](8) NULL,
    [Check_Clear_Date] [varchar](10) NULL,
    [Refund_Amt] [varchar](10) NULL,
    [Bet_Pool_Modifier] [varchar](5) NULL,
    [RecordID] [int] IDENTITY(1,1) NOT NULL,
    [serial_number] [char](15) NULL,
    [handle]  AS 
       (CONVERT([money],[total_bet_amount],(0))-CONVERT([money],[refund_amt],(0))),
    [txdatetime]  AS (CONVERT([datetime],([tx_date]+' ')+[time],(11))),
    [dbdate]  AS (CONVERT([datetime],[_date],(11))),
    [Audit_Trail] [varchar](20) NULL,
 CONSTRAINT [PK_AmtoteAccountActivity_holding] PRIMARY KEY NONCLUSTERED 
(
    [RecordID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF

(Clustered index)

USE [Temp_holding_EWS]
GO
/****** Object:  Index [IX_AmtoteAccountActivity_holding] 
    Script Date: 02/17/2017 21:08:44 ******/
CREATE CLUSTERED INDEX [IX_AmtoteAccountActivity_holding] ON 
    [dbo].[AmtoteAccountActivity_holding] 
(
    [AccountNumber] ASC,
    [_Date] ASC,
    [Program_Name] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, 
    SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF,
ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Destination table

USE [EWS]
GO
/****** Object:  Table [dbo].[AmtoteAccountActivity]    
Script Date: 02/17/2017 20:48:16 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[AmtoteAccountActivity](
    [_Date] [char](8) NULL,     [Community] [varchar](10) NULL,
    [AccountNumber] [varchar](50) NULL,
    [Branch] [varchar](10) NULL,[Window] [varchar](3) NULL,
    [Time] [char](8) NULL,  [Balance_Forward] [varchar](10) NULL,
    [Transaction_Type] [varchar](10) NULL,
    [Program_Name] [varchar](10) NULL,
    [Race] [varchar](10) NULL,
    [Pool_Type] [varchar](10) NULL,
    [Amount] [money] NULL,[Runners] [varchar](60) NULL,
    [Total_Bet_Amount] [varchar](10) NULL,
    [Debit_Amount] [varchar](10) NULL,
    [Credit_Amount] [varchar](10) NULL,
    [Tx_Date] [char](8) NULL,
    [Check_Clear_Date] [varchar](10) NULL,
    [Refund_Amt] [varchar](10) NULL,
    [Bet_Pool_Modifier] [varchar](5) NULL,
    [RecordID] [int] IDENTITY(1,1) NOT NULL,
    [serial_number] [varchar](15) NULL,
    [handle]  AS 
       (CONVERT([money],[total_bet_amount],(0))-CONVERT([money],[refund_amt],(0))),
    [txdatetime]  AS (CONVERT([datetime],([tx_date]+' ')+[time],(11))),
    [dbdate]  AS (CONVERT([datetime],[_date],(11))),
    [Audit_Trail] [varchar](20) NULL,
 CONSTRAINT [PK_AmtoteAccountActivity2] PRIMARY KEY NONCLUSTERED 
(
    [RecordID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF

(Clustered index)

USE [EWS]
GO
/****** Object:  Index [IX_AmtoteAccountActivity2]  Script Date: 02/17/2017 21:06:29 ******/
CREATE CLUSTERED INDEX [IX_AmtoteAccountActivity2] ON [dbo].[AmtoteAccountActivity] 
(
    [AccountNumber] ASC,
    [_Date] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, 
SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, 
ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
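
Note: for anyone wondering why obviously financial and numeric values are stored in character fields: this is the original table design from 17 years ago (not mine), and there are now hundreds of SQL queries running against this database. It is less effort to keep the columns as varchar and leave the conversions in the queries than to change them to money, int, or decimal and then change hundreds of queries.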

Aar*_*and 9

Let's create three tables with a varchar column, two that allow NULL and one that does not.

CREATE TABLE dbo.x1(id int IDENTITY(1,1) PRIMARY KEY, field varchar(5) null);
CREATE TABLE dbo.x2(id int IDENTITY(1,1) PRIMARY KEY, field varchar(5) null);
CREATE TABLE dbo.x3(id int IDENTITY(1,1) PRIMARY KEY, field varchar(5) not null);

Populate them with 1,000,000 rows:

;WITH x(x) AS (SELECT 1 UNION ALL SELECT x+1 FROM x WHERE x < 1000000)
INSERT dbo.x1(field) SELECT NULL FROM x OPTION (MAXRECURSION 0);
INSERT dbo.x2(field) SELECT '' FROM dbo.x1;
INSERT dbo.x3(field) SELECT '' FROM dbo.x1;

Let's check the sizes:

SELECT COUNT(*)*8192/1024. FROM sys.dm_db_database_page_allocations(DB_ID(), 
  OBJECT_ID(N'dbo.x1'), 1, NULL, 'DETAILED');
SELECT COUNT(*)*8192/1024. FROM sys.dm_db_database_page_allocations(DB_ID(), 
  OBJECT_ID(N'dbo.x2'), 1, NULL, 'DETAILED');
SELECT COUNT(*)*8192/1024. FROM sys.dm_db_database_page_allocations(DB_ID(), 
  OBJECT_ID(N'dbo.x3'), 1, NULL, 'DETAILED');

Results:

12,928 KB
12,936 KB
12,936 KB
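
So for 1,000,000 rows, choosing NULL over '' saves a whopping 8 KB (and this is not even reflected in sp_spaceused, because the one page you saved is still reserved, just not allocated).

Repeating for heaps (again, since we are guessing at your actual table structure, multiple tests are needed):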
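One caveat for the version in the question: sys.dm_db_database_page_allocations was introduced in SQL Server 2012, so the size check above will not run on SQL Server 2005. A rough sketch of a 2005-compatible check uses sys.dm_db_partition_stats instead. This is an assumption about how one might repeat the test, not the query that produced the numbers above, and because it counts used and reserved pages rather than allocated pages, the figures may not match exactly.

```sql
-- Page counts per table, converted to KB (8 KB per page).
-- index_id 0 = heap, 1 = clustered index.
SELECT OBJECT_NAME(ps.object_id)   AS table_name,
       ps.index_id,
       ps.row_count,
       ps.used_page_count * 8      AS used_kb,
       ps.reserved_page_count * 8  AS reserved_kb
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id IN (OBJECT_ID(N'dbo.x1'),
                       OBJECT_ID(N'dbo.x2'),
                       OBJECT_ID(N'dbo.x3'))
  AND ps.index_id IN (0, 1)
ORDER BY table_name;
```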

12,872 KB
12,872 KB
12,928 KB
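The heap tables behind these numbers are not shown. One way to build comparable heaps is to leave out the PRIMARY KEY so that no clustered index is created. The names h1, h2, and h3 are hypothetical, and the original test may have been set up differently.

```sql
-- No PRIMARY KEY and no clustered index, so each table is a heap.
CREATE TABLE dbo.h1(id int IDENTITY(1,1) NOT NULL, field varchar(5) NULL);
CREATE TABLE dbo.h2(id int IDENTITY(1,1) NOT NULL, field varchar(5) NULL);
CREATE TABLE dbo.h3(id int IDENTITY(1,1) NOT NULL, field varchar(5) NOT NULL);

-- Same fill pattern as before: NULLs in h1, empty strings in h2 and h3.
WITH x(x) AS (SELECT 1 UNION ALL SELECT x+1 FROM x WHERE x < 1000000)
INSERT dbo.h1(field) SELECT NULL FROM x OPTION (MAXRECURSION 0);
INSERT dbo.h2(field) SELECT '' FROM dbo.h1;
INSERT dbo.h3(field) SELECT '' FROM dbo.h1;
```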

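So, as I suggested, the difference is negligible. Even extrapolated to 120,000,000 rows, the largest possible difference (again, depending on your schema) is 960 KB on a proper table (one with a clustered index) and 6.7 MB on a heap. If disk space on your server is so tight that 6.7 MB would drive the decision, you might consider how much an extra disk would cost compared with the time you have spent investigating this.

IMHO, there are much more important reasons for deciding whether or not to use NULL to represent "no data". A good question with lots of opinions and comments is here: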
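The extrapolation is just the per-million-row difference from the two result sets multiplied by 120. As a quick check of that arithmetic:

```sql
-- Differences per 1,000,000 rows, scaled to 120,000,000 rows.
SELECT (12936 - 12928) * 120 AS clustered_diff_kb,  -- 8 KB * 120 = 960 KB
       (12928 - 12872) * 120 AS heap_diff_kb;       -- 56 KB * 120 = 6720 KB, about 6.7 MB at 1000 KB per MB
```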