What is the best way to create and populate a numbers table?

KM.*_*KM. 62 sql-server sql-server-2005

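I've seen many different ways to create and populate a numbers table. However, what is the best way to create and populate one? "Best" is defined by these criteria, from most to least important:

  • Table created with optimal indexing
  • Rows generated fastest
  • Simple code used to create and populate

If you don't know what a numbers table is, look here: Why should I consider using an auxiliary numbers table?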
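
For a rough idea of what such a table is used for, here is a minimal sketch. It assumes a table dbo.Numbers with an integer column Number holding 1, 2, 3, ... already exists (both names are placeholders, not taken from any of the answers), and uses it to list the days of a month without a loop:

```sql
-- Assumption: dbo.Numbers(Number int) exists and contains at least 1..31.
-- The table and column names are hypothetical placeholders.
DECLARE @Start datetime;
SET @Start = '20090101';

-- One row per day of January 2009, generated set-based instead of with a loop.
SELECT DATEADD(day, Number - 1, @Start) AS TheDate
FROM dbo.Numbers
WHERE Number <= 31
ORDER BY Number;
```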

KM.*_*KM. 124

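Here are some code examples taken from the web and from answers to this question.

For each method, I modified the original code so that each uses the same table and column: NumbersTest and Number, with 10,000 rows or as close to that as possible. I have also provided links to the place of origin.

METHOD 1 here is a very slow looping method, from here
avg 13.01 seconds
ran 3 times, removed the highest; here are the times in seconds: 12.42, 13.60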

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest(Number INT IDENTITY(1,1)) 
SET NOCOUNT ON
-- Note: as written, this loop inserts 100,000 rows, not the 10,000 used by the other methods
WHILE COALESCE(SCOPE_IDENTITY(), 0) < 100000
BEGIN 
    INSERT dbo.NumbersTest DEFAULT VALUES 
END
SET NOCOUNT OFF
-- Add a primary key/clustered index to the numbers table
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds'
SELECT COUNT(*) FROM NumbersTest

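METHOD 2 here is a much faster looping method, from here
avg 1.1658 seconds
ran 11 times, removed the highest; here are the times in seconds: 1.117, 1.140, 1.203, 1.170, 1.173, 1.156, 1.203, 1.153, 1.173, 1.170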

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number INT NOT NULL);
DECLARE @i INT;
SELECT @i = 1;
SET NOCOUNT ON
WHILE @i <= 10000
BEGIN
    INSERT INTO dbo.NumbersTest(Number) VALUES (@i);
    SELECT @i = @i + 1;
END;
SET NOCOUNT OFF
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds'
SELECT COUNT(*) FROM NumbersTest

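METHOD 3 here is a single INSERT using a recursive CTE, based on code from here
avg 488.6 milliseconds
ran 11 times, removed the highest; here are the times in milliseconds: 686, 673, 623, 686, 343, 343, 376, 360, 343, 453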

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number  int  not null)  
;WITH Nums(Number) AS
(SELECT 1 AS Number
 UNION ALL
 SELECT Number+1 FROM Nums where Number<10000
)
insert into NumbersTest(Number)
    select Number from Nums option(maxrecursion 10000)
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

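METHOD 4 here is a "semi-looping" method, from here
avg 348.3 milliseconds (it was hard to get good timing because of the "GO" in the middle of the code; any suggestions would be appreciated)
ran 11 times, removed the highest; here are the times in milliseconds: 356, 360, 283, 346, 360, 376, 326, 373, 330, 373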

DROP TABLE NumbersTest
DROP TABLE #RunDate
CREATE TABLE #RunDate (RunDate datetime)
INSERT INTO #RunDate VALUES(GETDATE())
CREATE TABLE NumbersTest (Number int NOT NULL);
INSERT NumbersTest values (1);
GO --required
INSERT NumbersTest SELECT Number + (SELECT COUNT(*) FROM NumbersTest) FROM NumbersTest
GO 14 --will create 16384 total rows
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
SELECT CONVERT(varchar(20),datediff(ms,RunDate,GETDATE()))+' milliseconds' FROM #RunDate
SELECT COUNT(*) FROM NumbersTest
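
On the timing difficulty caused by "GO": one possible workaround is to run the doubling INSERT inside a WHILE loop, so the whole script is a single batch and the start time can live in an ordinary variable, as in the other methods. This is only a sketch and was not part of the timed runs above; the @Pass counter is a name introduced here for illustration.

```sql
-- Sketch: Method 4's doubling insert without "GO 14" (untimed, illustrative only).
IF OBJECT_ID('dbo.NumbersTest', 'U') IS NOT NULL DROP TABLE dbo.NumbersTest;
DECLARE @RunDate datetime, @Pass int;   -- @Pass is a hypothetical loop counter
SET @RunDate = GETDATE();
SET @Pass = 0;
CREATE TABLE dbo.NumbersTest (Number int NOT NULL);
INSERT dbo.NumbersTest (Number) VALUES (1);
SET NOCOUNT ON;
WHILE @Pass < 14   -- 14 doublings of 1 row gives 2^14 = 16384 rows
BEGIN
    -- Each pass appends a copy of the table, offset by the current row count.
    INSERT dbo.NumbersTest (Number)
        SELECT Number + (SELECT COUNT(*) FROM dbo.NumbersTest) FROM dbo.NumbersTest;
    SET @Pass = @Pass + 1;
END
SET NOCOUNT OFF;
ALTER TABLE dbo.NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number);
PRINT CONVERT(varchar(20), DATEDIFF(ms, @RunDate, GETDATE())) + ' milliseconds';
SELECT COUNT(*) FROM dbo.NumbersTest;
```

Because the loop adds per-iteration overhead that "GO 14" does not have, timings from this variant are not directly comparable with the figure quoted for Method 4.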

METHOD 5 here is a single INSERT, from Philip Kelley's answer
avg 92.7 milliseconds
ran 11 times, removed the highest; here are the times in milliseconds: 80, 96, 96, 93, 110, 110, 80, 76, 93, 93

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number  int  not null)  
;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  --I removed Pass5, since I'm only populating the Numbers table to 10,000
  Tally as (select row_number() over(order by C) as Number from Pass4)
INSERT NumbersTest
        (Number)
    SELECT Number
        FROM Tally
        WHERE Number <= 10000
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 6 here is a single INSERT, from Mladen Prajdic's answer
avg 82.3 milliseconds
ran 11 times, removed the highest; here are the times in milliseconds: 80, 80, 93, 76, 93, 63, 93, 76, 93, 76

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number  int  not null)  
INSERT INTO NumbersTest(Number)
SELECT TOP 10000 row_number() over(order by t1.number) as N
FROM master..spt_values t1 
    CROSS JOIN master..spt_values t2
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number);
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

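METHOD 7 here is a single INSERT, based on the code from here
avg 56.3 milliseconds
ran 11 times, removed the highest; here are the times in milliseconds: 63, 50, 63, 46, 60, 63, 63, 46, 63, 46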

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
    INTO NumbersTest
    FROM sys.objects s1       --use sys.columns if you don't get enough rows returned to generate all the numbers you need
    CROSS JOIN sys.objects s2 --use sys.columns if you don't get enough rows returned to generate all the numbers you need
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

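After looking at all these methods, I really like Method 7, which was the fastest, and the code is fairly simple too.

  • Interesting as they are, the timings don't seem that important to me, especially since if I need a numbers table I will create it once and use it over and over. (13 upvotes)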

Mla*_*dic 52

I use this, which is fast as hell:

insert into Numbers(N)
select top 1000000 row_number() over(order by t1.number) as N
from   master..spt_values t1 
       cross join master..spt_values t2

  • Note that this is not supported on Azure SQL Database. (2 upvotes)

Bac*_*its 19

If you're just doing this in SQL Server Management Studio or sqlcmd, you can take advantage of the fact that the batch separator lets you repeat a batch:

CREATE TABLE Number (N INT IDENTITY(1,1) PRIMARY KEY NOT NULL);
GO

INSERT INTO Number DEFAULT VALUES;
GO 100000

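This will insert 100000 records into the Number table.

It's slow. It is comparable to METHOD 1 in @KM.'s answer, which is the slowest of the examples. However, it's about as light on code as it gets. You could speed it up somewhat by adding the primary key constraint after the insert batch.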
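
A minimal sketch of that variation, assuming the same Number table and N column as above; the constraint name PK_Number is made up for illustration. Like the original, it relies on the "GO count" feature of SSMS and sqlcmd:

```sql
-- Sketch: create the table without a key, fill it, then add the key afterwards.
CREATE TABLE Number (N INT IDENTITY(1,1) NOT NULL);
GO

INSERT INTO Number DEFAULT VALUES;
GO 100000

-- PK_Number is a hypothetical constraint name.
ALTER TABLE Number ADD CONSTRAINT PK_Number PRIMARY KEY CLUSTERED (N);
GO
```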

  • Didn't know you could even repeat a batch like that! (2 upvotes)

Phi*_*ley 12

I start with the following template, which is derived from Itzik Ben-Gan's routine:

;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
  Tally as (select row_number() over(order by C) as Number from Pass5)
 select Number from Tally where Number <= 1000000

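The "WHERE Number <= 1000000" clause limits the output to 1 through 1 million, and can easily be adjusted to your desired range.

Since this is a WITH clause, it can be worked into an INSERT ... SELECT ... like so: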

--  Sample use: create one million rows
CREATE TABLE dbo.Example (ExampleId  int  not null)  

DECLARE @RowsToCreate int
SET @RowsToCreate = 1000000

--  "Table of numbers" data generator, as per Itzik Ben-Gan (from multiple sources)
;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
  Tally as (select row_number() over(order by C) as Number from Pass5)
INSERT Example (ExampleId)
 select Number
  from Tally
  where Number <= @RowsToCreate

Indexing the table after it has been built will be the fastest way to index it.
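
For instance, once the INSERT above has finished, the key could be added in a separate step. This is only a sketch: it assumes the dbo.Example table and ExampleId column from the sample above, and the constraint name PK_Example is invented here.

```sql
-- Sketch: add the clustered primary key only after dbo.Example has been populated.
-- PK_Example is a hypothetical constraint name.
ALTER TABLE dbo.Example
    ADD CONSTRAINT PK_Example PRIMARY KEY CLUSTERED (ExampleId);
```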

Oh, and I'd refer to it as a "Tally" table. I think this is a common term, and you can find loads of tricks and examples by Googling it.


Den*_*her 5

For anyone looking for an Azure solution:

SET NOCOUNT ON    
CREATE TABLE Numbers (n bigint PRIMARY KEY)    
GO    
DECLARE @numbers table(number int);  
WITH numbers(number) as  (   
SELECT 1 AS number   
UNION all   
SELECT number+1 FROM numbers WHERE number<10000  
)  
INSERT INTO @numbers(number)  
SELECT number FROM numbers OPTION(maxrecursion 10000)
INSERT INTO Numbers(n)  SELECT number FROM @numbers

Sourced from the SQL Azure team blog: http://azure.microsoft.com/blog/2010/09/16/create-a-numbers-table-in-sql-azure/


ili*_*ode 5

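Here is a short and fast in-memory solution that I came up with, using the table value constructors introduced in SQL Server 2008:

It returns 1,000,000 rows, but you can add or remove CROSS JOINs, or use a TOP clause, to change that.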

;WITH v AS (SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) v(z))

SELECT N FROM (SELECT ROW_NUMBER() OVER (ORDER BY v1.z)-1 N FROM v v1 
    CROSS JOIN v v2 CROSS JOIN v v3 CROSS JOIN v v4 CROSS JOIN v v5 CROSS JOIN v v6) Nums

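Note that this can be calculated quickly on the fly, or (even better) stored in a permanent table (just add an INTO clause after the SELECT N segment) with a primary key on the N field for improved efficiency.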
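
A rough sketch of the permanent-table variant, under a few assumptions: the target name dbo.Numbers and the constraint name PK_Numbers are placeholders, and the column is explicitly made NOT NULL before the key is added, since a column created by SELECT ... INTO from an expression may be nullable.

```sql
-- Sketch: materialize the generated numbers into a permanent table.
-- dbo.Numbers and PK_Numbers are hypothetical names.
;WITH v AS (SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) v(z))
SELECT N
INTO dbo.Numbers
FROM (SELECT ROW_NUMBER() OVER (ORDER BY v1.z)-1 N FROM v v1
    CROSS JOIN v v2 CROSS JOIN v v3 CROSS JOIN v v4 CROSS JOIN v v5 CROSS JOIN v v6) Nums;

-- ROW_NUMBER() returns bigint, so N is bigint; force NOT NULL so it can carry a primary key.
ALTER TABLE dbo.Numbers ALTER COLUMN N bigint NOT NULL;
ALTER TABLE dbo.Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (N);
```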