Question by Jas*_*son (tags: sql-server, parallel-processing, bulkinsert, sqlbulkcopy, bulk-load)
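I have 100+ files to import into SQL Server, most of them around 500 MB each. I'd like to use SQL Server's parallel import capability and have read many pages, such as:
How to load 1 TB of data in 30 minutes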
https://technet.microsoft.com/en-us/library/dd537533(v=sql.100).aspx
Importing Data in Parallel with Table Level Locking
https://technet.microsoft.com/en-us/library/ms186341(v=sql.105).aspx
Controlling Locking Behavior for Bulk Import
https://technet.microsoft.com/en-us/library/ms180876(v=sql.105).aspx
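as well as answers on Stack Overflow.
However, none of them gives a simple code example. I know how to use BULK INSERT/bcp, but I don't know where to start with a parallel import. Can anyone help?
My system is Windows and I'm using SQL Server 2016. The source data files are .txt files.
Thanks in advance for your help!
Jason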
Load the file paths into a tracking table:
CREATE TABLE FileListCollection (Id int IDENTITY(1,1), filepath VARCHAR(500), ThreadNo tinyint, isLoaded int)
DECLARE @FileListCollection TABLE (filepath VARCHAR(500))
DECLARE @folderpath NVARCHAR(500)
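DECLARE @cmd NVARCHAR(4000)
SET @folderpath = '<FolderPath>' -- no trailing backslash
-- List the .txt files recursively; quotes allow spaces in the path, /a-d skips directory names
SET @cmd = 'dir "' + @folderpath + '\*.txt" /b /s /a-d'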
INSERT INTO @FileListCollection
EXECUTE xp_cmdshell @cmd
DELETE
FROM @FileListCollection
WHERE filepath IS NULL
insert into FileListCollection(filepath, isLoaded)
select filepath, 0
from @FileListCollection
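The listing step relies on xp_cmdshell, which is disabled by default on SQL Server 2016. A minimal sketch of enabling it, assuming you have sysadmin rights and your security policy allows it (xp_cmdshell lets T-SQL run operating-system commands, so you may want to switch it off again once the file list is built):

```sql
-- Enable xp_cmdshell (requires sysadmin). It is an advanced option,
-- so advanced options must be made visible first.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'xp_cmdshell', 1;
RECONFIGURE;

-- ...build the file list here...

-- Optionally disable it again afterwards:
-- EXEC sys.sp_configure N'xp_cmdshell', 0;
-- RECONFIGURE;
```

Note that the dir command runs under the SQL Server service account (unless a proxy account is configured), so that account needs read access to the folder.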
Assign the files to threads:
-- Number of parallel sessions; Id % 3 gives thread numbers 0, 1 and 2
DECLARE @ThreadCount int = 3
UPDATE f SET ThreadNo = (Id % @ThreadCount)
FROM FileListCollection f
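Because the schedule is a round-robin on Id % 3, each session's thread number must be 0, 1 or 2. A quick sanity check, assuming the FileListCollection table from the steps above, that every thread received a share of the files before any loading starts:

```sql
-- Files per thread after scheduling. With 100+ files split across three
-- threads, this returns three rows (ThreadNo 0, 1, 2) with near-equal counts.
SELECT ThreadNo, COUNT(*) AS FileCount
FROM FileListCollection
GROUP BY ThreadNo
ORDER BY ThreadNo;
```

Round-robin by count balances the work only because most files are a similar size (about 500 MB here); with very uneven files, one session could finish long after the others.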
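Open three sessions and give each one its own thread number. Because the schedule uses Id % 3, the thread numbers are 0, 1 and 2.
In each session, run the following script to load the data: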
DECLARE @filepath NVARCHAR(500)
DECLARE @bcpquery NVARCHAR(MAX);
-- Use a different value (0, 1 or 2) in each session
DECLARE @ThreadNo int = 1
WHILE EXISTS (
SELECT TOP 1 *
FROM FileListCollection
where ThreadNo = @ThreadNo
and isLoaded = 0
)
BEGIN
SELECT TOP 1 @filepath = filepath
FROM FileListCollection
where ThreadNo = @ThreadNo
and isLoaded = 0
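-- <Database> and <Table> are placeholders; quotes in the path are doubled so the dynamic SQL stays valid
SET @bcpquery = 'bulk insert <Database>.dbo.<Table> from ''' + REPLACE(@filepath, '''', '''''') + ''' with (fieldterminator = ''|'', rowterminator = ''\n'')';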
print @bcpquery
--Load the Content in table
execute sp_executesql @bcpquery;
Update FileListCollection set isLoaded = 1
WHERE filepath = @filepath
END
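The loop above gives each session a fixed slice of the files. A hedged alternative sketch (a swapped-in technique, not the answerer's method) treats the tracking table as a work queue: each session atomically claims the next unloaded file with UPDLOCK plus READPAST, so a session that finishes early simply takes more files and no thread number is needed. It also adds the TABLOCK option, which the parallel-import pages linked in the question rely on for concurrent loads into a heap, and TRY/CATCH so a bad file is marked as failed instead of stopping the loop. Assumptions: the target table is a heap with no indexes, `<Database>` and `<Table>` are placeholders, and the status codes 2 (in progress) and -1 (failed) are made up for this sketch. Run the same script in each of the three sessions.

```sql
SET NOCOUNT ON;

DECLARE @filepath NVARCHAR(500);
DECLARE @sql NVARCHAR(MAX);
DECLARE @claimed TABLE (filepath NVARCHAR(500));

WHILE 1 = 1
BEGIN
    DELETE FROM @claimed;

    -- Claim one file: UPDLOCK stops two sessions from taking the same row,
    -- READPAST skips rows another session has already locked.
    WITH nextFile AS (
        SELECT TOP (1) filepath, isLoaded
        FROM FileListCollection WITH (ROWLOCK, UPDLOCK, READPAST)
        WHERE isLoaded = 0
        ORDER BY Id
    )
    UPDATE nextFile
    SET isLoaded = 2                      -- hypothetical "in progress" code
    OUTPUT inserted.filepath INTO @claimed (filepath);

    -- Scalar subquery yields NULL when nothing was claimed
    SET @filepath = (SELECT TOP (1) filepath FROM @claimed);
    IF @filepath IS NULL BREAK;           -- queue is empty

    -- TABLOCK takes a bulk update (BU) lock, which concurrent sessions
    -- can share when the target table is a heap.
    SET @sql = N'BULK INSERT <Database>.dbo.<Table> FROM '''
             + REPLACE(@filepath, '''', '''''')
             + N''' WITH (FIELDTERMINATOR = ''|'', ROWTERMINATOR = ''\n'', TABLOCK);';

    BEGIN TRY
        EXEC sys.sp_executesql @sql;
        UPDATE FileListCollection SET isLoaded = 1 WHERE filepath = @filepath;
    END TRY
    BEGIN CATCH
        -- Hypothetical "failed" code: the loop moves on; inspect these files later
        UPDATE FileListCollection SET isLoaded = -1 WHERE filepath = @filepath;
        PRINT ERROR_MESSAGE();
    END CATCH;
END;
```

Without a BATCHSIZE option each file loads as a single transaction, so a failed file leaves no partial rows and can be retried by setting its isLoaded back to 0. If a session is killed mid-file, its row stays at 2; reset such rows to 0 only when no session is running.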