Best way to do a bulk insert? + help me fully understand what I have found so far

cho*_*bo2 7 .net c# sql-server performance linq-to-sql

So I saw this post here and read it, and it seems like bulk copy might be the way to go.

What's the best way to bulk database inserts from c#?

I still have some questions and want to know how things actually work.

So I found two tutorials.

http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241

http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx

The first one uses two ADO.NET 2.0 features, BulkInsert and BulkCopy. The second one uses LINQ to SQL and OpenXML.

The LINQ to SQL way appeals to me since I already use LINQ to SQL and prefer it over ADO.NET. However, as one person pointed out in those posts, he just worked around the problem at the expense of performance (nothing wrong with that in my opinion).

First I will talk about the two ways from the first tutorial.

I am using VS2010 Express (for testing the tutorials I used VS2008; not sure what .NET version, I just loaded their sample files and ran them), .NET 4.0, MVC 2.0, SQL Server 2005.

  1. Is ADO.NET 2.0 the most current version?
  2. Based on the technologies I am using, are there any updates to what I am going to show that would improve it somehow?
  3. Are there things these tutorials left out that I should know about?

BulkInsert

I am using this table for all the examples.

CREATE TABLE [dbo].[TBL_TEST_TEST]
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    [NAME] [varchar](50) 
)

SP code

USE [Test]
GO
/****** Object:  StoredProcedure [dbo].[sp_BatchInsert]    Script Date: 05/19/2010 15:12:47 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[sp_BatchInsert] (@Name VARCHAR(50) )
AS
BEGIN
            INSERT INTO TBL_TEST_TEST VALUES (@Name);
END 

C# code

/// <summary>
/// Another ado.net 2.0 way that uses a stored procedure to do a bulk insert.
/// Seems slower than the "BatchBulkCopy" way, and it crashes when you try to insert 500,000 records in one go.
/// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
/// </summary>
private static void BatchInsert()
{
    // Get the DataTable with Rows State as RowState.Added
    DataTable dtInsertRows = GetDataTable();

    SqlConnection connection = new SqlConnection(connectionString);
    SqlCommand command = new SqlCommand("sp_BatchInsert", connection);
    command.CommandType = CommandType.StoredProcedure;
    command.UpdatedRowSource = UpdateRowSource.None;

    // Set the Parameter with appropriate Source Column Name
    command.Parameters.Add("@Name", SqlDbType.VarChar, 50, dtInsertRows.Columns[0].ColumnName);

    SqlDataAdapter adpt = new SqlDataAdapter();
    adpt.InsertCommand = command;
    // Specify the number of records to be Inserted/Updated in one go. Default is 1.
    adpt.UpdateBatchSize = 1000;

    connection.Open();
    int recordsInserted = adpt.Update(dtInsertRows);
    connection.Close();
}

First, batch size. Why would you set the batch size to anything other than the number of records you are sending? Like, if I am sending 500,000 records, I would make the batch size 500,000.

Next, why does it crash when I do that? If I set the batch size to 1,000 it works just fine.

System.Data.SqlClient.SqlException was unhandled
  Message="A transport-level error has occurred when sending the request to the server. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.)"
  Source=".Net SqlClient Data Provider"
  ErrorCode=-2146232060
  Class=20
  LineNumber=0
  Number=233
  Server=""
  State=0
  StackTrace:
       at System.Data.Common.DbDataAdapter.UpdatedRowStatusErrors(RowUpdatedEventArgs rowUpdatedEvent, BatchCommandInfo[] batchCommands, Int32 commandCount)
       at System.Data.Common.DbDataAdapter.UpdatedRowStatus(RowUpdatedEventArgs rowUpdatedEvent, BatchCommandInfo[] batchCommands, Int32 commandCount)
       at System.Data.Common.DbDataAdapter.Update(DataRow[] dataRows, DataTableMapping tableMapping)
       at System.Data.Common.DbDataAdapter.UpdateFromDataTable(DataTable dataTable, DataTableMapping tableMapping)
       at System.Data.Common.DbDataAdapter.Update(DataTable dataTable)
       at TestIQueryable.Program.BatchInsert() in C:\Users\a\Downloads\TestIQueryable\TestIQueryable\TestIQueryable\Program.cs:line 124
       at TestIQueryable.Program.Main(String[] args) in C:\Users\a\Downloads\TestIQueryable\TestIQueryable\TestIQueryable\Program.cs:line 16
  InnerException: 
Run Code Online (Sandbox Code Playgroud)

Inserting 500,000 records with a batch size of 1,000 took "2 mins and 54 seconds".

Of course that is not an official time; I sat there with a stopwatch (I am sure there are better ways to measure, but I was too lazy to look up what they were).

So I found that a bit slow compared to all my other runs (except the plain LINQ to SQL insert one), and I am not sure why.
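
In case it helps anyone, here is a sketch of the workaround I would try (my own guess, not from the tutorial): keep UpdateBatchSize at a modest value and feed Update() the rows a slice at a time, so no single call has to buffer all 500,000 rows. It reuses the connectionString, GetDataTable(), and sp_BatchInsert pieces shown above.

private static void BatchInsertChunked()
{
    // Same setup as BatchInsert() above, but pushes rows through in chunks.
    DataTable dtInsertRows = GetDataTable();

    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlCommand command = new SqlCommand("sp_BatchInsert", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.UpdatedRowSource = UpdateRowSource.None;
        command.Parameters.Add("@Name", SqlDbType.VarChar, 50, dtInsertRows.Columns[0].ColumnName);

        SqlDataAdapter adpt = new SqlDataAdapter();
        adpt.InsertCommand = command;
        adpt.UpdateBatchSize = 1000; // a few thousand is plenty (see the answer below)

        connection.Open();

        // Update() also accepts an array of rows, so hand it one slice at a time.
        DataRow[] allRows = new DataRow[dtInsertRows.Rows.Count];
        dtInsertRows.Rows.CopyTo(allRows, 0);

        const int chunkSize = 10000; // arbitrary slice size, not tuned
        for (int offset = 0; offset < allRows.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, allRows.Length - offset);
            DataRow[] chunk = new DataRow[count];
            Array.Copy(allRows, offset, chunk, 0, count);
            adpt.Update(chunk);
        }
    }
}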

Next I looked at bulk copy.

/// <summary>
/// An ado.net 2.0 way to mass insert records. This seems to be the fastest.
/// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
/// </summary>
private static void BatchBulkCopy()
{
    // Get the DataTable 
    DataTable dtInsertRows = GetDataTable();

    using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity))
    {
        sbc.DestinationTableName = "TBL_TEST_TEST";

        // Number of records to be processed in one go
        sbc.BatchSize = 500000;

        // Map the source column from the DataTable to the destination column in the SQL Server table
        // sbc.ColumnMappings.Add("ID", "ID");
        sbc.ColumnMappings.Add("NAME", "NAME");

        // Number of records after which client has to be notified about its status
        sbc.NotifyAfter = dtInsertRows.Rows.Count;

        // Event that gets fired when NotifyAfter number of records are processed.
        sbc.SqlRowsCopied += new SqlRowsCopiedEventHandler(sbc_SqlRowsCopied);

        // Finally write to server
        sbc.WriteToServer(dtInsertRows);
        sbc.Close();
    }

}

This one seems really fast, and it does not even need an SP (can you use an SP with bulk copy? And if you can, would it be better?).

Bulk copy had no problem with a batch size of 500,000. So again, why make it smaller than the number of records you want to send?

I found that with bulk copy and a batch size of 500,000 it took only 5 seconds to complete. I then tried a batch size of 1,000 and it took 8 seconds.

Much faster than the bulk insert above.
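
On the SP question: as far as I can tell, SqlBulkCopy always writes straight into a destination table rather than through a stored procedure. A pattern I have seen suggested (not from these tutorials; the staging table and procedure names here are made up) is to bulk copy into a staging table and then have an SP validate and move the rows in one set-based statement:

private static void BulkCopyThenSp()
{
    DataTable dtInsertRows = GetDataTable();

    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        connection.Open();

        // Step 1: bulk copy into a staging table (assumed to already exist
        // with the same NAME column as TBL_TEST_TEST).
        using (SqlBulkCopy sbc = new SqlBulkCopy(connection))
        {
            sbc.DestinationTableName = "TBL_TEST_TEST_STAGING"; // hypothetical table
            sbc.ColumnMappings.Add("NAME", "NAME");
            sbc.WriteToServer(dtInsertRows);
        }

        // Step 2: let a stored procedure validate/move the staged rows.
        using (SqlCommand command = new SqlCommand("sp_MergeStagedTestData", connection)) // hypothetical SP
        {
            command.CommandType = CommandType.StoredProcedure;
            command.ExecuteNonQuery();
        }
    }
}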

Now I tried the other tutorial.

USE [Test]
GO
/****** Object:  StoredProcedure [dbo].[spTEST_InsertXMLTEST_TEST]    Script Date: 05/19/2010 15:39:03 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[spTEST_InsertXMLTEST_TEST](@UpdatedProdData nText)
AS 
 DECLARE @hDoc int   

 exec sp_xml_preparedocument @hDoc OUTPUT,@UpdatedProdData 

 INSERT INTO TBL_TEST_TEST(NAME)
 SELECT XMLProdTable.NAME
    FROM OPENXML(@hDoc, 'ArrayOfTBL_TEST_TEST/TBL_TEST_TEST', 2)   
       WITH (
                ID Int,                 
                NAME varchar(100)
            ) XMLProdTable

EXEC sp_xml_removedocument @hDoc

C# code.

/// <summary>
/// This is using LINQ to SQL to make the table objects. 
/// They are then serialized to an XML document and sent to a stored procedure
/// that then does a bulk insert (I think with OPENXML).
///  http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
/// </summary>
private static void LinqInsertXMLBatch()
{
    using (TestDataContext db = new TestDataContext())
    {
        TBL_TEST_TEST[] testRecords = new TBL_TEST_TEST[500000];
        for (int count = 0; count < 500000; count++)
        {
            TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
            testRecord.NAME = "Name : " + count;
            testRecords[count] = testRecord;
        }

        StringBuilder sBuilder = new StringBuilder();
        System.IO.StringWriter sWriter = new System.IO.StringWriter(sBuilder);
        XmlSerializer serializer = new XmlSerializer(typeof(TBL_TEST_TEST[]));
        serializer.Serialize(sWriter, testRecords);
        db.insertTestData(sBuilder.ToString());
    }
}

So I like this one because I get to use objects, even if it is a bit redundant. What I do not understand is how the SP works; it is like I do not get the whole thing. I do not know if OPENXML has some bulk insert under the hood, and I do not even know how to take this sample SP and change it to fit my tables, since, like I said, I do not know what is going on.

I also do not know what happens if the object has more tables in it. Say I have a ProductName table that has a relationship to a Product table, or something like that.

In LINQ to SQL you can grab the ProductName object and make changes to the Product table through that same object. So I am not sure how to take that into account. I am not sure if I would have to do separate inserts or what.
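
One thing that made the SP a bit clearer to me was dumping the XML that XmlSerializer actually produces. The root element is ArrayOfTBL_TEST_TEST with one TBL_TEST_TEST child per record, which is exactly the 'ArrayOfTBL_TEST_TEST/TBL_TEST_TEST' path the OPENXML call selects, and the WITH clause maps each child element to a column. A minimal sketch (my own, using the same LINQ to SQL entity class as above):

private static void DumpSerializedXmlSample()
{
    // Serialize two records and print the XML, so the element names that the
    // SP's OPENXML path and WITH clause rely on are visible.
    TBL_TEST_TEST[] sample = new TBL_TEST_TEST[]
    {
        new TBL_TEST_TEST { NAME = "Name : 0" },
        new TBL_TEST_TEST { NAME = "Name : 1" }
    };

    StringBuilder sBuilder = new StringBuilder();
    XmlSerializer serializer = new XmlSerializer(typeof(TBL_TEST_TEST[]));
    using (System.IO.StringWriter sWriter = new System.IO.StringWriter(sBuilder))
    {
        serializer.Serialize(sWriter, sample);
    }

    // Prints something like:
    // <ArrayOfTBL_TEST_TEST ...>
    //   <TBL_TEST_TEST><ID>0</ID><NAME>Name : 0</NAME></TBL_TEST_TEST>
    //   <TBL_TEST_TEST><ID>0</ID><NAME>Name : 1</NAME></TBL_TEST_TEST>
    // </ArrayOfTBL_TEST_TEST>
    Console.WriteLine(sBuilder.ToString());
}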

The time was pretty good at 52 seconds for 500,000 records.

Lastly, of course, is the way that uses LINQ to SQL to do everything, and it is pretty bad.

/// <summary>
/// This is using LINQ to SQL to insert lots of records. 
/// This way is slow as it uses no mass insert.
/// Only tried to insert 50,000 records as I did not want to sit around till it did 500,000 records.
/// http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
/// </summary>
private static void LinqInsertAll()
{
    using (TestDataContext db = new TestDataContext())
    {
        db.CommandTimeout = 600;
        for (int count = 0; count < 50000; count++)
        {
            TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
            testRecord.NAME = "Name : " + count;
            db.TBL_TEST_TESTs.InsertOnSubmit(testRecord);
        }
        db.SubmitChanges();
    }
}

I only did 50,000 records and it took over a minute.

So I have really narrowed it down to either the LINQ to SQL bulk insert way or bulk copy. I am just not sure how to do either one when you have relationships between tables, and I am not sure how either of them stands up when doing updates rather than inserts, as I have not tried that yet.

I do not think I will ever need to insert/update more than 50,000 records of one type, but at the same time I know I will have to run validation on the records before inserting them, which will slow things down, and that is what makes having objects, like LINQ to SQL gives you, nicer, especially if you are first parsing the data from an XML file before inserting it into the database.
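
The middle ground I am leaning toward would look something like this sketch (my own, not from either tutorial): build and validate the LINQ to SQL objects as usual, then flatten them into a DataTable and hand that to SqlBulkCopy, so you keep the objects for validation but skip the row-by-row inserts. The validation rule here is just a made-up placeholder.

private static void ValidateThenBulkCopy(IEnumerable<TBL_TEST_TEST> records)
{
    // Flatten the validated entities into a DataTable for SqlBulkCopy.
    DataTable dtInsertRows = new DataTable();
    dtInsertRows.Columns.Add("NAME");

    foreach (TBL_TEST_TEST record in records)
    {
        if (string.IsNullOrEmpty(record.NAME))
            continue; // placeholder validation: skip blank names

        DataRow drInsertRow = dtInsertRows.NewRow();
        drInsertRow["NAME"] = record.NAME;
        dtInsertRows.Rows.Add(drInsertRow);
    }

    using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity))
    {
        sbc.DestinationTableName = "TBL_TEST_TEST";
        sbc.ColumnMappings.Add("NAME", "NAME");
        sbc.WriteToServer(dtInsertRows);
    }
}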

Full C# code

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Serialization;
using System.Data;
using System.Data.SqlClient;

namespace TestIQueryable
{
    class Program
    {
        private static string connectionString = "";
        static void Main(string[] args)
        {
            BatchInsert();
            Console.WriteLine("done");
        }

        /// <summary>
        /// This is using LINQ to SQL to insert lots of records. 
        /// This way is slow as it uses no mass insert.
        /// Only tried to insert 50,000 records as I did not want to sit around till it did 500,000 records.
        /// http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
        /// </summary>
        private static void LinqInsertAll()
        {
            using (TestDataContext db = new TestDataContext())
            {
                db.CommandTimeout = 600;
                for (int count = 0; count < 50000; count++)
                {
                    TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
                    testRecord.NAME = "Name : " + count;
                    db.TBL_TEST_TESTs.InsertOnSubmit(testRecord);
                }
                db.SubmitChanges();
            }
        }

        /// <summary>
        /// This is using LINQ to SQL to make the table objects. 
        /// They are then serialized to an XML document and sent to a stored procedure
        /// that then does a bulk insert (I think with OPENXML).
        ///  http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
        /// </summary>
        private static void LinqInsertXMLBatch()
        {
            using (TestDataContext db = new TestDataContext())
            {
                TBL_TEST_TEST[] testRecords = new TBL_TEST_TEST[500000];
                for (int count = 0; count < 500000; count++)
                {
                    TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
                    testRecord.NAME = "Name : " + count;
                    testRecords[count] = testRecord;
                }

                StringBuilder sBuilder = new StringBuilder();
                System.IO.StringWriter sWriter = new System.IO.StringWriter(sBuilder);
                XmlSerializer serializer = new XmlSerializer(typeof(TBL_TEST_TEST[]));
                serializer.Serialize(sWriter, testRecords);
                db.insertTestData(sBuilder.ToString());
            }
        }

        /// <summary>
        /// An ado.net 2.0 way to mass insert records. This seems to be the fastest.
        /// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
        /// </summary>
        private static void BatchBulkCopy()
        {
            // Get the DataTable 
            DataTable dtInsertRows = GetDataTable();

            using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity))
            {
                sbc.DestinationTableName = "TBL_TEST_TEST";

                // Number of records to be processed in one go
                sbc.BatchSize = 500000;

                // Map the source column from the DataTable to the destination column in the SQL Server table
                // sbc.ColumnMappings.Add("ID", "ID");
                sbc.ColumnMappings.Add("NAME", "NAME");

                // Number of records after which client has to be notified about its status
                sbc.NotifyAfter = dtInsertRows.Rows.Count;

                // Event that gets fired when NotifyAfter number of records are processed.
                sbc.SqlRowsCopied += new SqlRowsCopiedEventHandler(sbc_SqlRowsCopied);

                // Finally write to server
                sbc.WriteToServer(dtInsertRows);
                sbc.Close();
            }

        }


        /// <summary>
        /// Another ado.net 2.0 way that uses a stored procedure to do a bulk insert.
        /// Seems slower than the "BatchBulkCopy" way, and it crashes when you try to insert 500,000 records in one go.
        /// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
        /// </summary>
        private static void BatchInsert()
        {
            // Get the DataTable with Rows State as RowState.Added
            DataTable dtInsertRows = GetDataTable();

            SqlConnection connection = new SqlConnection(connectionString);
            SqlCommand command = new SqlCommand("sp_BatchInsert", connection);
            command.CommandType = CommandType.StoredProcedure;
            command.UpdatedRowSource = UpdateRowSource.None;

            // Set the Parameter with appropriate Source Column Name
            command.Parameters.Add("@Name", SqlDbType.VarChar, 50, dtInsertRows.Columns[0].ColumnName);

            SqlDataAdapter adpt = new SqlDataAdapter();
            adpt.InsertCommand = command;
            // Specify the number of records to be Inserted/Updated in one go. Default is 1.
            adpt.UpdateBatchSize = 500000;

            connection.Open();
            int recordsInserted = adpt.Update(dtInsertRows);
            connection.Close();
        }



        private static DataTable GetDataTable()
        {
            // You First need a DataTable and have all the insert values in it
            DataTable dtInsertRows = new DataTable();
            dtInsertRows.Columns.Add("NAME");

            for (int i = 0; i < 500000; i++)
            {
                DataRow drInsertRow = dtInsertRows.NewRow();
                drInsertRow["NAME"] = "Name : " + i;
                dtInsertRows.Rows.Add(drInsertRow);
            }
            return dtInsertRows;
        }


        static void sbc_SqlRowsCopied(object sender, SqlRowsCopiedEventArgs e)
        {
            Console.WriteLine("Number of records affected : " + e.RowsCopied.ToString());
        }


    }
}

mdm*_*dma 2

The batch size is there to reduce the impact of network latency. It does not need to be more than a few thousand. Multiple statements are collected together and sent as one unit, so you take the hit of one network trip per N statements, rather than one trip per statement.
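
For illustration, a quick sketch of that arithmetic with 500,000 rows (an illustration of the point above, not part of the original answer):

using System;

class BatchSizeRoundTrips
{
    static void Main()
    {
        const int rows = 500000;
        foreach (int batchSize in new int[] { 1, 100, 1000, 10000, 500000 })
        {
            // One network trip per batch: ceil(rows / batchSize).
            int roundTrips = (rows + batchSize - 1) / batchSize;
            Console.WriteLine("UpdateBatchSize = {0,7:N0} -> {1,7:N0} round trips", batchSize, roundTrips);
        }
    }
}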