LINQ to SQL: submitting changes in parallel

JJ_*_*son 3 c# linq-to-sql

Given a List of entities to update, is it safe to instantiate a new context on each iteration of a Parallel.For or foreach loop and call SubmitChanges() every (say) 10,000 iterations?

Is it safe to perform bulk updates this way? What are the possible drawbacks?

Luk*_*der 7

This is probably a scenario where parallelism should be avoided. Instantiating a new DataContext on each iteration means that within every iteration a connection is acquired from the connection pool, opened, and a single entity is written to the database before the connection is returned to the pool. Doing this on every iteration is a relatively expensive operation, so the resulting overhead outweighs the benefit of parallelism. Adding the entities to a single data context and writing them to the database as one operation is more efficient.

Using the following as a benchmark for parallel inserts:

private static TimeSpan RunInParallel(int inserts)
{
    Stopwatch watch = new Stopwatch();
    watch.Start();

    Parallel.For(0, inserts, new ParallelOptions() { MaxDegreeOfParallelism = 100 },
        (i) =>
        {
            using (var context = new DataClasses1DataContext())
            {
                context.Tables.InsertOnSubmit(new Table() { Number = i });
                context.SubmitChanges();
            }
        }
    );

    watch.Stop();
    return watch.Elapsed;
}

And for serial inserts:

private static TimeSpan RunInSerial(int inserts)
{
    Stopwatch watch = new Stopwatch();
    watch.Start();
    using (var ctx = new DataClasses1DataContext())
    {
        for (int i = 0; i < inserts; i++)
        {
            ctx.Tables.InsertOnSubmit(new Table() { Number = i });
        }
        ctx.SubmitChanges();
    }
    watch.Stop();
    return watch.Elapsed;
}

The DataClasses1DataContext class is an auto-generated DataContext containing a single Table entity.

Running on a first-generation Intel i7 (8 logical cores) produced the following results:

10 inserts:
Average time elapsed for a 100 runs in parallel: 00:00:00.0202820
Average time elapsed for a 100 runs in serial:   00:00:00.0108694

100 inserts:
Average time elapsed for a 100 runs in parallel: 00:00:00.2269799
Average time elapsed for a 100 runs in serial:   00:00:00.1434693

1000 inserts:
Average time elapsed for a 100 runs in parallel: 00:00:02.1647577
Average time elapsed for a 100 runs in serial:   00:00:00.8163786

10000 inserts:
Average time elapsed for a 10 runs in parallel:  00:00:22.7436584
Average time elapsed for a 10 runs in serial:    00:00:07.7273398

In general, when run in parallel, the inserts took roughly twice as long as when run serially.

Update: Parallel inserts can be beneficial if you can implement some batching scheme for the data.

When using batches, the batch size affects insert performance, so the optimal ratio between the number of entries per batch and the number of batches must be determined. To demonstrate this, the following method was used to group 10000 inserts into batches of 1 (10000 batches, equivalent to the initial parallel approach), 10 (1000 batches), 100 (100 batches), 1000 (10 batches), and 10000 (1 batch, equivalent to the serial approach), and then insert each batch in parallel:

private static TimeSpan RunAsParallelBatches(int inserts, int batchSize)
{
    Stopwatch watch = new Stopwatch();
    watch.Start();

    // batch the data to be inserted 
    List<List<int>> batches = new List<List<int>>();
    for (int g = 0; g < inserts / batchSize; g++)
    {
        List<int> numbers = new List<int>();
        int start = g * batchSize;
        int end = start + batchSize;
        for (int i = start; i < end; i++)
        {
            numbers.Add(i);
        }
        batches.Add(numbers);
    }

    // insert each batch in parallel
    Parallel.ForEach(batches,
        (batch) =>
        {
            using (DataClasses1DataContext ctx = new DataClasses1DataContext())
            {
                foreach (int number in batch)
                {
                    ctx.Tables.InsertOnSubmit(new Table() { Number = number });
                }
                ctx.SubmitChanges();
            }
        }
    );

    watch.Stop();
    return watch.Elapsed;
}
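As a side note, the manual batch-building loop above can be written more concisely on .NET 6 or later, where Enumerable.Chunk does the partitioning directly. This is just a sketch of that variant, assuming the same auto-generated DataClasses1DataContext and Table types as the benchmarks above:

    // Partition 0..inserts-1 into batches of batchSize using Enumerable.Chunk
    // (.NET 6+), then insert each batch on its own DataContext with a single
    // SubmitChanges() call per batch.
    IEnumerable<int[]> batches = Enumerable.Range(0, inserts).Chunk(batchSize);

    Parallel.ForEach(batches, batch =>
    {
        using (var ctx = new DataClasses1DataContext())
        {
            foreach (int number in batch)
            {
                ctx.Tables.InsertOnSubmit(new Table() { Number = number });
            }
            ctx.SubmitChanges(); // one database round of inserts per batch
        }
    });

The behavior is the same as the explicit nested loops; only the partitioning code changes.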

Averaging 10 runs of 10000 inserts produced the following results:

10000 inserts repeated 10 times
Average time for initial parallel insertion approach:                 00:00:22.7436584
Average time in parallel using batches of 1 entity (10000 batches):   00:00:23.1088289
Average time in parallel using batches of 10 entities (1000 batches): 00:00:07.1443220
Average time in parallel using batches of 100 entities (100 batches): 00:00:04.3111268
Average time in parallel using batches of 1000 entities (10 batches): 00:00:04.0668334
Average time in parallel using batches of 10000 entities (1 batch):   00:00:08.2820498
Average time for serial insertion approach:                           00:00:07.7273398

So by grouping the inserts into batches, a performance increase can be obtained as long as enough work is done in each iteration to outweigh the overhead of setting up the DataContext and performing the batched insert. In this case, with the inserts grouped into batches of 1000, the parallel insertion managed to outperform the serial approach by roughly 2x on this system.