Joh*_*ski 73
Tags: import, performance, loops, entity-framework, savechanges
I am running an import that will have 1000 records on each run. Just looking for some confirmation of my assumptions:
Which of the following makes the most sense:
1. Run SaveChanges() after every AddToClassName() call.
2. Run SaveChanges() after every n AddToClassName() calls.
3. Run SaveChanges() after all of the AddToClassName() calls.
The first option is probably slow, right? Since it needs to analyze the EF objects in memory, generate SQL, and so on.
I assume the second option is the best of both worlds, since we can wrap a try/catch around that SaveChanges() call and only lose n records at a time if one of them fails. Maybe store each batch in a List<>. If the SaveChanges() call succeeds, get rid of the list. If it fails, log the items (roughly as sketched below).
The last option would probably end up being very slow as well, since every EF object would have to sit in memory until SaveChanges() is called. And if the save failed, nothing would be committed, right?
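Roughly what I have in mind for the second option (the Customer entity, MyEntities context, and LogFailedBatch helper are placeholder names, not real code):
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of option 2: a fresh context per batch of n records, with a
// try/catch around SaveChanges() so a failure only loses that batch.
public void ImportInBatches(IEnumerable<Customer> customers, int batchSize)
{
    foreach (var group in customers
        .Select((customer, index) => new { customer, index })
        .GroupBy(x => x.index / batchSize, x => x.customer))
    {
        var batch = group.ToList();  // keep the batch around for logging
        try
        {
            using (var context = new MyEntities())
            {
                foreach (var customer in batch)
                    context.AddToCustomer(customer);
                context.SaveChanges();  // commits just these n records
            }
        }
        catch (Exception ex)
        {
            LogFailedBatch(batch, ex);  // only this batch of n is lost
        }
    }
}
Using a fresh context per batch also means a failed batch does not stay attached and get retried by the next SaveChanges() call.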
Luk*_*Led 56
I would test it first to be sure. Performance does not have to be that bad.
If you need to enter all the rows in one transaction, call SaveChanges() after all of the AddToClassName calls. If the rows can be entered independently, save the changes after every row. Database consistency is important.
I don't like the second option. It would be confusing for me (from the end user's perspective) if I ran an import into the system and it declined 10 rows out of 1000 just because 1 was bad. You can try importing 10 at a time and, if that fails, try them one by one and then log the failures.
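That fallback could look roughly like this sketch (ImportRow, CreateEntity, and LogFailure are hypothetical helpers; a fresh context per attempt keeps failed entities from staying attached):
using System;
using System.Collections.Generic;
using System.Linq;

// Try batches of 10; when a batch fails, retry its rows one by one
// and log only the individual bad rows.
private void ImportWithFallback(IList<ImportRow> rows)
{
    for (int start = 0; start < rows.Count; start += 10)
    {
        var batch = rows.Skip(start).Take(10).ToList();
        if (!TrySave(batch))                       // whole batch failed...
            foreach (var row in batch)
                if (!TrySave(new[] { row }))       // ...retry row by row
                    LogFailure(row);               // hypothetical logger
    }
}

private bool TrySave(IEnumerable<ImportRow> rows)
{
    try
    {
        // Fresh context and fresh entities per attempt, so a failed
        // attempt leaves nothing half-attached.
        using (var context = new CamelTrapEntities())
        {
            foreach (var row in rows)
                context.AddToTestTable(CreateEntity(row));
            context.SaveChanges();
            return true;
        }
    }
    catch (Exception)
    {
        return false;
    }
}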
Test whether it takes a long time. Don't write "probably"; you don't know yet. Only when it is actually a problem should you think about other solutions (marc_s).
EDIT
I did some tests (time in milliseconds):
10000 rows:
SaveChanges() after 1 row: 18510.534
SaveChanges() after 100 rows: 4350.3075
SaveChanges() after 10000 rows: 5233.0635
50000 rows:
SaveChanges() after 1 row: 78496.929
SaveChanges() after 500 rows: 22302.2835
SaveChanges() after 50000 rows: 24022.8765
So it is actually faster to commit after every n rows than after all of them at the end.
My recommendation is to call SaveChanges() after every n rows.
Test class:
Table:
CREATE TABLE [dbo].[TestTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[SomeInt] [int] NOT NULL,
[SomeVarchar] [varchar](100) NOT NULL,
[SomeOtherVarchar] [varchar](50) NOT NULL,
[SomeOtherInt] [int] NULL,
CONSTRAINT [PkTestTable] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Class:
using System;
using System.Data.EntityClient;
using System.Web.Mvc;

public class TestController : Controller
{
    //
    // GET: /Test/

    private readonly Random _rng = new Random();
    private const string _chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Builds an uppercase string of random length between 0 and size - 1.
    private string RandomString(int size)
    {
        var randomSize = _rng.Next(size);
        char[] buffer = new char[randomSize];
        for (int i = 0; i < randomSize; i++)
        {
            buffer[i] = _chars[_rng.Next(_chars.Length)];
        }
        return new string(buffer);
    }

    public ActionResult EFPerformance()
    {
        string result = "";

        TruncateTable();
        result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(10000, 1).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 100 rows:" + EFPerformanceTest(10000, 100).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 10000 rows:" + EFPerformanceTest(10000, 10000).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(50000, 1).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 500 rows:" + EFPerformanceTest(50000, 500).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 50000 rows:" + EFPerformanceTest(50000, 50000).TotalMilliseconds + "<br/>";
        TruncateTable();

        return Content(result);
    }

    // Empties the test table between runs.
    private void TruncateTable()
    {
        using (var context = new CamelTrapEntities())
        {
            var connection = ((EntityConnection)context.Connection).StoreConnection;
            connection.Open();
            var command = connection.CreateCommand();
            command.CommandText = @"TRUNCATE TABLE TestTable";
            command.ExecuteNonQuery();
        }
    }

    // Inserts noOfRows random rows, calling SaveChanges() after every
    // commitAfterRows rows, and returns the elapsed time.
    private TimeSpan EFPerformanceTest(int noOfRows, int commitAfterRows)
    {
        var startDate = DateTime.Now;

        using (var context = new CamelTrapEntities())
        {
            for (int i = 1; i <= noOfRows; ++i)
            {
                var testItem = new TestTable();
                testItem.SomeVarchar = RandomString(100);
                testItem.SomeOtherVarchar = RandomString(50);
                testItem.SomeInt = _rng.Next(10000);
                testItem.SomeOtherInt = _rng.Next(200000);
                context.AddToTestTable(testItem);
                if (i % commitAfterRows == 0) context.SaveChanges();
            }
            // Flush any remainder when noOfRows is not a multiple of commitAfterRows.
            if (noOfRows % commitAfterRows != 0) context.SaveChanges();
        }

        var endDate = DateTime.Now;
        return endDate.Subtract(startDate);
    }
}
Eri*_* J. 17
I just optimized a very similar problem in my own code and would like to point out an optimization that worked for me.
I found that much of the time spent processing SaveChanges, whether processing 100 or 1000 records at once, is CPU bound. So, by processing the contexts with a producer/consumer pattern (implemented with BlockingCollection), I was able to make much better use of the CPU cores and went from a total of 4000 changes/second (as reported by the return value of SaveChanges) to over 14,000 changes/second. CPU utilization moved from about 13% (I have 8 cores) to about 60%. Even using multiple consumer threads, I barely taxed the (very fast) disk I/O system, and CPU utilization of SQL Server was no higher than 15%.
By offloading the saving to multiple threads, you can tune both the number of records committed per batch and the number of threads performing the commit operations.
I found that creating 1 producer thread and (number of CPU cores) - 1 consumer threads allowed me to tune the number of records committed per batch so that the count of items in the BlockingCollection fluctuated between 0 and 1 (after a consumer thread took an item). That way, there was just enough work for the consuming threads to operate optimally.
This scenario of course requires creating a new context for every batch, which I found to be faster even in a single-threaded scenario for my use case.
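For illustration, a minimal sketch of the pattern reusing the CamelTrapEntities context and TestTable entity from the accepted answer; the batch size, the queue bound, and the thread count are the tuning knobs described above:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// One producer fills a fresh context per batch; (cores - 1) consumers
// run the CPU-bound SaveChanges() calls in parallel.
public static void SaveInParallel(IEnumerable<TestTable> items, int batchSize)
{
    // Bounded to 1 so the producer stays just barely ahead of the consumers.
    var batches = new BlockingCollection<CamelTrapEntities>(boundedCapacity: 1);

    var producer = Task.Factory.StartNew(() =>
    {
        CamelTrapEntities context = null;
        int inBatch = 0;
        foreach (var item in items)
        {
            if (context == null) context = new CamelTrapEntities();
            context.AddToTestTable(item);
            if (++inBatch == batchSize)
            {
                batches.Add(context);  // blocks while all consumers are busy
                context = null;        // a new context per batch
                inBatch = 0;
            }
        }
        if (context != null) batches.Add(context);  // flush the partial batch
        batches.CompleteAdding();
    });

    var consumers = new List<Task>();
    for (int i = 0; i < Environment.ProcessorCount - 1; i++)
    {
        consumers.Add(Task.Factory.StartNew(() =>
        {
            foreach (var context in batches.GetConsumingEnumerable())
                using (context) context.SaveChanges();  // the CPU-bound part
        }));
    }

    producer.Wait();
    Task.WaitAll(consumers.ToArray());
}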
mar*_*c_s 12
If you need to import thousands of records, I would use something like SqlBulkCopy instead of the Entity Framework for that.
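For illustration, a minimal SqlBulkCopy sketch targeting the TestTable from the accepted answer (the connectionString and rowsToImport names are placeholders):
using System.Data;
using System.Data.SqlClient;

// Load the rows into a DataTable and bulk-insert them, bypassing
// EF change tracking entirely.
var table = new DataTable();
table.Columns.Add("SomeInt", typeof(int));
table.Columns.Add("SomeVarchar", typeof(string));
table.Columns.Add("SomeOtherVarchar", typeof(string));
table.Columns.Add("SomeOtherInt", typeof(int));

foreach (var row in rowsToImport)  // placeholder source collection
    table.Rows.Add(row.SomeInt, row.SomeVarchar, row.SomeOtherVarchar, row.SomeOtherInt);

using (var bulk = new SqlBulkCopy(connectionString))  // placeholder connection string
{
    bulk.DestinationTableName = "dbo.TestTable";
    bulk.BatchSize = 5000;  // rows per round trip to the server
    // Map by name; the IDENTITY column is generated by SQL Server.
    bulk.ColumnMappings.Add("SomeInt", "SomeInt");
    bulk.ColumnMappings.Add("SomeVarchar", "SomeVarchar");
    bulk.ColumnMappings.Add("SomeOtherVarchar", "SomeOtherVarchar");
    bulk.ColumnMappings.Add("SomeOtherInt", "SomeOtherInt");
    bulk.WriteToServer(table);
}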