Kie*_*lin 6 c# entity-framework asp.net-core
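I'm trying to request a large amount of data and then parse it into a report. The problem is that the data I'm requesting has 27 million rows, each with 6 joins, and when loaded through Entity Framework it uses all of the server's RAM. I've implemented a paging system to buffer the processing into smaller chunks, as you would with an IO operation.
I request 10,000 records, write them to a file stream (to disk), and then try to clear those 10,000 records from memory, since they are no longer needed.
I'm having trouble getting the database context garbage collected. I have tried disposing the object, nulling the reference, and then creating a new context for the next batch of 10,000 records. This doesn't seem to work. (This was recommended by one of the developers on EF Core: https://github.com/aspnet/EntityFramework/issues/5473)
The only other option I see is to use a raw SQL query to achieve what I want. I'm trying to build the system to handle a request of any size, with the only variable being the time it takes to produce the report. Is there something I can do with the EF context to get rid of loaded entities?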
private void ProcessReport(ZipArchive zip, int page, int pageSize)
{
    using (var context = new DBContext(_contextOptions))
    {
        // Materializes the whole page in memory before writing it out.
        var batch = GetDataFromIndex(page, pageSize, context).ToArray();
        if (!batch.Any())
        {
            return;
        }
        var file = zip.CreateEntry("file_" + page + ".csv");
        using (var entryStream = file.Open())
        using (var streamWriter = new StreamWriter(entryStream))
        {
            foreach (var reading in batch)
            {
                try
                {
                    streamWriter.WriteLine("write data from record here.");
                }
                catch (Exception e)
                {
                    //handle error
                }
            }
        }
        batch = null;
    }
    // Recurse into the next page with a fresh context.
    ProcessReport(zip, page + 1, pageSize);
}

private IEnumerable<Reading> GetDataFromIndex(int page, int pageSize, DBContext context)
{
    var batches = (from rb in context.Reading.AsNoTracking()
                   //Some joins
                   select rb)
        .Skip((page - 1) * pageSize)
        .Take(pageSize);
    return batches
        .Include(x => x.Something);
}
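Apart from the memory management issues, you'll have a bad time using paging for this. Running paged queries gets expensive on the server. You don't need to page. Just iterate the query results (i.e. don't call ToList() or ToArray()).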
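As a rough illustration of iterating instead of paging, the sketch below streams rows from a lazy sequence into a `ZipArchive`, starting a new entry every `rowsPerFile` rows so the report is still split into files as in the question. `StreamingReport`, `ReadRows`, and `Write` are hypothetical names introduced here, and `ReadRows` is only a stand-in for an un-materialized query; this is not the asker's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class StreamingReport
{
    // Hypothetical stand-in for a streamed query: yields rows lazily, one at a time.
    public static IEnumerable<string> ReadRows(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            yield return $"row {i}";
        }
    }

    // Writes rows to the archive, starting a new entry every rowsPerFile rows.
    // Only the current row is referenced at any time. Returns the number of entries created.
    public static int Write(ZipArchive zip, IEnumerable<string> rows, int rowsPerFile)
    {
        int files = 0;
        int rowsInFile = 0;
        StreamWriter writer = null;
        try
        {
            foreach (var row in rows)
            {
                if (writer == null || rowsInFile == rowsPerFile)
                {
                    // Disposing the writer closes the previous entry before the next one opens.
                    writer?.Dispose();
                    files++;
                    var entry = zip.CreateEntry("file_" + files + ".csv");
                    writer = new StreamWriter(entry.Open());
                    rowsInFile = 0;
                }
                writer.WriteLine(row);
                rowsInFile++;
            }
        }
        finally
        {
            writer?.Dispose();
        }
        return files;
    }
}
```

In the question's setting, `rows` would presumably be the no-tracking query itself, enumerated directly and without `Skip`, `Take`, or `ToArray()`.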
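Also, when paging you must add an ordering to the query, otherwise SQL may return overlapping rows or leave gaps. See the SQL Server documentation, for example: https://learn.microsoft.com/en-us/sql/t-sql/queries/select-order-by-clause-transact-sql. EF Core doesn't enforce this because some providers may guarantee that paged queries always read rows in the same order.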
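If paging is kept anyway, a minimal sketch of the ordering requirement could look like the following. `Paging.Page` is a hypothetical helper, not part of EF Core; it only uses the standard `OrderBy`, `Skip`, and `Take` operators on `IQueryable<T>`, and it assumes 1-based page numbers as in the question.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class Paging
{
    // Applies an ordering before Skip/Take so consecutive pages neither overlap nor leave gaps.
    // page is 1-based, matching the question's code.
    public static IQueryable<T> Page<T, TKey>(
        IQueryable<T> source,
        Expression<Func<T, TKey>> orderKey,
        int page,
        int pageSize)
    {
        return source
            .OrderBy(orderKey)
            .Skip((page - 1) * pageSize)
            .Take(pageSize);
    }
}
```

The order key should be unique per row (a primary key, for instance); ordering on a column with duplicate values still leaves the order of tied rows undefined.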
Here's an example of EF Core (1.1 on .NET Core) processing a huge result set without increasing memory usage:
using Microsoft.EntityFrameworkCore;
using System.Linq;
using System;
using System.ComponentModel.DataAnnotations.Schema;

namespace efCoreTest
{
    [Table("SomeEntity")]
    class SomeEntity
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string Description { get; set; }
        public DateTime CreatedOn { get; set; }
        public int A { get; set; }
        public int B { get; set; }
        public int C { get; set; }
        public int D { get; set; }
        virtual public Address Address { get; set; }
        public int AddressId { get; set; }
    }

    [Table("Address")]
    class Address
    {
        [DatabaseGenerated(DatabaseGeneratedOption.None)]
        public int Id { get; set; }
        public string Line1 { get; set; }
        public string Line2 { get; set; }
        public string Line3 { get; set; }
    }

    class Db : DbContext
    {
        public DbSet<SomeEntity> SomeEntities { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder.UseSqlServer("Server=.;Database=efCoreTest;Integrated Security=true");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            using (var db = new Db())
            {
                db.Database.EnsureDeleted();
                db.Database.EnsureCreated();
                db.Database.ExecuteSqlCommand("alter database EfCoreTest set recovery simple;");

                // Inserts 10 addresses with Ids 1..10.
                var LoadAddressesSql = @"
with N as
(
    select top (10) cast(row_number() over (order by (select null)) as int) i
    from sys.objects o, sys.columns c, sys.columns c2
)
insert into Address(Id, Line1, Line2, Line3)
select i Id, 'AddressLine1' Line1,'AddressLine2' Line2,'AddressLine3' Line3
from N;
";

                // Inserts 1,000,000 entities per execution.
                var LoadEntitySql = @"
with N as
(
    select top (1000000) cast(row_number() over (order by (select null)) as int) i
    from sys.objects o, sys.columns c, sys.columns c2
)
insert into SomeEntity (Name, Description, CreatedOn, A,B,C,D, AddressId)
select concat('EntityName',i) Name,
       concat('Entity Description which is really rather long for Entity whose ID happens to be ',i) Description,
       getdate() CreatedOn,
       i A, i B, i C, i D, 1+i%10 AddressId
from N
";

                Console.WriteLine("Generating Data ...");
                db.Database.ExecuteSqlCommand(LoadAddressesSql);
                Console.WriteLine("Loaded Addresses");
                for (int i = 0; i < 10; i++)
                {
                    var rows = db.Database.ExecuteSqlCommand(LoadEntitySql);
                    Console.WriteLine($"Loaded Entity Batch {rows} rows");
                }
                Console.WriteLine("Finished Generating Data");

                // No ToList()/ToArray(): the results are streamed as they are enumerated.
                var results = db.SomeEntities.AsNoTracking().Include(e => e.Address).AsEnumerable();
                int batchSize = 10 * 1000;
                int ix = 0;
                foreach (var r in results)
                {
                    ix++;
                    if (ix % batchSize == 0)
                    {
                        Console.WriteLine($"Read Entity {ix} with name {r.Name}. Current Memory: {GC.GetTotalMemory(false) / 1024}kb GC's Gen0:{GC.CollectionCount(0)} Gen1:{GC.CollectionCount(1)} Gen2:{GC.CollectionCount(2)}");
                    }
                }
                Console.WriteLine($"Done. Current Memory: {GC.GetTotalMemory(false)/1024}kb");
                Console.ReadKey();
            }
        }
    }
}
Output:
Generating Data ...
Loaded Addresses
Loaded Entity Batch 1000000 rows
Loaded Entity Batch 1000000 rows
. . .
Loaded Entity Batch 1000000 rows
Finished Generating Data
Read Entity 10000 with name EntityName10000. Current Memory: 2854kb GC's Gen0:7 Gen1:1 Gen2:0
Read Entity 20000 with name EntityName20000. Current Memory: 4158kb GC's Gen0:14 Gen1:1 Gen2:0
Read Entity 30000 with name EntityName30000. Current Memory: 2446kb GC's Gen0:22 Gen1:1 Gen2:0
. . .
Read Entity 9990000 with name EntityName990000. Current Memory: 2595kb GC's Gen0:7429 Gen1:9 Gen2:1
Read Entity 10000000 with name EntityName1000000. Current Memory: 3908kb GC's Gen0:7436 Gen1:9 Gen2:1
Done. Current Memory: 3916kb
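Also note that another common cause of excessive memory consumption in EF Core is "mixed client/server evaluation" of queries. See the docs for more information and for how to disable automatic client query evaluation.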
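For reference, a configuration fragment along these lines turns the client-evaluation warning into an exception in EF Core 1.x and 2.x. It is a sketch rather than a tested setup: it reuses the `Db` context and connection string from the example above, and the namespace that holds `RelationalEventId` is an assumption that depends on the EF Core version (`Microsoft.EntityFrameworkCore.Diagnostics` in 2.x). From EF Core 3.0 onward this warning no longer exists, because queries that cannot be translated throw by default.

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Diagnostics; // assumed namespace for RelationalEventId (EF Core 2.x)

class Db : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder
            .UseSqlServer("Server=.;Database=efCoreTest;Integrated Security=true")
            // Throw instead of silently evaluating part of the query on the client.
            .ConfigureWarnings(w => w.Throw(RelationalEventId.QueryClientEvaluationWarning));
    }
}
```

With this in place, a query that would pull rows into memory to finish a filter or projection on the client fails fast instead of quietly consuming RAM.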