How to solve the greatest-n-per-group problem with Entity Framework (Core)?

Sho*_*hoe 5 c# linq entity-framework greatest-n-per-group entity-framework-core

Problem

For example, given the following data set:

new Entity { Id = 1, Group = 1, Value = "ABC", ... },
new Entity { Id = 2, Group = 1, Value = "DEF", ... },
new Entity { Id = 3, Group = 1, Value = "FGH", ... },
new Entity { Id = 4, Group = 1, Value = "LOP", ... },
new Entity { Id = 5, Group = 2, Value = "ALO", ... },
new Entity { Id = 6, Group = 2, Value = "PEO", ... },
new Entity { Id = 7, Group = 2, Value = "AHB", ... },
new Entity { Id = 8, Group = 2, Value = "DHB", ... },
new Entity { Id = 9, Group = 2, Value = "QPA", ... },
new Entity { Id = 10, Group = 2, Value = "LAN", ... },
// ... millions more records

How can I write an efficient query (avoiding the N+1 query problem) that returns the top 3 records of each Group, ordered by Value?

new Entity { Id = 1, Group = 1, Value = "ABC", ... },
new Entity { Id = 2, Group = 1, Value = "DEF", ... },
new Entity { Id = 3, Group = 1, Value = "FGH", ... },
new Entity { Id = 5, Group = 2, Value = "ALO", ... },
new Entity { Id = 7, Group = 2, Value = "AHB", ... },
new Entity { Id = 8, Group = 2, Value = "DHB", ... },
// ...
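For reference, the expected result above can be checked in memory. The sketch below is plain LINQ to Objects, not EF Core, so it says nothing about SQL translation. It assumes the sample rows can be modelled as named tuples (Id, Group, Value) in place of the Entity class, with Value compared ordinally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample rows from the question; named tuples stand in for the Entity class (assumption).
var rows = new[]
{
    (Id: 1, Group: 1, Value: "ABC"),
    (Id: 2, Group: 1, Value: "DEF"),
    (Id: 3, Group: 1, Value: "FGH"),
    (Id: 4, Group: 1, Value: "LOP"),
    (Id: 5, Group: 2, Value: "ALO"),
    (Id: 6, Group: 2, Value: "PEO"),
    (Id: 7, Group: 2, Value: "AHB"),
    (Id: 8, Group: 2, Value: "DHB"),
    (Id: 9, Group: 2, Value: "QPA"),
    (Id: 10, Group: 2, Value: "LAN"),
};

// Reference semantics only: group the rows, sort each group by Value, keep the first 3.
List<int> topIds = rows
    .GroupBy(r => r.Group)
    .SelectMany(g => g.OrderBy(r => r.Value, StringComparer.Ordinal).Take(3))
    .Select(r => r.Id)
    .OrderBy(id => id) // sort the Ids so the output is easy to compare
    .ToList();

Console.WriteLine(string.Join(", ", topIds)); // 1, 2, 3, 5, 7, 8
```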

What have I tried?

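Most LINQ or Entity Framework solutions on Stack Overflow use GroupBy and Take, which are evaluated on the client (meaning all records are loaded into memory and the grouping happens outside the database).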

I tried:

var list = await _dbContext.Entities
    .Select(x => new
    {
        // 0-based rank of x within its group: the number of rows in the same group with a smaller Value
        OrderKey = _dbContext.Entities.Count(y =>
            x.Group == y.Group
                && y.Value.CompareTo(x.Value) < 0), // C# has no < operator for strings
        Value = x,
    })
    .Where(x => x.OrderKey < 3)
    .OrderBy(x => x.OrderKey)
    .Select(x => x.Value)
    .ToListAsync(cancellationToken);

But I'm pretty sure this is inefficient.

Bonus question

How can this logic be extracted into an extension method that takes an IQueryable<T> and returns an IQueryable<T>?

Iva*_*oev 5

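Interesting question. The main problem I see is that there is no standard SQL construct for this kind of operation: most databases provide their own operators for working with row set "windows", such as SQL Server's SELECT - OVER. There is also no "standard" LINQ operator or pattern for it.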

Given

IQueryable<Entity> source

the typical way to do this in LINQ is

var query = source.GroupBy(e => e.Group)
    .SelectMany(g => g.OrderBy(e => e.Value).Take(3));

EF6 translates this to the following SQL:

SELECT
    [Limit1].[Id] AS [Id],
    [Limit1].[Group] AS [Group],
    [Limit1].[Value] AS [Value]
    FROM   (SELECT DISTINCT
        [Extent1].[Group] AS [Group]
        FROM [dbo].[Entity] AS [Extent1] ) AS [Distinct1]
    CROSS APPLY  (SELECT TOP (3) [Project2].[Id] AS [Id], [Project2].[Group] AS [Group], [Project2].[Value] AS [Value]
        FROM ( SELECT
            [Extent2].[Id] AS [Id],
            [Extent2].[Group] AS [Group],
            [Extent2].[Value] AS [Value]
            FROM [dbo].[Entity] AS [Extent2]
            WHERE [Distinct1].[Group] = [Extent2].[Group]
        )  AS [Project2]
        ORDER BY [Project2].[Value] ASC ) AS [Limit1]

I can't say whether this is a good or a bad translation, but at least it is a translation. What matters is that EF Core currently (the latest version at the time of writing is 2.2.3) cannot translate it to SQL and falls back to client evaluation (as you mentioned).

So at the moment there seem to be only 3 translatable LINQ ways to write such a query:

(1) (yours)

var query = source.Where(e => source.Count(
    e2 => e2.Group == e.Group && e2.Value.CompareTo(e.Value) < 0) < 3);

(2)

var query = source.Where(e => source.Where(e2 => e2.Group == e.Group)
    .OrderBy(e2 => e2.Value).Take(3).Contains(e));

(3)

var query = source.SelectMany(e => source.Where(e2 => e2.Group == e.Group)
    .OrderBy(e2 => e2.Value).Take(3).Where(e2 => e2.Id == e.Id));

I can't say which one to choose; you have to measure the execution plans.
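Before comparing plans, it can help to confirm that the three shapes mean the same thing. The sketch below runs them against the sample data from the question. It assumes named tuples stand in for Entity and uses LINQ to Objects over an array, so it checks semantics only. It does not show what SQL EF Core generates or how fast that SQL runs.

```csharp
using System;
using System.Linq;

// Sample rows from the question; named tuples stand in for the Entity class (assumption).
var source = new[]
{
    (Id: 1, Group: 1, Value: "ABC"), (Id: 2, Group: 1, Value: "DEF"),
    (Id: 3, Group: 1, Value: "FGH"), (Id: 4, Group: 1, Value: "LOP"),
    (Id: 5, Group: 2, Value: "ALO"), (Id: 6, Group: 2, Value: "PEO"),
    (Id: 7, Group: 2, Value: "AHB"), (Id: 8, Group: 2, Value: "DHB"),
    (Id: 9, Group: 2, Value: "QPA"), (Id: 10, Group: 2, Value: "LAN"),
};

// (1) keep a row if fewer than 3 rows in its group have a smaller Value
var q1 = source.Where(e => source.Count(
    e2 => e2.Group == e.Group && e2.Value.CompareTo(e.Value) < 0) < 3);

// (2) keep a row if it is among the first 3 rows of its group ordered by Value
var q2 = source.Where(e => source.Where(e2 => e2.Group == e.Group)
    .OrderBy(e2 => e2.Value).Take(3).Contains(e));

// (3) same idea as (2), written as a correlated SelectMany matched back on Id
var q3 = source.SelectMany(e => source.Where(e2 => e2.Group == e.Group)
    .OrderBy(e2 => e2.Value).Take(3).Where(e2 => e2.Id == e.Id));

// Sorted Ids make the three results easy to compare.
var ids1 = q1.Select(e => e.Id).OrderBy(id => id).ToArray();
var ids2 = q2.Select(e => e.Id).OrderBy(id => id).ToArray();
var ids3 = q3.Select(e => e.Id).OrderBy(id => id).ToArray();

Console.WriteLine(string.Join(", ", ids1)); // 1, 2, 3, 5, 7, 8
Console.WriteLine(ids1.SequenceEqual(ids2) && ids2.SequenceEqual(ids3)); // True
```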

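The main drawback of #1 is the comparison operator: as the example shows, < cannot be used with strings (hence CompareTo), and it is even worse for Guids. It also won't work correctly if Value is not unique within the group.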

On the other hand, it might be the fastest of the three. But the execution plans of #2 and #3 (and even #1) may well turn out to be the same.
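The uniqueness caveat for #1 is easy to reproduce. The sketch below uses made-up data: a single group in which the Value "B" appears three times. It runs patterns #1 and #2 in memory with LINQ to Objects, with tuples standing in for Entity as an assumption.

```csharp
using System;
using System.Linq;

// Hypothetical data: one group whose Values contain a three-way tie on "B".
var source = new[]
{
    (Id: 1, Group: 1, Value: "A"),
    (Id: 2, Group: 1, Value: "B"),
    (Id: 3, Group: 1, Value: "B"),
    (Id: 4, Group: 1, Value: "B"),
    (Id: 5, Group: 1, Value: "C"),
};

// Pattern #1: ranks a row by counting strictly smaller Values, so tied rows share a rank.
var byCount = source.Where(e => source.Count(
    e2 => e2.Group == e.Group && e2.Value.CompareTo(e.Value) < 0) < 3).ToArray();

// Pattern #2: Take(3) cuts each group at exactly 3 rows.
var byTake = source.Where(e => source.Where(e2 => e2.Group == e.Group)
    .OrderBy(e2 => e2.Value).Take(3).Contains(e)).ToArray();

Console.WriteLine(byCount.Length); // 4
Console.WriteLine(byTake.Length);  // 3
```

All three "B" rows get rank 1 under #1 and pass the < 3 filter, so the group returns 4 rows. #2 stops at exactly 3. In a database, which of the tied rows #2 keeps is not fixed unless the ordering includes a tie-breaker, for example .ThenBy(e2 => e2.Id).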

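With that said, I won't offer a generic method, because each of these approaches needs different parameters; in the end, the only thing they have in common is the group selector Expression<Func<T, TGroupKey>> (e.g. e => e.Group). Such a method can be written (especially for #2 and #3), but it requires some manual Expression manipulation, and overall I'm not sure it's worth the effort.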