EF-generated query takes too long to execute

sab*_*ber 14 c# sql sql-server entity-framework sql-server-2008-r2

I have a very simple query generated by Entity Framework. Sometimes when I run it, it takes more than 30 seconds to execute and I get a timeout exception.

SELECT TOP (10) 
[Extent1].[LinkID] AS [LinkID], 
[Extent1].[Title] AS [Title], 
[Extent1].[Url] AS [Url], 
[Extent1].[Description] AS [Description], 
[Extent1].[SentDate] AS [SentDate], 
[Extent1].[VisitCount] AS [VisitCount], 
[Extent1].[RssSourceId] AS [RssSourceId], 
[Extent1].[ReviewStatus] AS [ReviewStatus], 
[Extent1].[UserAccountId] AS [UserAccountId], 
[Extent1].[CreationDate] AS [CreationDate]
FROM ( SELECT [Extent1].[LinkID] AS [LinkID], [Extent1].[Title] AS [Title], [Extent1].[Url] AS [Url], [Extent1].[Description] AS [Description], [Extent1].[SentDate] AS [SentDate], [Extent1].[VisitCount] AS [VisitCount], [Extent1].[RssSourceId] AS [RssSourceId], [Extent1].[ReviewStatus] AS [ReviewStatus], [Extent1].[UserAccountId] AS [UserAccountId], [Extent1].[CreationDate] AS [CreationDate], row_number() OVER (ORDER BY [Extent1].[SentDate] DESC) AS [row_number]
    FROM [dbo].[Links] AS [Extent1]
)  AS [Extent1]
WHERE [Extent1].[row_number] > 0
ORDER BY [Extent1].[SentDate] DESC

The code that generates the query is:

public async Task<IQueryable<TEntity>> GetAsync(Expression<Func<TEntity, bool>> filter = null,
    Func<IQueryable<TEntity>, IOrderedQueryable<TEntity>> orderBy = null)
{
    return await Task.Run(() =>
    {
        IQueryable<TEntity> query = _dbSet;
        if (filter != null)
        {
            query = query.Where(filter);
        }

        if (orderBy != null)
        {
            query = orderBy(query);
        }

        return query;
    });
}

Note that when I remove the inner SELECT statement and the WHERE clause and change the query to the following, it executes in under a second.

SELECT TOP (10) 
[Extent1].[LinkID] AS [LinkID], 
[Extent1].[Title] AS [Title], 
.
.
.
FROM [dbo].[Links] AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC

Any suggestions would help.

Update:

Here is how the code above is used:

var dbLinks = await _uow.LinkRespository.GetAsync(filter, orderBy);
var pagedLinks = new PagedList<Link>(dbLinks, pageNumber, PAGE_SIZE);
var vmLinks = Mapper.Map<IPagedList<LinkViewItemViewModel>>(pagedLinks);

And the calling code, which supplies the filter and ordering:

var result = await GetLinks(null, pageNo, a => a.OrderByDescending(x => x.SentDate));
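The WHERE [Extent1].[row_number] > 0 in the generated SQL is consistent with EF translating a Skip(0).Take(10) on SQL Server 2008, which suggests that PagedList presumably applies something like Skip((pageNumber - 1) * pageSize).Take(pageSize) to the IQueryable returned by GetAsync. The sketch below reproduces only that page arithmetic with LINQ to Objects; the Link record and the PageOf helper are hypothetical stand-ins, not PagedList's actual source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Link entity: only the columns needed for paging.
public sealed record Link(int LinkID, DateTime SentDate);

public static class PagingSketch
{
    // pageNumber is 1-based, as PagedList-style APIs usually take it.
    // Page 1 becomes Skip(0), which on SQL Server 2008 EF renders as "row_number > 0".
    public static List<Link> PageOf(IQueryable<Link> source, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        return source
            .OrderByDescending(l => l.SentDate)   // newest first, as in the question
            .Skip((pageNumber - 1) * pageSize)    // rows on earlier pages
            .Take(pageSize)                       // rows on this page
            .ToList();
    }
}
```

Page 3 with a page size of 10 skips 20 rows, so the SQL predicate would become row_number > 20.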

Vla*_*nov 9

It never occurred to me that you simply had no index at all. Lesson learned: always check the basics before digging further.


If you don't need paging, the query can be simplified to

SELECT TOP (10) 
    [Extent1].[LinkID] AS [LinkID], 
    [Extent1].[Title] AS [Title], 
    ...
FROM [dbo].[Links] AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC

You have already verified that it runs fast.

Obviously, you do need paging, so let's see what we can do.

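I originally wrote that your current version is slow because it first scans the whole table, computes a row number for every row, and only then returns 10 rows. I was wrong: the SQL Server optimizer is quite smart, and the root of the problem is elsewhere. See the update below.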


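BTW, as others have mentioned, this paging works correctly only if the SentDate column is unique. If it is not unique, you need to ORDER BY SentDate plus some other unique column, such as ID, to resolve the ambiguity.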
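In LINQ terms the tie-breaker is a ThenByDescending after the SentDate sort; EF should translate it into a second ORDER BY column. The sketch below assumes LinkID is the unique column; the Link record and the LinkOrderings class are hypothetical. Note that LINQ to Objects uses a stable sort, so it cannot reproduce the nondeterminism you get from the database; it only shows the total order you want the database to use.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the Link entity.
public sealed record Link(int LinkID, DateTime SentDate);

public static class LinkOrderings
{
    // SentDate may contain duplicates, so LinkID (assumed unique) breaks ties.
    // EF should turn this into ORDER BY SentDate DESC, LinkID DESC.
    public static readonly Func<IQueryable<Link>, IOrderedQueryable<Link>> NewestFirst =
        q => q.OrderByDescending(l => l.SentDate)
              .ThenByDescending(l => l.LinkID);
}
```

The delegate has the same shape as the orderBy parameter of GetAsync in the question, so with the real entity it could be passed as GetAsync(filter, LinkOrderings.NewestFirst).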

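If you don't need to jump directly to a specific page, but always start at page 1 and then go to the next page, the next one, and so on, the correct and efficient way to do this kind of paging is described in this excellent article: http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way (the author uses PostgreSQL for illustration, but the technique works for MS SQL Server as well). It boils down to remembering the ID of the last row on the displayed page and then using that ID in a WHERE clause, with an appropriate supporting index, to retrieve the next page without scanning all the previous rows.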
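Here is a minimal sketch of that keyset ("seek") approach in LINQ, assuming SentDate is combined with a unique LinkID as a tie-breaker. The client remembers the (SentDate, LinkID) of the last row it displayed and asks for rows that sort strictly after it. Against EF the same expression should become a WHERE on those two columns, which an index on (SentDate, LinkID) can answer with a seek. The Link record and the KeysetPaging.NextPage helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Link entity.
public sealed record Link(int LinkID, DateTime SentDate);

public static class KeysetPaging
{
    // Returns the page that follows the row (lastSentDate, lastId) in
    // "SentDate DESC, LinkID DESC" order. Pass nulls to get the first page.
    public static List<Link> NextPage(IQueryable<Link> source, DateTime? lastSentDate, int? lastId, int pageSize)
    {
        IQueryable<Link> query = source;
        if (lastSentDate.HasValue && lastId.HasValue)
        {
            DateTime d = lastSentDate.Value;
            int id = lastId.Value;
            // In descending order, "after the last row" means an earlier date,
            // or the same date with a smaller id.
            query = query.Where(l => l.SentDate < d || (l.SentDate == d && l.LinkID < id));
        }

        return query.OrderByDescending(l => l.SentDate)
                    .ThenByDescending(l => l.LinkID)
                    .Take(pageSize)
                    .ToList();
    }
}
```

Unlike the TOP and ROW_NUMBER variants, the cost of fetching a page here does not grow with how far into the list you are, but you can only move forward one page at a time.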

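SQL Server 2008 has no built-in support for paging, so we have to use workarounds. I'll show one variant that lets you jump directly to a given page and is fast for the first pages, but gets slower and slower for later pages.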

You will have these variables (PageSize, PageNumber) in your C# code. I declare them here only to illustrate the point.

DECLARE @VarPageSize int = 10; -- number of rows in each page
DECLARE @VarPageNumber int = 3; -- page numeration is zero-based

SELECT TOP (@VarPageSize)
    [Extent1].[LinkID] AS [LinkID]
    ,[Extent1].[Title] AS [Title]
    ,[Extent1].[Url] AS [Url]
    ,[Extent1].[Description] AS [Description]
    ,[Extent1].[SentDate] AS [SentDate]
    ,[Extent1].[VisitCount] AS [VisitCount]
    ,[Extent1].[RssSourceId] AS [RssSourceId]
    ,[Extent1].[ReviewStatus] AS [ReviewStatus]
    ,[Extent1].[UserAccountId] AS [UserAccountId]
    ,[Extent1].[CreationDate] AS [CreationDate]
FROM
    (
        SELECT TOP((@VarPageNumber + 1) * @VarPageSize)
            [Extent1].[LinkID] AS [LinkID]
            ,[Extent1].[Title] AS [Title]
            ,[Extent1].[Url] AS [Url]
            ,[Extent1].[Description] AS [Description]
            ,[Extent1].[SentDate] AS [SentDate]
            ,[Extent1].[VisitCount] AS [VisitCount]
            ,[Extent1].[RssSourceId] AS [RssSourceId]
            ,[Extent1].[ReviewStatus] AS [ReviewStatus]
            ,[Extent1].[UserAccountId] AS [UserAccountId]
            ,[Extent1].[CreationDate] AS [CreationDate]
        FROM [dbo].[Links] AS [Extent1]
        ORDER BY [Extent1].[SentDate] DESC
    ) AS [Extent1]
ORDER BY [Extent1].[SentDate] ASC
;

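The first page is rows 1 to 10, the second page is rows 11 to 20, and so on. Let's see how this query works when we try to get the fourth page, i.e. rows 31 to 40, with PageSize=10 and PageNumber=3. In the inner query we select the first 40 rows. Note that we do not scan the whole table here; we scan only the first 40 rows. We don't even need an explicit ROW_NUMBER(). Then we need to pick the last 10 of those 40 rows, so the outer query selects TOP(10) with ORDER BY in the opposite direction. This returns rows 40 down to 31, i.e. in reverse order. You can sort them back into the correct order on the client, or add one more outer query that simply sorts them again by SentDate DESC, like this: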
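To check the row arithmetic, the same TOP-of-TOP idea can be written in LINQ: take the first (PageNumber + 1) * PageSize rows in descending order, take PageSize of them in ascending order, then sort back to descending. This is an in-memory illustration only and says nothing about the SQL EF would generate for it. pageNumber is zero-based as in the T-SQL, LinkID is assumed unique, and the Link record and TopOfTopPaging class are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Link entity.
public sealed record Link(int LinkID, DateTime SentDate);

public static class TopOfTopPaging
{
    // pageNumber is zero-based, like @VarPageNumber in the T-SQL.
    public static List<Link> Page(IEnumerable<Link> source, int pageNumber, int pageSize)
    {
        // Innermost query: TOP((pageNumber + 1) * pageSize) ... ORDER BY SentDate DESC
        var firstRows = source
            .OrderByDescending(l => l.SentDate)
            .ThenByDescending(l => l.LinkID)
            .Take((pageNumber + 1) * pageSize);

        // Middle query: TOP(pageSize) ... ORDER BY SentDate ASC picks the last rows, reversed
        var lastRowsReversed = firstRows
            .OrderBy(l => l.SentDate)
            .ThenBy(l => l.LinkID)
            .Take(pageSize);

        // Outermost query: back to SentDate DESC
        return lastRowsReversed
            .OrderByDescending(l => l.SentDate)
            .ThenByDescending(l => l.LinkID)
            .ToList();
    }

    // Skip/Take reference, for comparison.
    public static List<Link> SkipTake(IEnumerable<Link> source, int pageNumber, int pageSize) =>
        source.OrderByDescending(l => l.SentDate)
              .ThenByDescending(l => l.LinkID)
              .Skip(pageNumber * pageSize)
              .Take(pageSize)
              .ToList();
}
```

For full pages both methods return the same rows. One behavior worth knowing: on a last page that is only partly filled, the innermost TOP returns fewer than (PageNumber + 1) * PageSize rows, so the middle TOP returns the last PageSize rows of the whole set and they overlap the previous page, whereas Skip/Take returns only the remaining rows.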

SELECT
    [Extent1].[LinkID] AS [LinkID]
    ,[Extent1].[Title] AS [Title]
    ,[Extent1].[Url] AS [Url]
    ,[Extent1].[Description] AS [Description]
    ,[Extent1].[SentDate] AS [SentDate]
    ,[Extent1].[VisitCount] AS [VisitCount]
    ,[Extent1].[RssSourceId] AS [RssSourceId]
    ,[Extent1].[ReviewStatus] AS [ReviewStatus]
    ,[Extent1].[UserAccountId] AS [UserAccountId]
    ,[Extent1].[CreationDate] AS [CreationDate]
FROM
    (
        SELECT TOP (@VarPageSize)
            [Extent1].[LinkID] AS [LinkID]
            ,[Extent1].[Title] AS [Title]
            ,[Extent1].[Url] AS [Url]
            ,[Extent1].[Description] AS [Description]
            ,[Extent1].[SentDate] AS [SentDate]
            ,[Extent1].[VisitCount] AS [VisitCount]
            ,[Extent1].[RssSourceId] AS [RssSourceId]
            ,[Extent1].[ReviewStatus] AS [ReviewStatus]
            ,[Extent1].[UserAccountId] AS [UserAccountId]
            ,[Extent1].[CreationDate] AS [CreationDate]
        FROM
            (
                SELECT TOP((@VarPageNumber + 1) * @VarPageSize)
                    [Extent1].[LinkID] AS [LinkID]
                    ,[Extent1].[Title] AS [Title]
                    ,[Extent1].[Url] AS [Url]
                    ,[Extent1].[Description] AS [Description]
                    ,[Extent1].[SentDate] AS [SentDate]
                    ,[Extent1].[VisitCount] AS [VisitCount]
                    ,[Extent1].[RssSourceId] AS [RssSourceId]
                    ,[Extent1].[ReviewStatus] AS [ReviewStatus]
                    ,[Extent1].[UserAccountId] AS [UserAccountId]
                    ,[Extent1].[CreationDate] AS [CreationDate]
                FROM [dbo].[Links] AS [Extent1]
                ORDER BY [Extent1].[SentDate] DESC
            ) AS [Extent1]
        ORDER BY [Extent1].[SentDate] ASC
    ) AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC

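Like the original query, this query is guaranteed to work correctly only if SentDate is unique. If it is not unique, add a unique column to the ORDER BY. For example, if LinkID is unique, use ORDER BY SentDate DESC, LinkID DESC in the innermost query, and reverse the order in the next query out: ORDER BY SentDate ASC, LinkID ASC. The outermost query, if you use it, goes back to ORDER BY SentDate DESC, LinkID DESC.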

Obviously, if you want to jump to page 1000, the inner query has to read 10,000 rows, so the further you go, the slower it gets.

In any case, you need an index on SentDate (or on SentDate, LinkID) for this to work well. Without an index, the query will scan the whole table again.

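I'm not telling you here how to translate this query into EF, because I don't know how: I have never used EF. There may be a way. Also, you can obviously make it run the actual SQL instead of trying to generate it from C# code.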

Update

Execution plan comparison

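In my database I have a table EventLogErrors with 29,477,859 rows, and on SQL Server 2008 I compared the EF-generated query with ROW_NUMBER against the TOP-based query I suggest here. I tried to retrieve the fourth page of 10 rows (rows 31 to 40). In both cases the optimizer was smart enough to read only 40 rows, as you can see from the execution plans. I used the primary key column for sorting and paging in this test. When I used another indexed column for paging, the results were the same, i.e. both variants read only 40 rows. Needless to say, both variants returned in a fraction of a second.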

Variant with TOP: [execution plan screenshot not included]

Variant with ROW_NUMBER: [execution plan screenshot not included]

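All this means that the root of the problem is elsewhere. You mentioned that your query sometimes runs slowly, and at first I didn't really pay attention to that. With this symptom I would do the following: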

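  • Check the execution plan.
  • Check that you have an index.
  • Check that the index is not heavily fragmented and that the statistics are not out of date.
  • SQL Server has a feature called auto-parameterization. It also has a feature called parameter sniffing, and a feature called execution plan caching. When all three work together, they can lead to a non-optimal execution plan being used. Erland Sommarskog has an excellent article that explains this in detail (http://www.sommarskog.se/query-plan-mysteries.html). It explains how to confirm, by examining the cached execution plan, whether the problem really is parameter sniffing, and what you can do to fix it.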


der*_*oby 5

I'm guessing the WHERE row_number > 0 changes over time as you ask for page 2, page 3, and so on...

So I'm curious whether creating this index would help:

CREATE INDEX idx_links_SentDate_desc ON [dbo].[Links] ([SentDate] DESC)

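Honestly, even if it works, it's pretty much a band-aid, and you'll probably need to rebuild this index often, because I'd guess it will become fragmented over time...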

Update: check the comments! It turns out the DESC has no effect, and it should be avoided if your data goes from low to high!