实体框架查询性能与原始SQL执行的extrem不同

Ste*_*old 22 c# sql-server entity-framework sql-server-2008 entity-framework-6

我有一个关于Entity Framework查询执行性能的问题.

架构:

我有这样的表结构:

CREATE TABLE [dbo].[DataLogger]
(
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [ProjectID] [bigint] NULL,
    CONSTRAINT [PrimaryKey1] PRIMARY KEY CLUSTERED ( [ID] ASC )
)

CREATE TABLE [dbo].[DCDistributionBox]
(
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [DataLoggerID] [bigint] NOT NULL,
    CONSTRAINT [PrimaryKey2] PRIMARY KEY CLUSTERED ( [ID] ASC )
)

ALTER TABLE [dbo].[DCDistributionBox]
    ADD CONSTRAINT [FK_DCDistributionBox_DataLogger] 
    FOREIGN KEY([DataLoggerID]) REFERENCES [dbo].[DataLogger] ([ID])

CREATE TABLE [dbo].[DCString] 
(
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [DCDistributionBoxID] [bigint] NOT NULL,
    [CurrentMPP] [decimal](18, 2) NULL,
    CONSTRAINT [PrimaryKey3] PRIMARY KEY CLUSTERED ( [ID] ASC )
)

ALTER TABLE [dbo].[DCString]
    ADD CONSTRAINT [FK_DCString_DCDistributionBox] 
    FOREIGN KEY([DCDistributionBoxID]) REFERENCES [dbo].[DCDistributionBox] ([ID])

CREATE TABLE [dbo].[StringData]
(
    [DCStringID] [bigint] NOT NULL,
    [TimeStamp] [datetime] NOT NULL,
    [DCCurrent] [decimal](18, 2) NULL,
    CONSTRAINT [PrimaryKey4] PRIMARY KEY CLUSTERED ( [TimeStamp] DESC, [DCStringID] ASC)
)

CREATE NONCLUSTERED INDEX [TimeStamp_DCCurrent-NonClusteredIndex] 
ON [dbo].[StringData] ([DCStringID] ASC, [TimeStamp] ASC)
INCLUDE ([DCCurrent])
Run Code Online (Sandbox Code Playgroud)

外键上的标准索引也存在(我不想出于空间原因列出所有索引).

[StringData]表具有以下存储统计信息:

  • 数据空间:26,901.86 MB
  • 行数:131,827,749
  • 分区:是的
  • 分区数:62

用法:

我现在想要对[StringData]表中的数据进行分组并进行一些聚合.

我创建了一个实体框架查询(查询的详细信息可以在这里找到):

var compareData = model.StringDatas
    .AsNoTracking()
    .Where(p => p.DCString.DCDistributionBox.DataLogger.ProjectID == projectID && p.TimeStamp >= fromDate && p.TimeStamp < tillDate)
    .Select(d => new
    {
        TimeStamp = d.TimeStamp,
        DCCurrentMpp = d.DCCurrent / d.DCString.CurrentMPP
    })
    .GroupBy(d => DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, d.TimeStamp) / minuteInterval * minuteInterval))
    .Select(d => new
    {
        TimeStamp = d.Key,
        DCCurrentMppMin = d.Min(v => v.DCCurrentMpp),
        DCCurrentMppMax = d.Max(v => v.DCCurrentMpp),
        DCCurrentMppAvg = d.Average(v => v.DCCurrentMpp),
        DCCurrentMppStDev = DbFunctions.StandardDeviationP(d.Select(v => v.DCCurrentMpp))
    })
    .ToList();
Run Code Online (Sandbox Code Playgroud)

执行时间跨度特别长!?

  • 执行结果:92rows
  • 执行时间:~16000ms

尝试:

我现在看一下Entity Framework生成的SQL查询,看起来像这样:

DECLARE @p__linq__4 DATETIME = 0;
DECLARE @p__linq__3 DATETIME = 0;
DECLARE @p__linq__5 INT = 15;
DECLARE @p__linq__6 INT = 15;
DECLARE @p__linq__0 BIGINT = 20827;
DECLARE @p__linq__1 DATETIME = '06.02.2016 00:00:00';
DECLARE @p__linq__2 DATETIME = '07.02.2016 00:00:00';

SELECT 
1 AS [C1], 
[GroupBy1].[K1] AS [C2], 
[GroupBy1].[A1] AS [C3], 
[GroupBy1].[A2] AS [C4], 
[GroupBy1].[A3] AS [C5], 
[GroupBy1].[A4] AS [C6]
FROM ( SELECT 
    [Project1].[K1] AS [K1], 
    MIN([Project1].[A1]) AS [A1], 
    MAX([Project1].[A2]) AS [A2], 
    AVG([Project1].[A3]) AS [A3], 
    STDEVP([Project1].[A4]) AS [A4]
    FROM ( SELECT 
        DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) AS [K1], 
        [Project1].[C1] AS [A1], 
        [Project1].[C1] AS [A2], 
        [Project1].[C1] AS [A3], 
        [Project1].[C1] AS [A4]
        FROM ( SELECT 
            [Extent1].[TimeStamp] AS [TimeStamp], 
            [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
            FROM    [dbo].[StringData] AS [Extent1]
            INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
            INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
            INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
            WHERE (([Extent4].[ProjectID] = @p__linq__0) OR (([Extent4].[ProjectID] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
        )  AS [Project1]
    )  AS [Project1]
    GROUP BY [K1]
)  AS [GroupBy1]
Run Code Online (Sandbox Code Playgroud)

我将此SQL查询复制到同一台机器上的SSMS中,并使用与实体框架相同的连接字符串连接.

结果是性能大大提高:

  • 执行结果:92rows
  • 执行时间:517ms

我也做了一些循环运行测试,结果很奇怪.测试看起来像这样

for (int i = 0; i < 50; i++)
{
    DateTime begin = DateTime.UtcNow;

    [...query...]

    TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
    Debug.WriteLine("{0}th run: {1}", i, excecutionTimeSpan.ToString());
}
Run Code Online (Sandbox Code Playgroud)

结果非常不同,看起来是随机的(?):

0th run: 00:00:11.0618580
1th run: 00:00:11.3339467
2th run: 00:00:10.0000676
3th run: 00:00:10.1508140
4th run: 00:00:09.2041939
5th run: 00:00:07.6710321
6th run: 00:00:10.3386312
7th run: 00:00:17.3422765
8th run: 00:00:13.8620557
9th run: 00:00:14.9041528
10th run: 00:00:12.7772906
11th run: 00:00:17.0170235
12th run: 00:00:14.7773750
Run Code Online (Sandbox Code Playgroud)

问题:

为什么Entity Framework查询执行速度如此之慢?生成的行数非常低,原始SQL查询显示非常快的性能.

更新1:

我注意它不是MetaContext或模型创建延迟.其他一些查询在之前的同一个Model实例上执行,具有良好的性能.

更新2(与@ x0007me的答案相关):

感谢提示,但可以通过更改模型设置来消除这种情况:

modelContext.Configuration.UseDatabaseNullSemantics = true;
Run Code Online (Sandbox Code Playgroud)

EF生成的SQL现在是:

SELECT 
1 AS [C1], 
[GroupBy1].[K1] AS [C2], 
[GroupBy1].[A1] AS [C3], 
[GroupBy1].[A2] AS [C4], 
[GroupBy1].[A3] AS [C5], 
[GroupBy1].[A4] AS [C6]
FROM ( SELECT 
    [Project1].[K1] AS [K1], 
    MIN([Project1].[A1]) AS [A1], 
    MAX([Project1].[A2]) AS [A2], 
    AVG([Project1].[A3]) AS [A3], 
    STDEVP([Project1].[A4]) AS [A4]
    FROM ( SELECT 
        DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) AS [K1], 
        [Project1].[C1] AS [A1], 
        [Project1].[C1] AS [A2], 
        [Project1].[C1] AS [A3], 
        [Project1].[C1] AS [A4]
        FROM ( SELECT 
            [Extent1].[TimeStamp] AS [TimeStamp], 
            [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
            FROM    [dbo].[StringData] AS [Extent1]
            INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
            INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
            INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
            WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
        )  AS [Project1]
    )  AS [Project1]
    GROUP BY [K1]
)  AS [GroupBy1]
Run Code Online (Sandbox Code Playgroud)

因此,您可以看到您所描述的问题现已解决,但执行时间不会改变.

另外,正如您在架构和原始执行时间中所看到的,我使用了具有高度优化索引器的优化结构.

更新3(与@Vladimir Baranov的答案有关):

我不明白为什么这可能与查询计划缓存有关.因为在MSDN中清楚地描述了EF6利用查询计划缓存.

一个简单的测试证明,巨大的执行时间differenz与查询计划缓存(伪代码)无关:

using(var modelContext = new ModelContext())
{
    modelContext.Query(); //1th run activates caching

    modelContext.Query(); //2th used cached plan
}
Run Code Online (Sandbox Code Playgroud)

结果,两个查询都以相同的执行时间运行.

更新4(与@bubi的答案有关):

我试图在描述它时运行由EF生成的查询:

int result = model.Database.ExecuteSqlCommand(@"SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [C2], 
    [GroupBy1].[A1] AS [C3], 
    [GroupBy1].[A2] AS [C4], 
    [GroupBy1].[A3] AS [C5], 
    [GroupBy1].[A4] AS [C6]
    FROM ( SELECT 
        [Project1].[K1] AS [K1], 
        MIN([Project1].[A1]) AS [A1], 
        MAX([Project1].[A2]) AS [A2], 
        AVG([Project1].[A3]) AS [A3], 
        STDEVP([Project1].[A4]) AS [A4]
        FROM ( SELECT 
            DATEADD (minute, ((DATEDIFF (minute, 0, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, 0) AS [K1], 
            [Project1].[C1] AS [A1], 
            [Project1].[C1] AS [A2], 
            [Project1].[C1] AS [A3], 
            [Project1].[C1] AS [A4]
            FROM ( SELECT 
                [Extent1].[TimeStamp] AS [TimeStamp], 
                [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
                FROM    [dbo].[StringData] AS [Extent1]
                INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
            )  AS [Project1]
        )  AS [Project1]
        GROUP BY [K1]
    )  AS [GroupBy1]",
    new SqlParameter("p__linq__0", 20827),
    new SqlParameter("p__linq__1", fromDate),
    new SqlParameter("p__linq__2", tillDate),
    new SqlParameter("p__linq__5", 15),
    new SqlParameter("p__linq__6", 15));
Run Code Online (Sandbox Code Playgroud)
  • 执行结果:92
  • 执行时间:~16000ms

只要正常的EF查询就花了精确!?

更新5(与@vittore的答案有关):

我创建一个跟踪调用树,也许它有帮助:

调用树跟踪

更新6(与@usr的答案有关):

我通过SQL Server Profiler创建了两个showplan XML.

快速运行(SSMS).SQLPlan

慢跑(EF).SQLPlan

更新7(与@VladimirBaranov的评论相关):

我现在运行一些与您的评论相关的测试用例.

首先,我通过使用新的计算列和匹配的INDEXER来完成订单操作的时间.这减少了相关的性能滞后DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [TimeStamp] ) / 15* 15, 0).详细了解如何以及为何找到此处.

结果看起来像这样:

Pure EntityFramework查询:

for (int i = 0; i < 3; i++)
{
    DateTime begin = DateTime.UtcNow;
    var result = model.StringDatas
        .AsNoTracking()
        .Where(p => p.DCString.DCDistributionBox.DataLogger.ProjectID == projectID && p.TimeStamp15Minutes >= fromDate && p.TimeStamp15Minutes < tillDate)
        .Select(d => new
        {
            TimeStamp = d.TimeStamp15Minutes,
            DCCurrentMpp = d.DCCurrent / d.DCString.CurrentMPP
        })
        .GroupBy(d => d.TimeStamp)
        .Select(d => new
        {
            TimeStamp = d.Key,
            DCCurrentMppMin = d.Min(v => v.DCCurrentMpp),
            DCCurrentMppMax = d.Max(v => v.DCCurrentMpp),
            DCCurrentMppAvg = d.Average(v => v.DCCurrentMpp),
            DCCurrentMppStDev = DbFunctions.StandardDeviationP(d.Select(v => v.DCCurrentMpp))
        })
        .ToList();

        TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
        Debug.WriteLine("{0}th run pure EF: {1}", i, excecutionTimeSpan.ToString());
}
Run Code Online (Sandbox Code Playgroud)

第0次运行纯EF:00:00:12.6460624

第1次运行纯EF:00:00:11.0258393

第2次运行纯EF:00:00:08.4171044

我现在使用EF生成的SQL作为SQL查询:

for (int i = 0; i < 3; i++)
{
    DateTime begin = DateTime.UtcNow;
    int result = model.Database.ExecuteSqlCommand(@"SELECT 
        1 AS [C1], 
        [GroupBy1].[K1] AS [TimeStamp15Minutes], 
        [GroupBy1].[A1] AS [C2], 
        [GroupBy1].[A2] AS [C3], 
        [GroupBy1].[A3] AS [C4], 
        [GroupBy1].[A4] AS [C5]
        FROM ( SELECT 
            [Project1].[TimeStamp15Minutes] AS [K1], 
            MIN([Project1].[C1]) AS [A1], 
            MAX([Project1].[C1]) AS [A2], 
            AVG([Project1].[C1]) AS [A3], 
            STDEVP([Project1].[C1]) AS [A4]
            FROM ( SELECT 
                [Extent1].[TimeStamp15Minutes] AS [TimeStamp15Minutes], 
                [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
                FROM    [dbo].[StringData] AS [Extent1]
                INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp15Minutes] >= @p__linq__1) AND ([Extent1].[TimeStamp15Minutes] < @p__linq__2)
            )  AS [Project1]
            GROUP BY [Project1].[TimeStamp15Minutes]
        )  AS [GroupBy1];",
    new SqlParameter("p__linq__0", 20827),
    new SqlParameter("p__linq__1", fromDate),
    new SqlParameter("p__linq__2", tillDate));

    TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
    Debug.WriteLine("{0}th run: {1}", i, excecutionTimeSpan.ToString());
}
Run Code Online (Sandbox Code Playgroud)

第0次运行:00:00:00.8381200

第1次运行:00:00:00.6920736

第2次运行:00:00:00.7081006

并与OPTION(RECOMPILE):

for (int i = 0; i < 3; i++)
{
    DateTime begin = DateTime.UtcNow;
    int result = model.Database.ExecuteSqlCommand(@"SELECT 
        1 AS [C1], 
        [GroupBy1].[K1] AS [TimeStamp15Minutes], 
        [GroupBy1].[A1] AS [C2], 
        [GroupBy1].[A2] AS [C3], 
        [GroupBy1].[A3] AS [C4], 
        [GroupBy1].[A4] AS [C5]
        FROM ( SELECT 
            [Project1].[TimeStamp15Minutes] AS [K1], 
            MIN([Project1].[C1]) AS [A1], 
            MAX([Project1].[C1]) AS [A2], 
            AVG([Project1].[C1]) AS [A3], 
            STDEVP([Project1].[C1]) AS [A4]
            FROM ( SELECT 
                [Extent1].[TimeStamp15Minutes] AS [TimeStamp15Minutes], 
                [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
                FROM    [dbo].[StringData] AS [Extent1]
                INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp15Minutes] >= @p__linq__1) AND ([Extent1].[TimeStamp15Minutes] < @p__linq__2)
            )  AS [Project1]
            GROUP BY [Project1].[TimeStamp15Minutes]
        )  AS [GroupBy1]
        OPTION(RECOMPILE);",
    new SqlParameter("p__linq__0", 20827),
    new SqlParameter("p__linq__1", fromDate),
    new SqlParameter("p__linq__2", tillDate));

    TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
    Debug.WriteLine("{0}th run: {1}", i, excecutionTimeSpan.ToString());
}
Run Code Online (Sandbox Code Playgroud)

使用RECOMPILE进行第0次运行:00:00:00.8260932

使用RECOMPILE进行第1次运行:00:00:00.9139730

使用RECOMPILE运行第2次:00:00:01.0680665

SSMS中的相同SQL查询(没有RECOMPILE):

00:00:01.105

SSMS中使用相同的SQL查询(使用RECOMPILE):

00:00:00.902

我希望这都是你需要的价值观.

Vla*_*nov 14

在这个答案中,我专注于原始观察:EF生成的查询速度很慢,但是当在SSMS中运行相同的查询时,它很快.

这种行为的一种可能解释是参数嗅探.

SQL Server在执行具有参数的存储过程时使用称为参数嗅探的过程.编译或重新编译过程时,将评估传递给参数的值,并将其用于创建执行计划.然后将该值与执行计划一起存储在计划缓存中.在随后的执行中,使用相同的值 - 和相同的计划.

因此,EF生成一个参数很少的查询.第一次运行此查询时,服务器使用在第一次运行中生效的参数值为此查询创建执行计划.那个计划通常都很好.但是,稍后您使用其他参数值运行相同的EF查询.对于新的参数值,先前生成的计划可能不是最优的,并且查询变慢.服务器继续使用以前的计划,因为它仍然是相同的查询,只是参数的值不同.

如果此时您获取查询文本并尝试直接在SSMS中运行它,服务器将创建一个新的执行计划,因为从技术上讲,它与EF应用程序发出的查询不同.即使一个字符差异就足够了,会话设置的任何更改也足以让服务器将查询视为新查询.因此,服务器在其缓存中有两个看似相同的查询计划.第一个"慢"计划对于参数的新值来说很慢,因为它最初是为不同的参数值构建的.第二个"快速"计划是针对当前参数值构建的,因此速度很快.

由Erland Sommarskog撰写的文章" 慢速应用程序中的慢速SSMS"更详细地解释了这个和其他相关领域.

有几种方法可以放弃缓存的计划并强制服务器重新生成它们.更改表或更改表索引应该这样做 - 它应该丢弃与此表相关的所有计划,包括"慢"和"快".然后在EF应用程序中使用新的参数值运行查询并获得新的"快速"计划.您在SSMS中运行查询并获得具有新参数值的第二个"快速"计划.服务器仍然生成两个计划,但这两个计划现在都很快.

另一种变体是添加OPTION(RECOMPILE)到查询中.使用此选项,服务器不会将生成的计划存储在其缓存中.因此,每次查询运行时,服务器都会使用实际参数值来生成(它认为)对于给定参数值最佳的计划.缺点是计划生成的额外开销.

请注意,例如,由于过时的统计信息,服务器仍然可以选择使用此选项的"不良"计划.但是,至少参数嗅探不会成为问题.

  • 谢谢@VladimirBaranov!我可以100%确认你暗示参数嗅探.一个简单的测试:我搜索了我的EF查询的所有执行计划并删除然后像描述[这里](http://stackoverflow.com/a/30444176/1112048).在此之后,我首先运行EF查询,它的速度和我想象的一样快(<1秒).我想你提出了正确的答案.我会建议所有遇到同样问题的人来支持这个EF [UserVoice](https://data.uservoice.com/forums/72025-entity-framework-feature-suggestions/suggestions/6566335-support-for-选项-重新编译上生成-querie). (2认同)