Ste*_*old 22 c# sql-server entity-framework sql-server-2008 entity-framework-6
我有一个关于Entity Framework查询执行性能的问题.
架构:
我有这样的表结构:
CREATE TABLE [dbo].[DataLogger]
(
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[ProjectID] [bigint] NULL,
CONSTRAINT [PrimaryKey1] PRIMARY KEY CLUSTERED ( [ID] ASC )
)
CREATE TABLE [dbo].[DCDistributionBox]
(
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[DataLoggerID] [bigint] NOT NULL,
CONSTRAINT [PrimaryKey2] PRIMARY KEY CLUSTERED ( [ID] ASC )
)
ALTER TABLE [dbo].[DCDistributionBox]
ADD CONSTRAINT [FK_DCDistributionBox_DataLogger]
FOREIGN KEY([DataLoggerID]) REFERENCES [dbo].[DataLogger] ([ID])
CREATE TABLE [dbo].[DCString]
(
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[DCDistributionBoxID] [bigint] NOT NULL,
[CurrentMPP] [decimal](18, 2) NULL,
CONSTRAINT [PrimaryKey3] PRIMARY KEY CLUSTERED ( [ID] ASC )
)
ALTER TABLE [dbo].[DCString]
ADD CONSTRAINT [FK_DCString_DCDistributionBox]
FOREIGN KEY([DCDistributionBoxID]) REFERENCES [dbo].[DCDistributionBox] ([ID])
CREATE TABLE [dbo].[StringData]
(
[DCStringID] [bigint] NOT NULL,
[TimeStamp] [datetime] NOT NULL,
[DCCurrent] [decimal](18, 2) NULL,
CONSTRAINT [PrimaryKey4] PRIMARY KEY CLUSTERED ( [TimeStamp] DESC, [DCStringID] ASC)
)
CREATE NONCLUSTERED INDEX [TimeStamp_DCCurrent-NonClusteredIndex]
ON [dbo].[StringData] ([DCStringID] ASC, [TimeStamp] ASC)
INCLUDE ([DCCurrent])
Run Code Online (Sandbox Code Playgroud)
外键上的标准索引也存在(我不想出于空间原因列出所有索引).
该[StringData]表具有以下存储统计信息:
用法:
我现在想要对[StringData]表中的数据进行分组并进行一些聚合.
我创建了一个实体框架查询(查询的详细信息可以在这里找到):
var compareData = model.StringDatas
.AsNoTracking()
.Where(p => p.DCString.DCDistributionBox.DataLogger.ProjectID == projectID && p.TimeStamp >= fromDate && p.TimeStamp < tillDate)
.Select(d => new
{
TimeStamp = d.TimeStamp,
DCCurrentMpp = d.DCCurrent / d.DCString.CurrentMPP
})
.GroupBy(d => DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, d.TimeStamp) / minuteInterval * minuteInterval))
.Select(d => new
{
TimeStamp = d.Key,
DCCurrentMppMin = d.Min(v => v.DCCurrentMpp),
DCCurrentMppMax = d.Max(v => v.DCCurrentMpp),
DCCurrentMppAvg = d.Average(v => v.DCCurrentMpp),
DCCurrentMppStDev = DbFunctions.StandardDeviationP(d.Select(v => v.DCCurrentMpp))
})
.ToList();
Run Code Online (Sandbox Code Playgroud)
执行时间跨度特别长!?
尝试:
我现在看一下Entity Framework生成的SQL查询,看起来像这样:
DECLARE @p__linq__4 DATETIME = 0;
DECLARE @p__linq__3 DATETIME = 0;
DECLARE @p__linq__5 INT = 15;
DECLARE @p__linq__6 INT = 15;
DECLARE @p__linq__0 BIGINT = 20827;
DECLARE @p__linq__1 DATETIME = '06.02.2016 00:00:00';
DECLARE @p__linq__2 DATETIME = '07.02.2016 00:00:00';
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [C2],
[GroupBy1].[A1] AS [C3],
[GroupBy1].[A2] AS [C4],
[GroupBy1].[A3] AS [C5],
[GroupBy1].[A4] AS [C6]
FROM ( SELECT
[Project1].[K1] AS [K1],
MIN([Project1].[A1]) AS [A1],
MAX([Project1].[A2]) AS [A2],
AVG([Project1].[A3]) AS [A3],
STDEVP([Project1].[A4]) AS [A4]
FROM ( SELECT
DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) AS [K1],
[Project1].[C1] AS [A1],
[Project1].[C1] AS [A2],
[Project1].[C1] AS [A3],
[Project1].[C1] AS [A4]
FROM ( SELECT
[Extent1].[TimeStamp] AS [TimeStamp],
[Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
FROM [dbo].[StringData] AS [Extent1]
INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
WHERE (([Extent4].[ProjectID] = @p__linq__0) OR (([Extent4].[ProjectID] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
) AS [Project1]
) AS [Project1]
GROUP BY [K1]
) AS [GroupBy1]
Run Code Online (Sandbox Code Playgroud)
我将此SQL查询复制到同一台机器上的SSMS中,并使用与实体框架相同的连接字符串连接.
结果是性能大大提高:
我也做了一些循环运行测试,结果很奇怪.测试看起来像这样
for (int i = 0; i < 50; i++)
{
DateTime begin = DateTime.UtcNow;
[...query...]
TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
Debug.WriteLine("{0}th run: {1}", i, excecutionTimeSpan.ToString());
}
Run Code Online (Sandbox Code Playgroud)
结果非常不同,看起来是随机的(?):
0th run: 00:00:11.0618580
1th run: 00:00:11.3339467
2th run: 00:00:10.0000676
3th run: 00:00:10.1508140
4th run: 00:00:09.2041939
5th run: 00:00:07.6710321
6th run: 00:00:10.3386312
7th run: 00:00:17.3422765
8th run: 00:00:13.8620557
9th run: 00:00:14.9041528
10th run: 00:00:12.7772906
11th run: 00:00:17.0170235
12th run: 00:00:14.7773750
Run Code Online (Sandbox Code Playgroud)
问题:
为什么Entity Framework查询执行速度如此之慢?生成的行数非常低,原始SQL查询显示非常快的性能.
更新1:
我注意它不是MetaContext或模型创建延迟.其他一些查询在之前的同一个Model实例上执行,具有良好的性能.
更新2(与@ x0007me的答案相关):
感谢提示,但可以通过更改模型设置来消除这种情况:
modelContext.Configuration.UseDatabaseNullSemantics = true;
Run Code Online (Sandbox Code Playgroud)
EF生成的SQL现在是:
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [C2],
[GroupBy1].[A1] AS [C3],
[GroupBy1].[A2] AS [C4],
[GroupBy1].[A3] AS [C5],
[GroupBy1].[A4] AS [C6]
FROM ( SELECT
[Project1].[K1] AS [K1],
MIN([Project1].[A1]) AS [A1],
MAX([Project1].[A2]) AS [A2],
AVG([Project1].[A3]) AS [A3],
STDEVP([Project1].[A4]) AS [A4]
FROM ( SELECT
DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) AS [K1],
[Project1].[C1] AS [A1],
[Project1].[C1] AS [A2],
[Project1].[C1] AS [A3],
[Project1].[C1] AS [A4]
FROM ( SELECT
[Extent1].[TimeStamp] AS [TimeStamp],
[Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
FROM [dbo].[StringData] AS [Extent1]
INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
) AS [Project1]
) AS [Project1]
GROUP BY [K1]
) AS [GroupBy1]
Run Code Online (Sandbox Code Playgroud)
因此,您可以看到您所描述的问题现已解决,但执行时间不会改变.
另外,正如您在架构和原始执行时间中所看到的,我使用了具有高度优化索引器的优化结构.
更新3(与@Vladimir Baranov的答案有关):
我不明白为什么这可能与查询计划缓存有关.因为在MSDN中清楚地描述了EF6利用查询计划缓存.
一个简单的测试证明,巨大的执行时间differenz与查询计划缓存(伪代码)无关:
using(var modelContext = new ModelContext())
{
modelContext.Query(); //1th run activates caching
modelContext.Query(); //2th used cached plan
}
Run Code Online (Sandbox Code Playgroud)
结果,两个查询都以相同的执行时间运行.
更新4(与@bubi的答案有关):
我试图在描述它时运行由EF生成的查询:
int result = model.Database.ExecuteSqlCommand(@"SELECT
1 AS [C1],
[GroupBy1].[K1] AS [C2],
[GroupBy1].[A1] AS [C3],
[GroupBy1].[A2] AS [C4],
[GroupBy1].[A3] AS [C5],
[GroupBy1].[A4] AS [C6]
FROM ( SELECT
[Project1].[K1] AS [K1],
MIN([Project1].[A1]) AS [A1],
MAX([Project1].[A2]) AS [A2],
AVG([Project1].[A3]) AS [A3],
STDEVP([Project1].[A4]) AS [A4]
FROM ( SELECT
DATEADD (minute, ((DATEDIFF (minute, 0, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, 0) AS [K1],
[Project1].[C1] AS [A1],
[Project1].[C1] AS [A2],
[Project1].[C1] AS [A3],
[Project1].[C1] AS [A4]
FROM ( SELECT
[Extent1].[TimeStamp] AS [TimeStamp],
[Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
FROM [dbo].[StringData] AS [Extent1]
INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
) AS [Project1]
) AS [Project1]
GROUP BY [K1]
) AS [GroupBy1]",
new SqlParameter("p__linq__0", 20827),
new SqlParameter("p__linq__1", fromDate),
new SqlParameter("p__linq__2", tillDate),
new SqlParameter("p__linq__5", 15),
new SqlParameter("p__linq__6", 15));
Run Code Online (Sandbox Code Playgroud)
只要正常的EF查询就花了精确!?
更新5(与@vittore的答案有关):
我创建一个跟踪调用树,也许它有帮助:
更新6(与@usr的答案有关):
我通过SQL Server Profiler创建了两个showplan XML.
更新7(与@VladimirBaranov的评论相关):
我现在运行一些与您的评论相关的测试用例.
首先,我通过使用新的计算列和匹配的INDEXER来完成订单操作的时间.这减少了相关的性能滞后DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [TimeStamp] ) / 15* 15, 0).详细了解如何以及为何找到此处.
结果看起来像这样:
Pure EntityFramework查询:
for (int i = 0; i < 3; i++)
{
DateTime begin = DateTime.UtcNow;
var result = model.StringDatas
.AsNoTracking()
.Where(p => p.DCString.DCDistributionBox.DataLogger.ProjectID == projectID && p.TimeStamp15Minutes >= fromDate && p.TimeStamp15Minutes < tillDate)
.Select(d => new
{
TimeStamp = d.TimeStamp15Minutes,
DCCurrentMpp = d.DCCurrent / d.DCString.CurrentMPP
})
.GroupBy(d => d.TimeStamp)
.Select(d => new
{
TimeStamp = d.Key,
DCCurrentMppMin = d.Min(v => v.DCCurrentMpp),
DCCurrentMppMax = d.Max(v => v.DCCurrentMpp),
DCCurrentMppAvg = d.Average(v => v.DCCurrentMpp),
DCCurrentMppStDev = DbFunctions.StandardDeviationP(d.Select(v => v.DCCurrentMpp))
})
.ToList();
TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
Debug.WriteLine("{0}th run pure EF: {1}", i, excecutionTimeSpan.ToString());
}
Run Code Online (Sandbox Code Playgroud)
第0次运行纯EF:00:00:12.6460624
第1次运行纯EF:00:00:11.0258393
第2次运行纯EF:00:00:08.4171044
我现在使用EF生成的SQL作为SQL查询:
for (int i = 0; i < 3; i++)
{
DateTime begin = DateTime.UtcNow;
int result = model.Database.ExecuteSqlCommand(@"SELECT
1 AS [C1],
[GroupBy1].[K1] AS [TimeStamp15Minutes],
[GroupBy1].[A1] AS [C2],
[GroupBy1].[A2] AS [C3],
[GroupBy1].[A3] AS [C4],
[GroupBy1].[A4] AS [C5]
FROM ( SELECT
[Project1].[TimeStamp15Minutes] AS [K1],
MIN([Project1].[C1]) AS [A1],
MAX([Project1].[C1]) AS [A2],
AVG([Project1].[C1]) AS [A3],
STDEVP([Project1].[C1]) AS [A4]
FROM ( SELECT
[Extent1].[TimeStamp15Minutes] AS [TimeStamp15Minutes],
[Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
FROM [dbo].[StringData] AS [Extent1]
INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp15Minutes] >= @p__linq__1) AND ([Extent1].[TimeStamp15Minutes] < @p__linq__2)
) AS [Project1]
GROUP BY [Project1].[TimeStamp15Minutes]
) AS [GroupBy1];",
new SqlParameter("p__linq__0", 20827),
new SqlParameter("p__linq__1", fromDate),
new SqlParameter("p__linq__2", tillDate));
TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
Debug.WriteLine("{0}th run: {1}", i, excecutionTimeSpan.ToString());
}
Run Code Online (Sandbox Code Playgroud)
第0次运行:00:00:00.8381200
第1次运行:00:00:00.6920736
第2次运行:00:00:00.7081006
并与OPTION(RECOMPILE):
for (int i = 0; i < 3; i++)
{
DateTime begin = DateTime.UtcNow;
int result = model.Database.ExecuteSqlCommand(@"SELECT
1 AS [C1],
[GroupBy1].[K1] AS [TimeStamp15Minutes],
[GroupBy1].[A1] AS [C2],
[GroupBy1].[A2] AS [C3],
[GroupBy1].[A3] AS [C4],
[GroupBy1].[A4] AS [C5]
FROM ( SELECT
[Project1].[TimeStamp15Minutes] AS [K1],
MIN([Project1].[C1]) AS [A1],
MAX([Project1].[C1]) AS [A2],
AVG([Project1].[C1]) AS [A3],
STDEVP([Project1].[C1]) AS [A4]
FROM ( SELECT
[Extent1].[TimeStamp15Minutes] AS [TimeStamp15Minutes],
[Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
FROM [dbo].[StringData] AS [Extent1]
INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp15Minutes] >= @p__linq__1) AND ([Extent1].[TimeStamp15Minutes] < @p__linq__2)
) AS [Project1]
GROUP BY [Project1].[TimeStamp15Minutes]
) AS [GroupBy1]
OPTION(RECOMPILE);",
new SqlParameter("p__linq__0", 20827),
new SqlParameter("p__linq__1", fromDate),
new SqlParameter("p__linq__2", tillDate));
TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
Debug.WriteLine("{0}th run: {1}", i, excecutionTimeSpan.ToString());
}
Run Code Online (Sandbox Code Playgroud)
使用RECOMPILE进行第0次运行:00:00:00.8260932
使用RECOMPILE进行第1次运行:00:00:00.9139730
使用RECOMPILE运行第2次:00:00:01.0680665
SSMS中的相同SQL查询(没有RECOMPILE):
00:00:01.105
SSMS中使用相同的SQL查询(使用RECOMPILE):
00:00:00.902
我希望这都是你需要的价值观.
Vla*_*nov 14
在这个答案中,我专注于原始观察:EF生成的查询速度很慢,但是当在SSMS中运行相同的查询时,它很快.
这种行为的一种可能解释是参数嗅探.
SQL Server在执行具有参数的存储过程时使用称为参数嗅探的过程.编译或重新编译过程时,将评估传递给参数的值,并将其用于创建执行计划.然后将该值与执行计划一起存储在计划缓存中.在随后的执行中,使用相同的值 - 和相同的计划.
因此,EF生成一个参数很少的查询.第一次运行此查询时,服务器使用在第一次运行中生效的参数值为此查询创建执行计划.那个计划通常都很好.但是,稍后您使用其他参数值运行相同的EF查询.对于新的参数值,先前生成的计划可能不是最优的,并且查询变慢.服务器继续使用以前的计划,因为它仍然是相同的查询,只是参数的值不同.
如果此时您获取查询文本并尝试直接在SSMS中运行它,服务器将创建一个新的执行计划,因为从技术上讲,它与EF应用程序发出的查询不同.即使一个字符差异就足够了,会话设置的任何更改也足以让服务器将查询视为新查询.因此,服务器在其缓存中有两个看似相同的查询计划.第一个"慢"计划对于参数的新值来说很慢,因为它最初是为不同的参数值构建的.第二个"快速"计划是针对当前参数值构建的,因此速度很快.
由Erland Sommarskog撰写的文章" 慢速应用程序中的慢速SSMS"更详细地解释了这个和其他相关领域.
有几种方法可以放弃缓存的计划并强制服务器重新生成它们.更改表或更改表索引应该这样做 - 它应该丢弃与此表相关的所有计划,包括"慢"和"快".然后在EF应用程序中使用新的参数值运行查询并获得新的"快速"计划.您在SSMS中运行查询并获得具有新参数值的第二个"快速"计划.服务器仍然生成两个计划,但这两个计划现在都很快.
另一种变体是添加OPTION(RECOMPILE)到查询中.使用此选项,服务器不会将生成的计划存储在其缓存中.因此,每次查询运行时,服务器都会使用实际参数值来生成(它认为)对于给定参数值最佳的计划.缺点是计划生成的额外开销.
请注意,例如,由于过时的统计信息,服务器仍然可以选择使用此选项的"不良"计划.但是,至少参数嗅探不会成为问题.
| 归档时间: |
|
| 查看次数: |
9358 次 |
| 最近记录: |