sto*_*roz 23 .net c# linq sql-server entity-framework
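I'm using Entity Framework (code first), and I'm finding that the order in which I specify clauses in my LINQ queries has a huge impact on performance, for example: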
using (var db = new MyDbContext())
{
    var mySize = "medium";
    var myColour = "vermilion";
    var list1 = db.Widgets.Where(x => x.Colour == myColour && x.Size == mySize).ToList();
    var list2 = db.Widgets.Where(x => x.Size == mySize && x.Colour == myColour).ToList();
}
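If the (rare) colour clause comes before the (common) size clause, it's fast, but the other way round it's much slower. The table has a couple of million rows and the two fields in question are nvarchar(50), so not normalised, but they are each indexed. The fields are specified in a code-first fashion as follows: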
    [StringLength(50)]
    public string Colour { get; set; }
    [StringLength(50)]
    public string Size { get; set; }
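Should I really have to worry about these things in my LINQ queries? I thought that was the database's job?
System specs are as follows:
Right, for any gluttons for punishment, the effect can be replicated as follows. The issue seems to be very sensitive to a number of factors, so please bear with the contrived nature of some of this:
Install EntityFramework 6.0.0-beta1 via NuGet, then generate the code-first model: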
using System.ComponentModel.DataAnnotations;
using System.Data.Entity;
public class Widget
{
    [Key]
    public int WidgetId { get; set; }
    [StringLength(50)]
    public string Size { get; set; }
    [StringLength(50)]
    public string Colour { get; set; }
}
public class MyDbContext : DbContext
{
    public MyDbContext()
        : base("DefaultConnection")
    {
    }
    public DbSet<Widget> Widgets { get; set; }
}
Generate dummy data using the following SQL:
insert into Widget (Size, Colour)
select RND1 + ' is the name is this size' as Size,
RND2 + ' is the name of this colour' as Colour
from (Select top 1000000
CAST(abs(Checksum(NewId())) % 100 as varchar) As RND1,
CAST(abs(Checksum(NewId())) % 10000 as varchar) As RND2
from master..spt_values t1 cross join master..spt_values t2) t3
Add an index for Colour and one for Size, then query with:
string mySize = "99 is the name is this size";
string myColour = "9999 is the name of this colour";
// Rare colour clause first: fast
using (var db = new MyDbContext())
{
    var list1 = db.Widgets.Where(x => x.Colour == myColour && x.Size == mySize).ToList();
}
// Common size clause first: much slower
using (var db = new MyDbContext())
{
    var list2 = db.Widgets.Where(x => x.Size == mySize && x.Colour == myColour).ToList();
}
The problem seems to be related to the obtuse collection of NULL comparisons in the generated SQL, shown below.
exec sp_executesql N'SELECT 
[Extent1].[WidgetId] AS [WidgetId], 
[Extent1].[Size] AS [Size], 
[Extent1].[Colour] AS [Colour]
FROM [dbo].[Widget] AS [Extent1]
WHERE ((([Extent1].[Size] = @p__linq__0) 
AND ( NOT ([Extent1].[Size] IS NULL OR @p__linq__0 IS NULL))) 
OR (([Extent1].[Size] IS NULL) AND (@p__linq__0 IS NULL))) 
AND ((([Extent1].[Colour] = @p__linq__1) AND ( NOT ([Extent1].[Colour] IS NULL 
OR @p__linq__1 IS NULL))) OR (([Extent1].[Colour] IS NULL) 
AND (@p__linq__1 IS NULL)))',N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',
@p__linq__0=N'99 is the name is this size',
@p__linq__1=N'9999 is the name of this colour'
go
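To make the pattern concrete: each C# `==` between a nullable column and a parameter is expanded into a three-part expression. The helper below is hypothetical (it is not an EF API); it simply reproduces, as a string, the shape EF6 emitted above for a single comparison, so the expansion can be seen in isolation.

```csharp
// Hypothetical helper, not part of EF: reproduces the shape of the predicate
// that EF6 generated in the SQL above for one C# `column == parameter`
// comparison when both sides may be null.
public static class Ef6NullCompensation
{
    public static string Equality(string column, string parameter) =>
        // (column = parameter) AND NOT (either side IS NULL)
        $"((({column} = {parameter}) AND ( NOT ({column} IS NULL OR {parameter} IS NULL))) " +
        // OR (both sides IS NULL), which matches C#'s null == null being true
        $"OR (({column} IS NULL) AND ({parameter} IS NULL)))";
}
```

Calling `Ef6NullCompensation.Equality("[Extent1].[Size]", "@p__linq__0")` yields the same text as the first half of the WHERE clause above, with the line breaks removed.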
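Changing the equality operator in the LINQ to StartsWith() makes the problem go away, as does making either one of the two fields non-nullable in the database.
I despair!
To give any bounty hunters some help, the issue can be reproduced on SQL Server 2008 R2 Web (64-bit) in a clean database as follows: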
CREATE TABLE [dbo].[Widget](
    [WidgetId] [int] IDENTITY(1,1) NOT NULL,
    [Size] [nvarchar](50) NULL,
    [Colour] [nvarchar](50) NULL,
 CONSTRAINT [PK_dbo.Widget] PRIMARY KEY CLUSTERED 
(
    [WidgetId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX IX_Widget_Size ON dbo.Widget
    (
    Size
    ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX IX_Widget_Colour ON dbo.Widget
    (
    Colour
    ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
insert into Widget (Size, Colour)
select RND1 + ' is the name is this size' as Size,
RND2 + ' is the name of this colour' as Colour
from (Select top 1000000
CAST(abs(Checksum(NewId())) % 100 as varchar) As RND1,
CAST(abs(Checksum(NewId())) % 10000 as varchar) As RND2
from master..spt_values t1 cross join master..spt_values t2) t3
GO
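Then compare the relative performance of the following two queries. You may need to adjust the parameter test values to get queries that return a couple of rows in order to observe the effect, i.e. the second query being much slower.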
exec sp_executesql N'SELECT 
[Extent1].[WidgetId] AS [WidgetId], 
[Extent1].[Size] AS [Size], 
[Extent1].[Colour] AS [Colour]
FROM [dbo].[Widget] AS [Extent1]
WHERE ((([Extent1].[Colour] = @p__linq__0) 
AND ( NOT ([Extent1].[Colour] IS NULL 
OR @p__linq__0 IS NULL))) 
OR (([Extent1].[Colour] IS NULL) 
AND (@p__linq__0 IS NULL))) 
AND ((([Extent1].[Size] = @p__linq__1) 
AND ( NOT ([Extent1].[Size] IS NULL 
OR @p__linq__1 IS NULL))) 
OR (([Extent1].[Size] IS NULL) AND (@p__linq__1 IS NULL)))',
N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',
@p__linq__0=N'9999 is the name of this colour',
@p__linq__1=N'99 is the name is this size'
go
exec sp_executesql N'SELECT 
[Extent1].[WidgetId] AS [WidgetId], 
[Extent1].[Size] AS [Size], 
[Extent1].[Colour] AS [Colour]
FROM [dbo].[Widget] AS [Extent1]
WHERE ((([Extent1].[Size] = @p__linq__0) 
AND ( NOT ([Extent1].[Size] IS NULL 
OR @p__linq__0 IS NULL))) 
OR (([Extent1].[Size] IS NULL) 
AND (@p__linq__0 IS NULL))) 
AND ((([Extent1].[Colour] = @p__linq__1) 
AND ( NOT ([Extent1].[Colour] IS NULL 
OR @p__linq__1 IS NULL))) 
OR (([Extent1].[Colour] IS NULL) 
AND (@p__linq__1 IS NULL)))',
N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',
@p__linq__0=N'99 is the name is this size',
@p__linq__1=N'9999 is the name of this colour'
You may also find, as I did, that if you rerun the dummy data insert so that there are two million rows, the problem goes away.
usr*_*usr 21
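The core of the problem is not "why does the order matter with LINQ?". LINQ just translates literally, without reordering. The real question is "why do these two SQL queries perform differently?".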
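That LINQ keeps the order is visible before EF is even involved: the C# compiler builds an expression tree whose `&&` node has the first clause as its left operand. The sketch below inspects that tree directly, without EF. `Widget` is a plain stand-in for the question's entity, and `PredicateOrder.FirstComparedMember` is a hypothetical helper that assumes the predicate has exactly the shape `x => x.A == a && x.B == b`.

```csharp
using System;
using System.Linq.Expressions;

// Plain stand-in for the question's entity; no EF required.
public class Widget
{
    public int WidgetId { get; set; }
    public string Size { get; set; }
    public string Colour { get; set; }
}

public static class PredicateOrder
{
    // Returns the name of the property compared in the LEFT operand of a
    // top-level && predicate, e.g. "Colour" for x => x.Colour == c && x.Size == s.
    public static string FirstComparedMember(Expression<Func<Widget, bool>> predicate)
    {
        var and = (BinaryExpression)predicate.Body;            // the && node
        if (and.NodeType != ExpressionType.AndAlso)
            throw new ArgumentException("Expected a top-level && predicate.");
        var firstComparison = (BinaryExpression)and.Left;       // x.Colour == c
        var member = (MemberExpression)firstComparison.Left;    // x.Colour
        return member.Member.Name;
    }
}
```

Since EF builds its SQL by walking this tree, it is consistent that the two generated queries in the question differ only in the order of their WHERE terms.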
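I was only able to reproduce the problem by inserting 100k rows. In that case a weakness in the optimizer is triggered: because of the complex condition, it does not recognize that it can do a seek on Colour. In the first query, the optimizer does recognize the pattern and creates an index seek.
There is no semantic reason for this. An index seek is possible even when seeking on NULL. This is a weakness/bug in the optimizer. Here are the two plans: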

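EF tries to be helpful here because it assumes that both the column and the filter variable can be null. In that case it tries to return a match when both are null (which is correct according to C# semantics).
I tried undoing that by adding the following filter: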
Colour IS NOT NULL AND @p__linq__0 IS NOT NULL
AND Size IS NOT NULL AND @p__linq__1 IS NOT NULL
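The hope was that the optimizer would use that knowledge to simplify the complex EF filter expression. It did not manage to do so. Had this worked, the same filter could have been added to the EF query, providing an easy fix.
Here are the fixes I suggest, in the order you should try them:
- Create indexes on Colour, Size and/or Size, Colour. These also remove the problem.
- Use INTERSECT / Queryable.Intersect to combine the filters. This often results in a different plan shape.
All of these are workarounds, not root-cause fixes.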
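A minimal sketch of the Queryable.Intersect suggestion. It runs in memory through `AsQueryable()` so that it needs no database; `IntersectFilter` and `MatchingIds` are hypothetical names. Against an EF6 `DbSet` the same query shape would typically be translated into a SQL INTERSECT of two single-predicate SELECTs, but whether that gives a better plan on your data is something to measure, not assume.

```csharp
using System.Collections.Generic;
using System.Linq;

// Plain stand-in for the question's entity.
public class Widget
{
    public int WidgetId { get; set; }
    public string Size { get; set; }
    public string Colour { get; set; }
}

public static class IntersectFilter
{
    // Applies each equality filter separately and combines the resulting key
    // sets with Queryable.Intersect instead of a single && predicate.
    public static List<int> MatchingIds(IQueryable<Widget> widgets, string size, string colour)
    {
        IQueryable<int> byColour = widgets.Where(w => w.Colour == colour).Select(w => w.WidgetId);
        IQueryable<int> bySize = widgets.Where(w => w.Size == size).Select(w => w.WidgetId);
        return byColour.Intersect(bySize).OrderBy(id => id).ToList();
    }
}
```

Intersecting on the key keeps each SELECT to a single-column filter; the matching rows can then be fetched by ID.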
Finally, I'm unhappy with both SQL Server and EF. Both products should be fixed. Alas, they probably won't be, and you can't wait for that anyway.
Here are the index scripts:
CREATE NONCLUSTERED INDEX IX_Widget_Colour_Size ON dbo.Widget
    (
    Colour, Size
    ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE NONCLUSTERED INDEX IX_Widget_Size_Colour ON dbo.Widget
    (
   Size, Colour
    ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
div*_*ega 11
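Note: I ran into this question long after others had already provided generally correct answers. I decided to post this as a separate answer because I think the workaround can be helpful, and because you might appreciate a better understanding of why EF behaves this way.
Short answer: the best workaround for this issue is to set this flag on your DbContext instance: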
context.Configuration.UseDatabaseNullSemantics = true;
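When you do this, all the extra null checks go away, and your queries should run faster if they were affected by this issue.
Long answer: the others in this thread are right that in EF6 we introduced the extra null-checking terms by default to compensate for the differences between the semantics of null comparisons in the database (three-valued logic) and standard in-memory null comparisons. The goal was to satisfy the following very popular request:
Paul White is also right that in the following expression, the 'AND NOT' part is uncommon when compensating for three-valued logic: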
((x = y) AND NOT (x IS NULL OR y IS NULL)) OR (x IS NULL AND y IS NULL)
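In the general case, that extra condition is necessary to prevent the whole expression from evaluating to NULL. For example, assume x = 1 and y = NULL. Then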
(x = y) --> NULL 
(x IS NULL AND y IS NULL) --> false
NULL OR false --> NULL
The distinction between NULL and false matters if the comparison expression is later negated when composing the query expression, e.g.:
NOT (false) --> true 
NOT (NULL) --> NULL
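Admittedly, we could add smarts to EF to figure out when this extra term is unnecessary (e.g. if we know the expression isn't negated in the query's predicate) and optimize it out of the query.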
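The truth-table argument above can be checked with C#'s own `bool?`: its predefined `&`, `|` and `!` operators follow SQL's three-valued AND, OR and NOT, with `null` playing the role of UNKNOWN. This is a model of the logic only, not of EF or SQL Server; `SqlNullLogic` and its methods are hypothetical names, and `int?` is used to match the x = 1, y = NULL example.

```csharp
// Model of SQL three-valued logic using bool?, where null means UNKNOWN.
public static class SqlNullLogic
{
    // SQL "x = y": UNKNOWN if either operand is NULL.
    public static bool? Eq(int? x, int? y) =>
        x.HasValue && y.HasValue ? x.Value == y.Value : (bool?)null;

    // SQL "x IS NULL": always TRUE or FALSE, never UNKNOWN.
    public static bool? IsNull(int? x) => !x.HasValue;

    // (x = y) OR (x IS NULL AND y IS NULL)
    public static bool? WithoutAndNot(int? x, int? y) =>
        Eq(x, y) | (IsNull(x) & IsNull(y));

    // ((x = y) AND NOT (x IS NULL OR y IS NULL)) OR (x IS NULL AND y IS NULL)
    public static bool? WithAndNot(int? x, int? y) =>
        (Eq(x, y) & !(IsNull(x) | IsNull(y))) | (IsNull(x) & IsNull(y));

    // A WHERE clause keeps a row only when its predicate is TRUE;
    // FALSE and UNKNOWN both filter the row out.
    public static bool KeepsRow(bool? predicate) => predicate == true;
}
```

For x = 1 and y = NULL, `WithoutAndNot` yields UNKNOWN while `WithAndNot` yields FALSE. Both make `KeepsRow` return false, so in a plain, non-negated WHERE they select the same rows. They diverge only under NOT (NOT UNKNOWN stays UNKNOWN, NOT FALSE is TRUE), and only the AND NOT form then agrees with C#, where `!(x == y)` is true for these values.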
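By the way, we are tracking this issue in the following EF bug on CodePlex:
[Performance] Reduce the expression tree for complex queries in case of C# null comparison semantics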