Difference between COUNT(DISTINCT column_name) and COUNT(column_name) in SQL Server 2008?

Ray*_*Ray 12 sql t-sql sql-server sql-server-2008

I've run into a problem that is driving me crazy. When I run the query below, I get a count of 233,769:

 SELECT COUNT(distinct  Member_List_Link.UserID)  
 FROM Member_List_Link  with (nolock)   
 INNER JOIN MasterMembers with (nolock)  
     ON Member_List_Link.UserID = MasterMembers.UserID   
  WHERE MasterMembers.Active = 1 And
        Member_List_Link.GroupID = 5 AND 
        MasterMembers.ValidUsers = 1 AND 
        Member_List_Link.Status = 1

But if I run the same query without the distinct keyword, I get a count of 233,748:

 SELECT COUNT(Member_List_Link.UserID)  
 FROM Member_List_Link  with (nolock)   
 INNER JOIN MasterMembers with (nolock)
   ON Member_List_Link.UserID = MasterMembers.UserID   
 WHERE MasterMembers.Active = 1 And Member_List_Link.GroupID = 5 
  AND MasterMembers.ValidUsers = 1 AND Member_List_Link.Status = 1

To test, I copied all the tables into temp tables and ran the queries again:

  SELECT COUNT(distinct  #Temp_Member_List_Link.UserID)  
  FROM #Temp_Member_List_Link  with (nolock)   
  INNER JOIN #Temp_MasterMembers with (nolock)
    ON #Temp_Member_List_Link.UserID = #Temp_MasterMembers.UserID   
  WHERE #Temp_MasterMembers.Active = 1 And 
        #Temp_Member_List_Link.GroupID = 5 AND 
        #Temp_MasterMembers.ValidUsers = 1 AND 
        #Temp_Member_List_Link.Status = 1

And without the distinct keyword:

  SELECT COUNT(#Temp_Member_List_Link.UserID)  
  FROM #Temp_Member_List_Link  with (nolock)   
  INNER JOIN #Temp_MasterMembers with (nolock)
    ON #Temp_Member_List_Link.UserID = #Temp_MasterMembers.UserID   
  WHERE #Temp_MasterMembers.Active = 1 And 
        #Temp_Member_List_Link.GroupID = 5 AND 
        #Temp_MasterMembers.ValidUsers = 1 AND 
        #Temp_Member_List_Link.Status = 1

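As a side note, I recreated the temp tables by simply running (select * from Member_List_Link into #temp...)

Now, when I compare COUNT(column) with COUNT(distinct column) on these temp tables, I don't see any difference!

So why is there a difference on the original tables?

I'm running SQL Server 2008 (Developer Edition).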

UPDATE - including the statistics profile

PhysicalOp column for the first query (without distinct)

NULL
Compute Scalar
Stream Aggregate
Clustered Index Seek

PhysicalOp column for the second query (with distinct)

NULL
Compute Scalar
Stream Aggregate
Parallelism
Stream Aggregate
Hash Match
Hash Match
Bitmap
Parallelism
Index Seek
Parallelism
Clustered Index Scan

Rows and Executes for the first query (without distinct)

1   1
0   0
1   1
1   1

Rows and Executes for the second query (with distinct)

Rows    Executes
1   1
0   0
1   1
16  1
16  16
233767  16
233767  16
281901  16
281901  16
281901  16
234787  16
234787  16

After adding OPTION (MAXDOP 1) to the second query (with distinct)

Rows Executes

1           1
0           0
1           1
233767          1
233767          1
281901          1
548396          1

And the resulting PhysicalOp column

NULL
Compute Scalar
Stream Aggregate
Hash Match
Hash Match
Index Seek
Clustered Index Scan

Mar*_*iss 0

What results do you get with

SELECT count(*) FROM (
    SELECT distinct  Member_List_Link.UserID
    FROM Member_List_Link  with (nolock)
    INNER JOIN MasterMembers with (nolock)
      ON Member_List_Link.UserID = MasterMembers.UserID
    WHERE MasterMembers.Active = 1 And
         Member_List_Link.GroupID = 5 AND 
         MasterMembers.ValidUsers = 1 AND
         Member_List_Link.Status = 1
) as m

versus the same query without the nolock hints:

SELECT count(*) FROM (
    SELECT distinct  Member_List_Link.UserID
    FROM Member_List_Link  
    INNER JOIN MasterMembers
      ON Member_List_Link.UserID = MasterMembers.UserID
    WHERE MasterMembers.Active = 1 And
         Member_List_Link.GroupID = 5 AND 
         MasterMembers.ValidUsers = 1 AND
         Member_List_Link.Status = 1
) as m
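For reference, on a consistent read of the data COUNT(DISTINCT col) can never be larger than COUNT(col) over the same rows, which is why the numbers in the question look wrong. Here is a minimal sketch of that baseline. The table variables and their rows are made up for illustration; only the column names are taken from the question, and the column types are an assumption.

```sql
-- Hypothetical miniature of the two tables (made-up data, assumed column types).
DECLARE @MasterMembers TABLE (UserID int PRIMARY KEY, Active bit, ValidUsers bit);
DECLARE @Member_List_Link TABLE (UserID int, GroupID int, Status int);

INSERT INTO @MasterMembers (UserID, Active, ValidUsers)
VALUES (1, 1, 1), (2, 1, 1), (3, 0, 1);          -- user 3 is inactive

INSERT INTO @Member_List_Link (UserID, GroupID, Status)
VALUES (1, 5, 1), (1, 5, 1), (2, 5, 1), (3, 5, 1); -- user 1 is linked twice

-- Both aggregates over the same joined rows, in a single statement.
SELECT COUNT(l.UserID)          AS AllRows,        -- counts every non-NULL UserID
       COUNT(DISTINCT l.UserID) AS DistinctUsers   -- counts each UserID once
FROM @Member_List_Link AS l
INNER JOIN @MasterMembers AS m
    ON l.UserID = m.UserID
WHERE m.Active = 1
  AND l.GroupID = 5
  AND m.ValidUsers = 1
  AND l.Status = 1;
-- AllRows = 3, DistinctUsers = 2
```

If the same two-column SELECT against your real tables (without the nolock hints) still returns DistinctUsers greater than AllRows, the two queries are not seeing the same rows, and the different access paths in your two plans would be the first place I'd look.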