SQL Server: getting a distinct count with "GROUP BY ... WITH CUBE"

Joh*_*ock 5 sql t-sql sql-server group-by sql-server-2000

Basically, I'm trying to get a distinct count within this cube result. Unfortunately, you can't use Count(distinct(Field)) with cube and rollup (as stated here).

Here is what the data looks like. (This is just a simple example; I do expect duplicates in the data.)

    Category1       Category2       ItemId
    a               b               1
    a               b               1
    a               a               1
    a               a               2
    a               c               1
    a               b               2
    a               b               3
    a               c               2
    a               a               1
    a               a               3
    a               c               4
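
For anyone who wants to reproduce this, the sample rows above could be loaded with a script along these lines. The column types are an assumption (they are not stated anywhere in the question); only the table name ItemList and the three column names come from the queries in this post.

```sql
-- Assumed schema: the column types are a guess, not given in the question.
CREATE TABLE ItemList (
  Category1 varchar(10) NULL,
  Category2 varchar(10) NULL,
  ItemId    int NOT NULL
)

-- One INSERT per row, because SQL Server 2000 has no multi-row VALUES syntax.
INSERT INTO ItemList (Category1, Category2, ItemId) VALUES ('a', 'b', 1)
INSERT INTO ItemList (Category1, Category2, ItemId) VALUES ('a', 'b', 1)
INSERT INTO ItemList (Category1, Category2, ItemId) VALUES ('a', 'a', 1)
INSERT INTO ItemList (Category1, Category2, ItemId) VALUES ('a', 'a', 2)
INSERT INTO ItemList (Category1, Category2, ItemId) VALUES ('a', 'c', 1)
INSERT INTO ItemList (Category1, Category2, ItemId) VALUES ('a', 'b', 2)
INSERT INTO ItemList (Category1, Category2, ItemId) VALUES ('a', 'b', 3)
INSERT INTO ItemList (Category1, Category2, ItemId) VALUES ('a', 'c', 2)
INSERT INTO ItemList (Category1, Category2, ItemId) VALUES ('a', 'a', 1)
INSERT INTO ItemList (Category1, Category2, ItemId) VALUES ('a', 'a', 3)
INSERT INTO ItemList (Category1, Category2, ItemId) VALUES ('a', 'c', 4)
```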

Here is what I would like to do, but it doesn't work.

SELECT
  Category1,
  Category2,
  Count(Distinct(ItemId))
FROM ItemList IL
GROUP BY
  Category1,
  Category2
WITH CUBE

I know I can use a subselect like this to get the results I want:

SELECT
  *,
  (SELECT
     Count(Distinct(ItemId)) 
   FROM ItemList IL2 
   WHERE 
     (Q1.Category1 IS NULL OR Q1.Category1 IS NOT NULL AND Q1.Category1 = IL2.Category1) 
     AND
     (Q1.Category2 IS NULL OR Q1.Category2 IS NOT NULL AND Q1.Category2 = IL2.Category2))
       AS DistinctCountOfItems 
FROM (SELECT
        Category1,
        Category2
      FROM ItemList IL
      GROUP BY
        Category1,
        Category2
      WITH CUBE) Q1

But because of the subselect, this runs slowly when the result set is large. Is there any other way to get a distinct count from a cube result?

This is the result I would like to see:

Category1     Category2    DistinctCountOfItems
a             a            3
a             b            3
a             c            3
a             NULL         4
NULL          NULL         4
NULL          a            3
NULL          b            3
NULL          c            3

che*_*525 6

You should be able to clean up your "messy" answer like this:

select Category1, Category2, count(distinct ItemId)
from ItemList
group by Category1, Category2
UNION ALL
select Category1, null, count(distinct ItemId)
from ItemList
group by Category1
UNION ALL
select null, Category2, count(distinct ItemId)
from ItemList
group by Category2
UNION ALL
select null, null, count(distinct ItemId)
from ItemList

Then I came up with another option:

select IL1.Category1, IL1.Category2, count(distinct ItemId)
from ( 
  select Category1, Category2
  from ItemList
  group by Category1, Category2
  with cube
 ) IL1
 join ItemList IL2 on (IL1.Category1=IL2.Category1 and IL1.Category2=IL2.Category2)
      or (IL1.Category1 is null and IL1.Category2=IL2.Category2)
      or (IL1.Category2 is null and IL1.Category1=IL2.Category1)
      or (IL1.Category1 is null and IL1.Category2 is null)
group by IL1.Category1, IL1.Category2

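Efficiency may vary with indexes, the number of grouped columns, and so on. For the test table I wrote, the subselect and the join (as opposed to the unions) performed slightly better. I don't currently have access to an MSSQL 2000 instance (I tested on a 2005 instance), but I don't think anything here is invalid.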
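
To compare the variants on your own data, something like the following may help. It is only a sketch: the index name IX_ItemList_Categories is made up for illustration, and whether such an index helps at all depends on the table. GO is the batch separator understood by client tools such as Query Analyzer, not a T-SQL statement.

```sql
-- Hypothetical index (name invented for this example). It contains every
-- column the queries reference, so they can be answered from the index alone.
CREATE INDEX IX_ItemList_Categories
  ON ItemList (Category1, Category2, ItemId)
GO

-- Report logical reads and CPU/elapsed time for each statement that follows.
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO

-- Run the variant under test here; as a placeholder, the grand-total branch:
select count(distinct ItemId) from ItemList
GO

SET STATISTICS IO OFF
SET STATISTICS TIME OFF
GO
```

Running each variant between the ON and OFF batches gives a like-for-like comparison of reads and timings.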

UPDATE

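A better option, especially if you are grouping on more than 2 columns (if you were grouping on 8 columns, the code above would need 256 join clauses to catch every combination of nulls!):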

select IL1.Category1, IL1.Category2, count(distinct ItemId)
from ( 
  select Category1, Category2
  from ItemList
  group by Category1, Category2
  with cube
 ) IL1
 inner join ItemList IL2 on isnull(IL1.Category1,IL2.Category1)=IL2.Category1
                  and isnull(IL1.Category2,IL2.Category2)=IL2.Category2
group by IL1.Category1, IL1.Category2
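
One caveat with the isnull join: if Category1 or Category2 can hold real NULLs in ItemList, then isnull(IL1.Category1, IL2.Category1) = IL2.Category1 ends up comparing NULL with NULL for those rows. That comparison is never true, so the rows drop out of every group. A possible variant, sketched here, uses the GROUPING() function to tell cube-generated NULLs from stored ones. The aliases G1, G2 and DistinctCountOfItems are names chosen for this example; treat it as an untested sketch.

```sql
select IL1.Category1, IL1.Category2,
       count(distinct IL2.ItemId) as DistinctCountOfItems
from (
  select Category1, Category2,
         grouping(Category1) as G1,  -- 1 = this NULL was produced by the cube
         grouping(Category2) as G2   -- 0 = an actual value (or a stored NULL)
  from ItemList
  group by Category1, Category2
  with cube
 ) IL1
 inner join ItemList IL2
   on (IL1.G1 = 1                          -- "all values" row for Category1
       or IL1.Category1 = IL2.Category1
       or (IL1.Category1 is null and IL2.Category1 is null))
  and (IL1.G2 = 1                          -- "all values" row for Category2
       or IL1.Category2 = IL2.Category2
       or (IL1.Category2 is null and IL2.Category2 is null))
-- G1 and G2 are part of the grouping so that a stored-NULL group and a
-- cube-total row are not merged into one output row.
group by IL1.Category1, IL1.Category2, IL1.G1, IL1.G2
```

On the sample data in the question, which has no NULLs, this should return the same eight rows as the isnull version.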