SQL query like GROUP BY with an OR condition

Tags: t-sql, sql-server, recursion, sql-server-2005

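I'll try to describe the real situation. Our company has a booking system with a table we call Customers, in which the e-mail and phone contacts are saved for every order received. This is part of the system that I can't change. My problem is how to get the number of unique customers. By a unique customer I mean a group of people who share the same e-mail or the same phone number.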

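Example 1: Imagine Tom and Sandra, a married couple. Tom ordered 4 products and filled in 3 different e-mail addresses and 2 different phone numbers in our booking system. One of those phone numbers is shared with Sandra (as a home phone), so I can assume they are connected somehow. Besides the shared number, Sandra also filled in her private phone number, and she used only one e-mail address for both of her orders. For me this means that all of the following rows should be counted as one unique customer. In fact, such a unique customer may grow into a whole family.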

ID   E-mail              Phone          Comment
---- ------------------- -------------- ------------------------------
0    tom@email.com       +44 111 111    First row
1    tommy@email.com     +44 111 111    Same phone, different e-mail
2    thomas@email.com    +44 111 111    Same phone, different e-mail
3    thomas@email.com    +44 222 222    Same e-mail, different phone
4    sandra@email.com    +44 222 222    Same phone, different e-mail
5    sandra@email.com    +44 333 333    Same e-mail, different phone
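A minimal Python sketch of that rule, for illustration only (Python is not part of the booking system; `ROWS` copies the table above and `customer_of` is a hypothetical helper). Starting from Tom's first row, it keeps following rows that share an e-mail or a phone until no new rows turn up:

```python
from collections import defaultdict, deque

# Rows from Example 1: (ID, e-mail, phone)
ROWS = [
    (0, "tom@email.com", "+44 111 111"),
    (1, "tommy@email.com", "+44 111 111"),
    (2, "thomas@email.com", "+44 111 111"),
    (3, "thomas@email.com", "+44 222 222"),
    (4, "sandra@email.com", "+44 222 222"),
    (5, "sandra@email.com", "+44 333 333"),
]


def customer_of(start_id, rows):
    """Return the set of row IDs reachable from start_id via a shared e-mail or phone."""
    # Index row IDs by contact value; the "email"/"phone" tag keeps the two kinds apart.
    by_contact = defaultdict(list)
    contacts = {}
    for row_id, email, phone in rows:
        keys = [("email", email), ("phone", phone)]
        contacts[row_id] = keys
        for key in keys:
            by_contact[key].append(row_id)

    seen = {start_id}
    queue = deque([start_id])
    while queue:  # breadth-first walk over an explicit queue
        current = queue.popleft()
        for key in contacts[current]:
            for other in by_contact[key]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return seen


print(sorted(customer_of(0, ROWS)))  # [0, 1, 2, 3, 4, 5]
```

Tom's first row reaches Sandra's last row only through the chain 111 111 → thomas@ → 222 222 → sandra@, which is why a single equality test per column is not enough.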

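As ypercube said, I probably need recursion to count all of these unique customers.

Example 2: Here is an example of what I want to get.

Is it possible to get the number of unique customers without recursion, for example with a cursor or something else, or is recursion required?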

ID   E-mail              Phone          Comment
---- ------------------- -------------- ------------------------------
0    linsey@email.com    +44 111 111    \
1    louise@email.com    +44 111 111     > 1. unique customer
2    louise@email.com    +44 222 222    /
---- ------------------- -------------- ------------------------------
3    steven@email.com    +44 333 333    \
4    steven@email.com    +44 444 444     > 2. unique customer
5    sandra@email.com    +44 444 444    /
---- ------------------- -------------- ------------------------------
6    george@email.com    +44 555 555    -- 3. unique customer
---- ------------------- -------------- ------------------------------
7    xavier@email.com    +44 666 666    \
8    xavier@email.com    +44 777 777     > 4. unique customer
9    xavier@email.com    +44 888 888    /
---- ------------------- -------------- ------------------------------
10   robert@email.com    +44 999 999    \
11   miriam@email.com    +44 999 999     > 5. unique customer
12   sherry@email.com    +44 999 999    /
---- ------------------- -------------- ------------------------------
----------------------------------------------------------------------
Result                                  Total = 5 unique customers
----------------------------------------------------------------------
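On the question of avoiding recursion: outside SQL, this kind of grouping is usually done with a disjoint-set (union-find) structure, which needs only loops. The Python sketch below is a swapped-in technique shown for illustration, not a T-SQL answer; `ROWS` copies the Example 2 table and `count_unique_customers` is a hypothetical helper. Each row is merged with the first earlier row that used the same e-mail or the same phone, and the number of distinct roots is the number of unique customers.

```python
# Rows from Example 2: (ID, e-mail, phone)
ROWS = [
    (0, "linsey@email.com", "+44 111 111"),
    (1, "louise@email.com", "+44 111 111"),
    (2, "louise@email.com", "+44 222 222"),
    (3, "steven@email.com", "+44 333 333"),
    (4, "steven@email.com", "+44 444 444"),
    (5, "sandra@email.com", "+44 444 444"),
    (6, "george@email.com", "+44 555 555"),
    (7, "xavier@email.com", "+44 666 666"),
    (8, "xavier@email.com", "+44 777 777"),
    (9, "xavier@email.com", "+44 888 888"),
    (10, "robert@email.com", "+44 999 999"),
    (11, "miriam@email.com", "+44 999 999"),
    (12, "sherry@email.com", "+44 999 999"),
]


def count_unique_customers(rows):
    """Count groups of rows linked, directly or transitively, by e-mail or phone."""
    parent = {row_id: row_id for row_id, _, _ in rows}

    def find(x):
        # Iterative find with path halving: no recursion involved.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lower ID as the root of the merged group.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_row_with = {}  # ("email" | "phone", value) -> first row ID that used it
    for row_id, email, phone in rows:
        for key in (("email", email), ("phone", phone)):
            if key in first_row_with:
                union(row_id, first_row_with[key])
            else:
                first_row_with[key] = row_id

    return len({find(row_id) for row_id in parent})


print(count_unique_customers(ROWS))  # 5
```

A SQL Server cursor could in principle drive the same loop, but that translation is not shown or tested here.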

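I have tried a query with GROUP BY, but I don't know how to group the result by either the first or the second column. What I'm looking for is something like this (pseudocode, not valid T-SQL):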

SELECT COUNT(*) FROM Customers
GROUP BY Email OR Phone
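GROUP BY puts each row into exactly one group of rows with equal key values, so it cannot merge two groups just because some row shares its e-mail with one group and its phone with the other. A quick Python check on the Example 2 data (copied from the table; Python used only for illustration) shows the group counts that plain GROUP BY would produce, none of which is the expected 5:

```python
# Example 2 as (e-mail, phone) pairs, rows 0..12
ROWS = [
    ("linsey@email.com", "+44 111 111"),
    ("louise@email.com", "+44 111 111"),
    ("louise@email.com", "+44 222 222"),
    ("steven@email.com", "+44 333 333"),
    ("steven@email.com", "+44 444 444"),
    ("sandra@email.com", "+44 444 444"),
    ("george@email.com", "+44 555 555"),
    ("xavier@email.com", "+44 666 666"),
    ("xavier@email.com", "+44 777 777"),
    ("xavier@email.com", "+44 888 888"),
    ("robert@email.com", "+44 999 999"),
    ("miriam@email.com", "+44 999 999"),
    ("sherry@email.com", "+44 999 999"),
]

groups_by_email = len({email for email, _ in ROWS})  # like GROUP BY Email
groups_by_phone = len({phone for _, phone in ROWS})  # like GROUP BY Phone
groups_by_both = len(set(ROWS))                      # like GROUP BY Email, Phone
print(groups_by_email, groups_by_phone, groups_by_both)  # 9 9 13
```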

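Thanks again for any suggestions.

P.S. I really appreciated the answers to this question before the complete rewrite. The answers here may no longer match the updated question, so if you are going to vote on them because of that, please don't (the question itself is fair game, of course :). I completely rewrote this post.

Thanks, and sorry for the false start.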

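Answer by Ant*_*ull (score 1)

Here is a complete solution using a recursive CTE. It returns every row together with a UniqueCustomerId, the lowest ID found in that row's group.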

-- Nodes: each distinct e-mail (Part 1) and each distinct phone (Part 2) gets its own SetId
;WITH Nodes AS
(
    SELECT DENSE_RANK() OVER (ORDER BY Part, PartRank) SetId
        , [ID]
    FROM
    (
        SELECT [ID], 1 Part, DENSE_RANK() OVER (ORDER BY [E-mail]) PartRank
        FROM dbo.Customer
        UNION ALL
        SELECT [ID], 2, DENSE_RANK() OVER (ORDER BY Phone) PartRank
        FROM dbo.Customer
    ) A
),
-- Links: pairs of rows that share a SetId, always pointing from the higher ID to the lower one
Links AS
(
    SELECT DISTINCT A.Id, B.Id LinkedId
    FROM Nodes A
    JOIN Nodes B ON B.SetId = A.SetId AND B.Id < A.Id
),
-- Routes: each row reaches itself, its direct links, and rows reached by chaining links downward
Routes AS
(
    SELECT DISTINCT Id, Id LinkedId
    FROM dbo.Customer

    UNION ALL

    SELECT DISTINCT Id, LinkedId
    FROM Links

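    UNION ALL

    -- Recursive member: extend A.Id -> A.LinkedId with the routes A.LinkedId already has.
    -- Links only point to lower IDs, so rows connected solely through a higher ID are
    -- not merged: if row 2 shares an e-mail with row 0 and a phone with row 1, rows 0
    -- and 2 get UniqueCustomerId 0 while row 1 keeps 1.
    SELECT A.Id, B.LinkedId
    FROM Links A
    JOIN Routes B ON B.Id = A.LinkedId AND B.LinkedId < A.Id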
),
-- TransitiveClosure: all (Id, LinkedId) pairs, including every row paired with itself
TransitiveClosure AS
(
    SELECT Id, Id LinkedId
    FROM Links

    UNION

    SELECT LinkedId Id, LinkedId
    FROM Links

    UNION

    SELECT Id, LinkedId
    FROM Routes
),
-- UniqueCustomers: label each row with the lowest ID it reaches
UniqueCustomers AS
(
    SELECT Id, MIN(LinkedId) UniqueCustomerId
    FROM TransitiveClosure
    GROUP BY Id
)
-- For the count alone: SELECT COUNT(DISTINCT UniqueCustomerId) FROM UniqueCustomers
SELECT A.Id, A.[E-mail], A.Phone, B.UniqueCustomerId
FROM dbo.Customer A
JOIN UniqueCustomers B ON B.Id = A.Id
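The query's final step labels each row with MIN(LinkedId). The same labelling can be sketched outside SQL as iterative minimum-label propagation: every row repeatedly takes the smallest label among rows sharing its e-mail or phone until nothing changes. This Python version is an illustration under that assumption, not a drop-in replacement for the T-SQL; `unique_customer_ids` is a hypothetical helper. Because it follows shared contacts in both directions, it also groups rows whose only connection runs through a higher ID, such as the three-row case below.

```python
def unique_customer_ids(rows):
    """Map each row ID to the lowest row ID in its connected group (like UniqueCustomerId)."""
    label = {row_id: row_id for row_id, _, _ in rows}
    changed = True
    while changed:  # repeat until no label gets smaller
        changed = False
        # Lowest current label seen for each e-mail and each phone
        best = {}
        for row_id, email, phone in rows:
            for key in (("email", email), ("phone", phone)):
                best[key] = min(best.get(key, label[row_id]), label[row_id])
        for row_id, email, phone in rows:
            new_label = min(label[row_id], best[("email", email)], best[("phone", phone)])
            if new_label < label[row_id]:
                label[row_id] = new_label
                changed = True
    return label


# Row 2 shares an e-mail with row 0 and a phone with row 1.
ROWS = [
    (0, "a@email.com", "+44 111 111"),
    (1, "b@email.com", "+44 222 222"),
    (2, "a@email.com", "+44 222 222"),
]
labels = unique_customer_ids(ROWS)
print(labels)                     # {0: 0, 1: 0, 2: 0}
print(len(set(labels.values())))  # 1
```

Counting distinct labels corresponds to COUNT(DISTINCT UniqueCustomerId). Each pass lowers at least one label or stops, so the loop terminates, though a long chain of rows may need many passes.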