How can I efficiently create logical subsets of data in a many-to-many mapping table?

Sam*_*ham 6 t-sql many-to-many aggregate-functions sql-server-2008

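I have a many-to-many relationship between invoices and credit card transactions, and I am trying to map the two together. The best way to think about the problem is to picture TransactionInvoiceMap as a bipartite graph. For each connected subgraph, find the sum of all invoices and the sum of all transactions in that subgraph. In my query, I want to return the values computed for each subgraph along with the transaction IDs they are associated with. Transactions that belong to the same subgraph should show the same totals.

More specifically, given the following transactions/invoices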

Table: TransactionInvoiceMap
TransactionID  InvoiceID
1              1
2              2
3              2
3              3

Table: Transactions
TransactionID  Amount
1              $100
2              $75
3              $75

Table: Invoices
InvoiceID  Amount
1          $100
2          $100
3          $50
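
To try the queries in this thread, the sample data above can be loaded with a script along these lines. This is a minimal sketch for a scratch database: the column types (`int`, `money`) and the key constraints are assumptions, since the question does not state them.

```sql
-- Assumed schema: the question only gives column names and sample values.
create table Transactions (
    TransactionID int   not null primary key,
    Amount        money not null
);

create table Invoices (
    InvoiceID int   not null primary key,
    Amount    money not null
);

create table TransactionInvoiceMap (
    TransactionID int not null references Transactions (TransactionID),
    InvoiceID     int not null references Invoices (InvoiceID),
    primary key (TransactionID, InvoiceID)
);

-- Sample rows exactly as listed in the question.
insert into Transactions (TransactionID, Amount)
values (1, 100), (2, 75), (3, 75);

insert into Invoices (InvoiceID, Amount)
values (1, 100), (2, 100), (3, 50);

insert into TransactionInvoiceMap (TransactionID, InvoiceID)
values (1, 1), (2, 2), (3, 2), (3, 3);
```

The table names match the ones used in the queries below, so those queries can run against this data unchanged.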

The output I want is

TransactionID  TotalAsscTransactions TotalAsscInvoiced
1              $100                  $100
2              $150                  $150
3              $150                  $150

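Note that invoices 2 and 3 and transactions 2 and 3 are part of the same logical group.

Here is a solution that apparently works (simplified, names changed), but it is very slow. I am having a hard time figuring out how to optimize it, but I think it will involve eliminating the subqueries into TransactionInvoiceGrouping. Feel free to suggest something completely different.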

with TransactionInvoiceGrouping as (
    select 
        -- Need an identifier for each logical group of transactions/invoices, use
        -- one of the transaction ids for this.
        m.TransactionID,
        m.InvoiceID,
        min(m.TransactionID) over (partition by m.InvoiceID) as GroupingID
    from TransactionInvoiceMap m
)
select distinct
    g.TransactionID,
    istat.InvoiceSum as TotalAsscInvoiced,
    tstat.TransactionSum as TotalAsscTransactions
from TransactionInvoiceGrouping g
    cross apply (
        select sum(ii.Amount) as InvoiceSum
        from (select distinct InvoiceID, GroupingID from TransactionInvoiceGrouping) ig
            inner join Invoices ii on ig.InvoiceID = ii.InvoiceID
        where ig.GroupingID = g.GroupingID
    ) as istat
    cross apply (
        select sum(it.Amount) as TransactionSum
        from (select distinct TransactionID, GroupingID from TransactionInvoiceGrouping) ig
            left join Transactions it on ig.TransactionID = it.TransactionID
        where ig.GroupingID = g.GroupingID
        having sum(it.Amount) > 0
    ) as tstat

Tim*_*ner 2

I have implemented a solution using a recursive CTE:

;with TranGroup as (
    select TransactionID
        , InvoiceID as NextInvoice
        , TransactionID as RelatedTransaction
        , cast(TransactionID as varchar(8000)) as TransactionChain
    from TransactionInvoiceMap
    union all
    select g.TransactionID
        , m1.InvoiceID
        , m.TransactionID
        , g.TransactionChain + ',' + cast(m.TransactionID as varchar(11))
    from TranGroup g
        join TransactionInvoiceMap m on g.NextInvoice = m.InvoiceID
        join TransactionInvoiceMap m1 on m.TransactionID = m1.TransactionID
    where ',' + g.TransactionChain + ',' not like '%,' + cast(m.TransactionID as varchar(11)) + ',%'
)
, RelatedTrans as (
    select distinct TransactionID, RelatedTransaction
    from TranGroup
)
, RelatedInv as (
    select distinct TransactionID, NextInvoice as RelatedInvoice
    from TranGroup
)
select TransactionID
    , (
        select sum(Amount)
        from Transactions
        where TransactionID in (
            select RelatedTransaction
            from RelatedTrans
            where TransactionID = t.TransactionID
        )
    ) as TotalAsscTransactions
    , (
        select sum(Amount)
        from Invoices
        where InvoiceID in (
            select RelatedInvoice
            from RelatedInv
            where TransactionID = t.TransactionID
        )
    ) as TotalAsscInvoiced
from Transactions t

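There is probably still room for optimization (including my object naming!), but I believe I at least have a correct solution, one that collects every possible transaction-invoice relationship for inclusion in the calculation.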
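
For comparison, the same connected-subgraph grouping can be sketched without recursion, by repeatedly pushing the smallest group label across shared transactions and invoices until nothing changes. This is not the approach from either post, only an illustration; the temp table `#Edge`, its `GroupID` column, and the variable `@changed` are hypothetical names.

```sql
-- One row per map entry; each row starts in a group named after its own transaction.
create table #Edge (
    TransactionID int not null,
    InvoiceID     int not null,
    GroupID       int not null,
    primary key (TransactionID, InvoiceID)
);

insert into #Edge (TransactionID, InvoiceID, GroupID)
select distinct TransactionID, InvoiceID, TransactionID
from TransactionInvoiceMap;

declare @changed int;
set @changed = 1;

-- Repeat until no row is lowered; labels only ever decrease, so the loop ends.
while @changed > 0
begin
    update e
    set GroupID = x.MinGroup
    from #Edge e
        cross apply (
            -- Smallest label among rows sharing this transaction or this invoice.
            select min(o.GroupID) as MinGroup
            from #Edge o
            where o.TransactionID = e.TransactionID
               or o.InvoiceID = e.InvoiceID
        ) x
    where x.MinGroup < e.GroupID;

    set @changed = @@rowcount;
end;

-- Sum each side once per group, then attach the totals to every transaction.
with Grp as (
    select distinct TransactionID, GroupID from #Edge
), TranTotals as (
    select g.GroupID, sum(t.Amount) as TotalAsscTransactions
    from Grp g
        join Transactions t on t.TransactionID = g.TransactionID
    group by g.GroupID
), InvTotals as (
    select ig.GroupID, sum(i.Amount) as TotalAsscInvoiced
    from (select distinct InvoiceID, GroupID from #Edge) ig
        join Invoices i on i.InvoiceID = ig.InvoiceID
    group by ig.GroupID
)
select g.TransactionID, tt.TotalAsscTransactions, it.TotalAsscInvoiced
from Grp g
    join TranTotals tt on tt.GroupID = g.GroupID
    join InvTotals it on it.GroupID = g.GroupID
order by g.TransactionID;
```

On the question's sample data the loop settles after two updating passes, leaving transaction 1 in group 1 and transactions 2 and 3 in group 2. Transactions with no row in TransactionInvoiceMap do not appear in this result, and the `or` in the lookup may need rewriting as two indexed lookups on larger tables.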

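I could not get the existing solutions on this page to produce the output the OP wants, and they got uglier as I added more test data. I am not sure the "slow" solution the OP posted is correct. It is quite possible that I have misunderstood the question.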
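
One way to check that doubt is to run only the grouping step of the OP's query against the sample data. The query below is copied from the OP's `TransactionInvoiceGrouping` CTE with an `order by` added; the result rows in the comments are what the window function yields for the four sample map rows.

```sql
-- Grouping step of the OP's query, run on its own.
select
    m.TransactionID,
    m.InvoiceID,
    min(m.TransactionID) over (partition by m.InvoiceID) as GroupingID
from TransactionInvoiceMap m
order by m.TransactionID, m.InvoiceID;

-- TransactionID  InvoiceID  GroupingID
-- 1              1          1
-- 2              2          2
-- 3              2          2
-- 3              3          3
```

Transaction 3 ends up with two different GroupingID values, because the window function only looks at transactions that share a single invoice and does not follow the chain any further. That suggests the grouping covers one hop of the graph rather than the whole connected subgraph, which would be consistent with the results drifting from the desired output.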

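Additional info:

I have often seen that recursive queries can be slow when working with large amounts of data. Perhaps that could be the subject of another SO question. If that is the case here, things to try on the SQL side include limiting the scope (adding a where clause), indexing the base tables, selecting the CTE into a temp table first, indexing that temp table, and thinking of a better stop condition for the CTE. And, of course, profile first.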
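
A few of those suggestions can be sketched as follows. The index name, the temp table `#OneHop`, and the `@FromID`/`@ToID` scope variables are all hypothetical, and whether any of this helps depends on the real data, so treat it as a starting point to measure rather than a recommendation.

```sql
-- Index the mapping table for lookups by invoice (assumes no such index exists yet).
create index IX_TransactionInvoiceMap_Invoice
    on TransactionInvoiceMap (InvoiceID, TransactionID);

-- Hypothetical scope filter: only process a range of transaction IDs.
declare @FromID int, @ToID int;
set @FromID = 1;
set @ToID = 1000;

-- Materialize the intermediate result once instead of re-evaluating it per reference:
-- every pair of transactions that share at least one invoice.
select distinct
    m.TransactionID,
    m1.TransactionID as RelatedTransaction
into #OneHop
from TransactionInvoiceMap m
    join TransactionInvoiceMap m1 on m1.InvoiceID = m.InvoiceID
where m.TransactionID between @FromID and @ToID;

-- Index the temp table so later joins against it can seek rather than scan.
create clustered index IX_OneHop
    on #OneHop (TransactionID, RelatedTransaction);
```

Note that a range filter like this can cut a connected group in two if related transactions fall outside the range, so a scope limit is only safe when it follows a boundary that groups never cross.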