Find the most recent account for each customer group

Gol*_*tio 18 sql t-sql sql-server greatest-n-per-group

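I have a table that contains customer information. Each customer is assigned a customer ID (their SSN), which they keep as they open more accounts. Two customers may be on the same account, each with their own ID. Account numbers are not ordered by date.

I want to find the most recent account for each customer or group of customers. If two customers have ever been on an account together, I want to return the most recent account that either customer has been on.

Here is a sample table with some of the possible cases.

Sample table ACCT: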

acctnumber  date            Cust1ID     Cust2ID 
10000       '2016-02-01'    1110        NULL    --Case0-customer has only ever had
                                                --one account

10001       '2016-02-01'    1111        NULL    --Case1-one customer has multiple
10050       '2017-02-01'    1111        NULL    --accounts
400050      '2017-06-01'    1111        NULL
10089       '2017-12-08'    1111        NULL

10008       '2016-02-01'    1120        NULL    --Case2-customer has account(s) and later
10038       '2016-04-01'    1120        NULL
10058       '2017-02-03'    1120        1121    --gets account(s) with another customer

10002       '2016-02-01'    1112        NULL    --Case3-customer has account(s) and later
10052       '2017-02-02'    1113        1112    --becomes the second customer on another
10152       '2017-05-02'    1113        1112    --account(s)

10003       '2016-02-02'    1114        1115    --Case4-customer and second customer
7060        '2017-02-04'    1115        1114    --switch which is first and second

10004       '2016-02-02'    1116        1117    --Case5-second customer later gets
10067       '2017-02-05'    1117        NULL    --separate account(s)
10167       '2018-02-05'    1117        NULL

50013       '2016-01-01'    2008        NULL    --Case5b -customer has account(s) & later
50014       '2017-02-02'    2008        2009    --gets account(s) with second customer &
50015       '2017-04-04'    2008        NULL    --later still first customer gets
100015      '2018-05-05'    2008        NULL    --separate account(s)

30005       '2015-02-01'    1118        NULL    --Case6-customer has account(s) 
10005       '2016-02-01'    1118        NULL
10054       '2017-02-02'    1118        1119    --gets account(s) with another
40055       '2017-03-03'    1118        1119
10101       '2017-04-04'    1119        NULL    --who later gets separate account(s)
10201       '2017-05-05'    1119        NULL
30301       '2017-06-06'    1119        NULL
10322       '2018-01-01'    1119        NULL

10007       '2016-02-01'    1122        1123    --Case7-customers play musical chairs
10057       '2017-02-03'    1123        1124
10107       '2017-06-02'    1124        1125

50001       '2016-01-01'    2001        NULL    --Case8a-customers with account(s)
50002       '2017-02-02'    2001        2002    --together each later get separate
50003       '2017-03-03'    2001        NULL    --account(s)
50004       '2017-04-04'    2002        NULL

50005       '2016-01-01'    2003        NULL    --Case8b-customers with account(s)
50006       '2017-02-02'    2003        2004    --together each later get separate
50007       '2017-03-03'    2004        NULL    --account(s)
50008       '2017-04-04'    2003        NULL
50017       '2018-03-03'    2004        NULL
50018       '2018-04-04'    2003        NULL

50009       '2016-01-01'    2005        NULL    --Case9a-customer has account(s) & later
50010       '2017-02-02'    2005        2006    --gets account(s) with a second customer
50011       '2017-03-03'    2005        2007    --& later still gets account(s) with a
                                                --third customer

50109       '2016-01-01'    2015        NULL    --Case9b starts the same as Case9a, but
50110       '2017-02-02'    2015        2016    
50111       '2017-03-03'    2015        2017    
50112       '2017-04-04'    2015        NULL    --after all accounts with other customers
50122       '2017-05-05'    2015        NULL    --are complete, the original primary
                                                --customer begins opening individual
                                                --accounts again

Desired results:

acctnumber  date            Cust1ID     Cust2ID 
10000       '2016-02-01'    1110        NULL    --Case0    
10089       '2017-12-08'    1111        NULL    --Case1
10058       '2017-02-03'    1120        1121    --Case2
10152       '2017-05-02'    1113        1112    --Case3
7060        '2017-02-04'    1115        1114    --Case4
10167       '2018-02-05'    1117        NULL    --Case5
100015      '2018-05-05'    2008        NULL    --Case5b
10322       '2018-01-01'    1119        NULL    --Case6
10107       '2017-06-02'    1124        1125    --Case7
50003       '2017-03-03'    2001        NULL    --Case8a result 1
50004       '2017-04-04'    2002        NULL    --Case8a result 2
50017       '2018-03-03'    2004        NULL    --Case8b result 1
50018       '2018-04-04'    2003        NULL    --Case8b result 2
50011       '2017-03-03'    2005        2007    --Case9a
50122       '2017-05-05'    2015        NULL    --Case9b

Alternatively, I would accept Case 7 outputting two separate customer groups:

10007       '2016-02-01'    1122        1123    --Case7 result 1
10107       '2017-06-02'    1124        1125    --Case7 result 2

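Because Cases 8a and 8b represent the company recognizing that the customers are worthy of having separate accounts, we would then want to treat their group as split, so that it produces separate result rows.

Also, most of the time customers have many accounts, and mixing and matching the above cases over time is common. For example, a single customer can have five accounts (Case 1), then later open one or more accounts with another customer (Case 3), sometimes switching the primary account holder (Case 4), and then the first customer begins opening individual accounts again (Case 5b).


I have tried joining the table to a copy of itself wherever the acctnumbers are different and any of the Cust IDs match. However, this removes the customers who have only ever had one account, so I added a union with the customers that have no match on custid or account number, grouped by custid.

Unfortunately, the second part not only includes the custids from Case0; some custids that should be included are also being excluded altogether.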

select
    max(date1) as date,
    cust1id1 as cust1id
from
(
select
    acctnumber as [acctnumber1],
    date as [date1],
    cust1id as [cust1id1],
    cust2id as [cust2id1]
from 
    acct
) t1
join
(
select
    acctnumber as [acctnumber2],
    date as [date2],
    cust1id as [cust1id2],
    cust2id as [cust2id2]
from 
    acct
) t2
on t1.date1 > t2.date2 and
(t1.cust1id1 = t2.cust1id2 or
t1.cust1id1 = t2.cust2id2 or
t1.cust2id1 = t2.cust2id2)
Group by
cust1id1
union
select
    max(date1) as date,
    cust1id1 as cust1id
from
(
select
    acctnumber as [acctnumber1],
    date as [date1],
    cust1id as [cust1id1],
    cust2id as [cust2id1]
from 
    acct
) t1
join
(
select
    acctnumber as [acctnumber2],
    date as [date2],
    cust1id as [cust1id2],
    cust2id as [cust2id2]
from 
    acct
) t2
on (t1.acctnumber1 != t2.acctnumber2 and
t1.cust1id1 != t2.cust1id2 and
t1.cust1id1 != t2.cust2id2 and
t1.cust2id1 != t2.cust2id2)
group by
cust1id1
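
The union in the query above exists only to bring back customers who have a single account. One possible way to avoid needing it, sketched here and not taken from the original post, is to unpivot Cust1ID and Cust2ID into one column, so that every customer gets a row per account without any self-join. The CTE names `PerCustomer` and `LatestDate` are made up for illustration, and the sample rows (Case0 and Case2 only) are inlined so that the statement stays a single read-only SELECT. It assumes SQL Server 2008 or later for the `VALUES` constructor.

```sql
-- Sketch only: a few sample rows inlined in place of the real ACCT table.
WITH ACCT (acctnumber, [date], Cust1ID, Cust2ID) AS (
    SELECT v.acctnumber, CAST(v.[date] AS date), v.Cust1ID, v.Cust2ID
    FROM (VALUES
        (10000, '2016-02-01', 1110, NULL),   -- Case0
        (10008, '2016-02-01', 1120, NULL),   -- Case2
        (10038, '2016-04-01', 1120, NULL),
        (10058, '2017-02-03', 1120, 1121)
    ) AS v (acctnumber, [date], Cust1ID, Cust2ID)
),
-- One row per (account, customer), whichever column the customer was in.
PerCustomer AS (
    SELECT a.acctnumber, a.[date], c.CustID
    FROM ACCT AS a
    CROSS APPLY (VALUES (a.Cust1ID), (a.Cust2ID)) AS c (CustID)
    WHERE c.CustID IS NOT NULL
)
-- Latest account date per customer; single-account customers are kept.
SELECT CustID, MAX([date]) AS LatestDate
FROM PerCustomer
GROUP BY CustID
ORDER BY CustID;
```

On these four rows this returns 1110 with 2016-02-01, and both 1120 and 1121 with 2017-02-03. It is still one row per customer, not per customer group, so it covers only the first half of the problem.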

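Update

Thank you for all the great answers and comments so far. I have been trying out the queries and comparing the results.

In the comments on other answers, @VladimirBaranov brought up a rare case that I had not previously considered.

Similar to Case 7, it would be a bonus if Case 8 were handled, but it is not expected.

Case 9 is important, and the results for 9a and 9b should be handled.

Update 2

I noticed an issue with my original 7 cases.

In the most recent accounts, whenever a customer was no longer on the account, it was always the second borrower who remained. This was completely unintentional; in any of these examples, either customer could be the one remaining on the most recent account.

Also, each case has the minimum number of accounts needed to show exactly what the case is testing, but this is not typical. Usually, at each step of each case, there can be 5, 10, 15 or more accounts before the customer switches to adding a second customer, and then the two of them can have many accounts together.

Reviewing the answers, I see that many contain index, create, update and other clauses that require being able to edit the database. Unfortunately, I am on the consumer side of this database, so I have read-only access, and the program I can use to interact with the database automatically rejects them.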
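
For trying out candidate queries under that read-only restriction, one possible workaround is to inline the sample rows in a CTE with a `VALUES` table constructor, so that nothing is declared, created or inserted. This is only a sketch. Whether the tool in question accepts CTEs and table value constructors is an assumption, and only a few rows of ACCT are reproduced.

```sql
-- Sketch: the sample rows live inside the query itself, so the whole
-- statement is a plain SELECT. Replace the final SELECT with the query
-- under test; it can refer to ACCT as if it were the real table.
WITH ACCT (acctnumber, [date], Cust1ID, Cust2ID) AS (
    SELECT v.acctnumber, CAST(v.[date] AS date), v.Cust1ID, v.Cust2ID
    FROM (VALUES
        (10001, '2016-02-01', 1111, NULL),   -- Case1
        (10050, '2017-02-01', 1111, NULL),
        (10003, '2016-02-02', 1114, 1115),   -- Case4
        (7060,  '2017-02-04', 1115, 1114)
    ) AS v (acctnumber, [date], Cust1ID, Cust2ID)
)
SELECT acctnumber, [date], Cust1ID, Cust2ID
FROM ACCT
ORDER BY [date];
```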

Jef*_*ner 0

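My answer is wrong; sorry for posting prematurely. I am working on a different idea and will be back soon.


Original response:

Assuming your date format is MM.DD.YY, I came up with the code below. I don't understand why your desired result set doesn't include rows for CustID 1116 or 1118, but I do see that including them would duplicate 1117 and 1119 respectively, unless the source data is modified to remove those duplicate 1117 and 1119 values from the results. For now, I have this interim solution, pending your reply.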

declare @ACCT table (
  acctnumber int,
  date date,
  Cust1ID int,
  Cust2ID int
);

insert into @ACCT values (10000, '2016-02-01', 1110, null);
insert into @ACCT values (10001, '2016-02-01', 1111, null);
insert into @ACCT values (10050, '2017-02-01', 1111, null);
insert into @ACCT values (10008, '2016-02-01', 1120, null);
insert into @ACCT values (10058, '2017-02-03', 1120, 1121);
insert into @ACCT values (10002, '2016-02-01', 1112, null);
insert into @ACCT values (10052, '2017-02-02', 1113, 1112);
insert into @ACCT values (10003, '2016-02-02', 1114, 1115);
insert into @ACCT values (7060,  '2017-02-04', 1115, 1114);
insert into @ACCT values (10004, '2016-02-02', 1116, 1117);
insert into @ACCT values (10067, '2017-02-05', 1117, null);
insert into @ACCT values (10005, '2016-02-01', 1118, null);
insert into @ACCT values (10054, '2017-02-03', 1118, 1119);
insert into @ACCT values (10101, '2017-06-02', 1119, null);
insert into @ACCT values (10007, '2016-02-01', 1122, 1123);
insert into @ACCT values (10057, '2017-02-03', 1123, 1124);
insert into @ACCT values (10107, '2017-06-02', 1124, 1125);

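with

-- One row per (account, customer), taking customers from both ID columns.
OneCustId as (
select
  acctnumber, [date], Cust1ID as CustID
from
  @ACCT

union

select
  acctnumber, [date], Cust2ID
from
  @ACCT
where
  Cust2ID is not null  -- accounts without a second customer add no row here
),

-- Number each customer's accounts from newest to oldest.
SortedByLastUsage as (
select
  acctnumber, [date], CustID, row_number() over (partition by CustID order by [date] desc) as RowID
from
  OneCustId
),

-- Keep only the newest account of each customer.
LastUsage as (
select
  acctnumber, [date], CustID
from
  SortedByLastUsage
where
  RowID = 1
)

-- Return the full account row for every account that is some customer's newest;
-- distinct collapses accounts that are the newest for both of their customers.
select distinct
  ACCT.acctnumber, ACCT.[date], ACCT.Cust1ID, ACCT.Cust2ID
from
  @ACCT ACCT
  inner join LastUsage on
    ACCT.acctnumber = LastUsage.acctnumber and
    ACCT.[date] = LastUsage.[date] and
    LastUsage.CustID in (ACCT.Cust1ID, ACCT.Cust2ID)
order by
  Cust1ID, Cust2ID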

Result set:

acctnumber  date        Cust1ID  Cust2ID
10000       2016-02-01  1110     NULL
10050       2017-02-01  1111     NULL
10052       2017-02-02  1113     1112
7060        2017-02-04  1115     1114
10004       2016-02-02  1116     1117
10067       2017-02-05  1117     NULL
10054       2017-02-03  1118     1119
10101       2017-06-02  1119     NULL
10058       2017-02-03  1120     1121
10007       2016-02-01  1122     1123
10057       2017-02-03  1123     1124
10107       2017-06-02  1124     1125
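
The result set above still has one row per customer's latest account, which is why linked customers such as 1116 and 1117 each get their own row. A minimal sketch of one way to group linked customers first, assuming a recursive CTE is acceptable and that both ID columns are `int`: treat every joint account as a link between two customers, walk those links from each customer, and use the smallest reachable ID as a group key. This is not the method from the answer above, and the names `Customers`, `Edges`, `Walk`, `CustGroup` and `VisitedPath` are made up for illustration. It treats Case 7 as a single group and does not split groups the way Case 8 asks for.

```sql
-- Sketch only: a few sample rows inlined in place of the real ACCT table.
WITH ACCT (acctnumber, [date], Cust1ID, Cust2ID) AS (
    SELECT v.acctnumber, CAST(v.[date] AS date), v.Cust1ID, v.Cust2ID
    FROM (VALUES
        (10000, '2016-02-01', 1110, NULL),   -- Case0
        (10004, '2016-02-02', 1116, 1117),   -- Case5
        (10067, '2017-02-05', 1117, NULL),
        (10167, '2018-02-05', 1117, NULL),
        (10007, '2016-02-01', 1122, 1123),   -- Case7
        (10057, '2017-02-03', 1123, 1124),
        (10107, '2017-06-02', 1124, 1125)
    ) AS v (acctnumber, [date], Cust1ID, Cust2ID)
),
Customers AS (       -- every distinct customer ID from either column
    SELECT Cust1ID AS CustID FROM ACCT
    UNION
    SELECT Cust2ID FROM ACCT WHERE Cust2ID IS NOT NULL
),
Edges AS (           -- a joint account links two customers, in both directions
    SELECT Cust1ID AS FromID, Cust2ID AS ToID FROM ACCT WHERE Cust2ID IS NOT NULL
    UNION
    SELECT Cust2ID, Cust1ID FROM ACCT WHERE Cust2ID IS NOT NULL
),
Walk AS (            -- follow links from each customer, never revisiting an ID
    SELECT CustID AS StartID, CustID AS CurrentID,
           CAST(',' + CAST(CustID AS varchar(20)) + ',' AS varchar(8000)) AS VisitedPath
    FROM Customers
    UNION ALL
    SELECT w.StartID, e.ToID,
           CAST(w.VisitedPath + CAST(e.ToID AS varchar(20)) + ',' AS varchar(8000))
    FROM Walk AS w
    JOIN Edges AS e ON e.FromID = w.CurrentID
    WHERE w.VisitedPath NOT LIKE '%,' + CAST(e.ToID AS varchar(20)) + ',%'
),
CustGroup AS (       -- group key = smallest customer ID reachable from the customer
    SELECT StartID AS CustID, MIN(CurrentID) AS GroupID
    FROM Walk
    GROUP BY StartID
),
Ranked AS (          -- both customers on an account share a group, so Cust1ID suffices
    SELECT a.acctnumber, a.[date], a.Cust1ID, a.Cust2ID,
           ROW_NUMBER() OVER (PARTITION BY g.GroupID
                              ORDER BY a.[date] DESC, a.acctnumber DESC) AS rn
    FROM ACCT AS a
    JOIN CustGroup AS g ON g.CustID = a.Cust1ID
)
SELECT acctnumber, [date], Cust1ID, Cust2ID
FROM Ranked
WHERE rn = 1
ORDER BY acctnumber;
```

On these sample rows it returns accounts 10000, 10107 and 10167, which match the desired rows for Case0, Case7 and Case5. Ties on date are broken by the higher account number, which is an arbitrary choice. The walk enumerates every simple path, so it is only reasonable while customer groups stay small, and SQL Server's default limit of 100 recursion levels applies.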