Gol*_*tio 18 sql t-sql sql-server greatest-n-per-group
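I have a table that contains customer information. Each customer is assigned a customer ID (their SSN), which they keep as they open more accounts. Two customers may be on the same account, each with their own ID. Account numbers are not ordered by date.
I would like to find the most recent account for each customer or group of customers. If two customers have ever been on an account together, I want to return the most recent account that either customer has been on.
Here is a sample table with some of the possible cases.
Sample table ACCT: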
acctnumber date Cust1ID Cust2ID
10000 '2016-02-01' 1110 NULL --Case0-customer has only ever had
--one account
10001 '2016-02-01' 1111 NULL --Case1-one customer has multiple
10050 '2017-02-01' 1111 NULL --accounts
400050 '2017-06-01' 1111 NULL
10089 '2017-12-08' 1111 NULL
10008 '2016-02-01' 1120 NULL --Case2-customer has account(s) and later
10038 '2016-04-01' 1120 NULL
10058 '2017-02-03' 1120 1121 --gets account(s) with another customer
10002 '2016-02-01' 1112 NULL --Case3-customer has account(s) and later
10052 '2017-02-02' 1113 1112 --becomes the second customer on another
10152 '2017-05-02' 1113 1112 --account(s)
10003 '2016-02-02' 1114 1115 --Case4-customer and second customer
7060 '2017-02-04' 1115 1114 --switch which is first and second
10004 '2016-02-02' 1116 1117 --Case5-second customer later gets
10067 '2017-02-05' 1117 NULL --separate account(s)
10167 '2018-02-05' 1117 NULL
50013 '2016-01-01' 2008 NULL --Case5b -customer has account(s) & later
50014 '2017-02-02' 2008 2009 --gets account(s) with second customer &
50015 '2017-04-04' 2008 NULL --later still first customer gets
100015 '2018-05-05' 2008 NULL --separate account(s)
30005 '2015-02-01' 1118 NULL --Case6-customer has account(s)
10005 '2016-02-01' 1118 NULL
10054 '2017-02-02' 1118 1119 --gets account(s) with another
40055 '2017-03-03' 1118 1119
10101 '2017-04-04' 1119 NULL --who later gets separate account(s)
10201 '2017-05-05' 1119 NULL
30301 '2017-06-06' 1119 NULL
10322 '2018-01-01' 1119 NULL
10007 '2016-02-01' 1122 1123 --Case7-customers play musical chairs
10057 '2017-02-03' 1123 1124
10107 '2017-06-02' 1124 1125
50001 '2016-01-01' 2001 NULL --Case8a-customers with account(s)
50002 '2017-02-02' 2001 2002 --together each later get separate
50003 '2017-03-03' 2001 NULL --account(s)
50004 '2017-04-04' 2002 NULL
50005 '2016-01-01' 2003 NULL --Case8b-customers with account(s)
50006 '2017-02-02' 2003 2004 --together each later get separate
50007 '2017-03-03' 2004 NULL --account(s)
50008 '2017-04-04' 2003 NULL
50017 '2018-03-03' 2004 NULL
50018 '2018-04-04' 2003 NULL
50009 '2016-01-01' 2005 NULL --Case9a-customer has account(s) & later
50010 '2017-02-02' 2005 2006 --gets account(s) with a second customer
50011 '2017-03-03' 2005 2007 --& later still gets account(s) with a
--third customer
50109 '2016-01-01' 2015 NULL --Case9b starts the same as Case9a, but
50110 '2017-02-02' 2015 2016
50111 '2017-03-03' 2015 2017
50112 '2017-04-04' 2015 NULL --after all accounts with other customers
50122 '2017-05-05' 2015 NULL --are complete, the original primary
--customer begins opening individual
--accounts again
Desired results:
acctnumber date Cust1ID Cust2ID
10000 '2016-02-01' 1110 NULL --Case0
10089 '2017-12-08' 1111 NULL --Case1
10058 '2017-02-03' 1120 1121 --Case2
10152 '2017-05-02' 1113 1112 --Case3
7060 '2017-02-04' 1115 1114 --Case4
10167 '2018-02-05' 1117 NULL --Case5
100015 '2018-05-05' 2008 NULL --Case5b
10322 '2018-01-01' 1119 NULL --Case6
10107 '2017-06-02' 1124 1125 --Case7
50003 '2017-03-03' 2001 NULL --Case8a result 1
50004 '2017-04-04' 2002 NULL --Case8a result 2
50017 '2018-03-03' 2004 NULL --Case8b result 1
50018 '2018-04-04' 2003 NULL --Case8b result 2
50011 '2017-03-03' 2005 2007 --Case9a
50122 '2017-05-05' 2015 NULL --Case9b
Alternatively, I would accept Case 7 being output as two separate customer groups:
10007 '2016-02-01' 1122 1123 --Case7 result 1
10107 '2017-06-02' 1124 1125 --Case7 result 2
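Because Cases 8a and 8b represent the company acknowledging that each customer is worthy of holding separate accounts, we want to treat their group as split, so each customer gets a separate result.
Also, in most scenarios customers have many accounts, and mixing and matching the cases above over time is common. For example, a single customer can have five accounts (Case 1), later open one or more accounts with another customer (Case 3), sometimes switching the primary account holder (Case 4), and afterwards the first customer starts opening individual accounts again (Case 5b).
I have tried joining the table to a copy of itself wherever the acctnumbers are distinct and any of the Cust IDs match. However, this removes customers who have only ever had one account, so I added a union with the custs that have no match on custid or account number, grouped by custid.
Unfortunately, the second part does not include only the custids from Case 0, and some custids that should not be excluded are left out altogether.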
select
    max(date1) as date,
    cust1id1 as cust1id
from
    (
        select
            acctnumber as [acctnumber1],
            date as [date1],
            cust1id as [cust1id1],
            cust2id as [cust2id1]
        from
            acct
    ) t1
    join
    (
        select
            acctnumber as [acctnumber2],
            date as [date2],
            cust1id as [cust1id2],
            cust2id as [cust2id2]
        from
            acct
    ) t2
    on t1.date1 > t2.date2 and
        (t1.cust1id1 = t2.cust1id2 or
         t1.cust1id1 = t2.cust2id2 or
         t1.cust2id1 = t2.cust2id2)
group by
    cust1id1
union
select
    max(date1) as date,
    cust1id1 as cust1id
from
    (
        select
            acctnumber as [acctnumber1],
            date as [date1],
            cust1id as [cust1id1],
            cust2id as [cust2id1]
        from
            acct
    ) t1
    join
    (
        select
            acctnumber as [acctnumber2],
            date as [date2],
            cust1id as [cust1id2],
            cust2id as [cust2id2]
        from
            acct
    ) t2
    on (t1.acctnumber1 != t2.acctnumber2 and
        t1.cust1id1 != t2.cust1id2 and
        t1.cust1id1 != t2.cust2id2 and
        t1.cust2id1 != t2.cust2id2)
group by
    cust1id1
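As a side note on the attempt above: its first ON clause checks three of the four possible column pairings (cust2id1 = cust1id2 is never compared), and customers with a single account need the extra UNION branch. A minimal sketch of one way around both points is to unpivot the two ID columns into one row per (account, customer) before aggregating. The CTE names ACCT and AcctCust, the alias CustID, and the inline VALUES rows (copied from Cases 0, 2 and 3) are illustrative assumptions; the statement is a single read-only SELECT. It only finds the latest date per individual customer and does not yet group customers together.

```sql
-- Inline stand-in for the ACCT table (Cases 0, 2 and 3 from the sample),
-- so nothing has to be created or inserted.
with ACCT as (
    select acctnumber, cast([date] as date) as [date], Cust1ID, Cust2ID
    from (values
        (10000, '2016-02-01', 1110, null),
        (10008, '2016-02-01', 1120, null),
        (10038, '2016-04-01', 1120, null),
        (10058, '2017-02-03', 1120, 1121),
        (10002, '2016-02-01', 1112, null),
        (10052, '2017-02-02', 1113, 1112),
        (10152, '2017-05-02', 1113, 1112)
    ) v (acctnumber, [date], Cust1ID, Cust2ID)
),
-- One row per (account, customer), regardless of which column held the ID.
AcctCust as (
    select a.acctnumber, a.[date], c.CustID
    from ACCT a
    cross apply (values (a.Cust1ID), (a.Cust2ID)) c (CustID)
    where c.CustID is not null
)
-- Latest account date and account count per individual customer.
select CustID, max([date]) as LastDate, count(*) as AcctCount
from AcctCust
group by CustID
order by CustID;
```

Because every customer contributes at least one row to AcctCust, customer 1110 from Case 0 is kept without a separate UNION branch.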
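Thank you for all of the great answers and comments so far. I have been trying out the queries and comparing the results.
@VladimirBaranov brought up, in the comments on other answers, a rare case that I had not previously considered.
Similar to Case 7, it would be a bonus if Case 8 were handled, but it is not expected.
Case 9 is important, and the results for 9a and 9b should be handled.
I noticed a problem with my original 7 cases.
Whenever a customer was no longer on the most recent account, it was always the second borrower who remained. This was completely unintentional; in any of these examples, either customer could be the one remaining on the most recent account.
Also, each case has the minimum number of accounts needed to show exactly what the case is testing, but that is not typical. Usually, at each step of each case there can be 5, 10, 15 or more accounts before a customer switches to adding a second customer, and then the two of them can have many accounts together.
Looking over the answers, I see that many use indexes, CREATE, UPDATE and other clauses that require being able to edit the database. Unfortunately, I am on the consumer side of this database, so I have read-only access, and the program I use to interact with the database automatically rejects them.
My answer is wrong, sorry for posting prematurely. I am working on a different idea and will be back soon.
Original reply:
Assuming your date format is MM.DD.YY, I came up with the code below. I don't understand why your desired result set does not include rows for CustID 1116 or 1118, but I do see that including them would duplicate 1117 and 1119 respectively, unless the source data is modified to remove those duplicated 1117 and 1119 values from the result. For now, I have this interim solution, pending your reply.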
declare @ACCT table (
acctnumber int,
date date,
Cust1ID int,
Cust2ID int
);
insert into @ACCT values (10000, '2016-02-01', 1110, null);
insert into @ACCT values (10001, '2016-02-01', 1111, null);
insert into @ACCT values (10050, '2017-02-01', 1111, null);
insert into @ACCT values (10008, '2016-02-01', 1120, null);
insert into @ACCT values (10058, '2017-02-03', 1120, 1121);
insert into @ACCT values (10002, '2016-02-01', 1112, null);
insert into @ACCT values (10052, '2017-02-02', 1113, 1112);
insert into @ACCT values (10003, '2016-02-02', 1114, 1115);
insert into @ACCT values (7060, '2017-02-04', 1115, 1114);
insert into @ACCT values (10004, '2016-02-02', 1116, 1117);
insert into @ACCT values (10067, '2017-02-05', 1117, null);
insert into @ACCT values (10005, '2016-02-01', 1118, null);
insert into @ACCT values (10054, '2017-02-03', 1118, 1119);
insert into @ACCT values (10101, '2017-06-02', 1119, null);
insert into @ACCT values (10007, '2016-02-01', 1122, 1123);
insert into @ACCT values (10057, '2017-02-03', 1123, 1124);
insert into @ACCT values (10107, '2017-06-02', 1124, 1125);
with
-- One row per (account, customer): stack Cust1ID and Cust2ID into one column.
OneCustId as (
    select
        acctnumber, [date], Cust1ID as CustID
    from
        @ACCT
    union
    select
        acctnumber, [date], Cust2ID
    from
        @ACCT
),
-- Rank each customer's accounts from newest to oldest.
SortedByLastUsage as (
    select
        acctnumber, [date], CustID,
        row_number() over (partition by CustID order by [date] desc) as RowID
    from
        OneCustId
),
-- Keep only the newest account per customer.
LastUsage as (
    select
        acctnumber, [date], CustID
    from
        SortedByLastUsage
    where
        RowID = 1
)
-- Return the full account rows that are the newest account of some customer.
select distinct
    ACCT.acctnumber, ACCT.[date], ACCT.Cust1ID, ACCT.Cust2ID
from
    @ACCT ACCT
    inner join LastUsage on
        ACCT.acctnumber = LastUsage.acctnumber and
        ACCT.[date] = LastUsage.[date] and
        LastUsage.CustID in (ACCT.Cust1ID, ACCT.Cust2ID)
order by
    Cust1ID, Cust2ID
Result set:
acctnumber date Cust1ID Cust2ID
10000 2016-02-01 1110 NULL
10050 2017-02-01 1111 NULL
10052 2017-02-02 1113 1112
7060 2017-02-04 1115 1114
10004 2016-02-02 1116 1117
10067 2017-02-05 1117 NULL
10054 2017-02-03 1118 1119
10101 2017-06-02 1119 NULL
10058 2017-02-03 1120 1121
10007 2016-02-01 1122 1123
10057 2017-02-03 1123 1124
10107 2017-06-02 1124 1125
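One possible way to drop the extra rows for 1116, 1118 and similar customers, offered as a sketch rather than a verified general solution: keep an account only if it is the most recent account of every customer listed on it. The CTE names (ACCT, AcctCust, Ranked), the inline VALUES rows (copied from Cases 5, 7 and 9a of the question), and the tie-break on acctnumber for equal dates are assumptions of this sketch. It also assumes acctnumber is unique. The statement is a single SELECT with no DECLARE or INSERT, so it should be usable with read-only access once ACCT is pointed at the real table.

```sql
-- Inline stand-in for the ACCT table (Cases 5, 7 and 9a from the question).
with ACCT as (
    select acctnumber, cast([date] as date) as [date], Cust1ID, Cust2ID
    from (values
        (10004, '2016-02-02', 1116, 1117),  -- Case5
        (10067, '2017-02-05', 1117, null),
        (10167, '2018-02-05', 1117, null),
        (10007, '2016-02-01', 1122, 1123),  -- Case7
        (10057, '2017-02-03', 1123, 1124),
        (10107, '2017-06-02', 1124, 1125),
        (50009, '2016-01-01', 2005, null),  -- Case9a
        (50010, '2017-02-02', 2005, 2006),
        (50011, '2017-03-03', 2005, 2007)
    ) v (acctnumber, [date], Cust1ID, Cust2ID)
),
-- One row per (account, customer), skipping empty Cust2ID slots.
AcctCust as (
    select a.acctnumber, a.[date], c.CustID
    from ACCT a
    cross apply (values (a.Cust1ID), (a.Cust2ID)) c (CustID)
    where c.CustID is not null
),
-- RowID = 1 marks each customer's most recent account
-- (assumed tie-break: the higher acctnumber wins on equal dates).
Ranked as (
    select acctnumber, CustID,
           row_number() over (partition by CustID
                              order by [date] desc, acctnumber desc) as RowID
    from AcctCust
)
-- Keep an account unless some customer on it has a different latest account.
select a.acctnumber, a.[date], a.Cust1ID, a.Cust2ID
from ACCT a
where not exists (
    select 1
    from Ranked r
    where r.CustID in (a.Cust1ID, a.Cust2ID)
      and r.RowID = 1
      and r.acctnumber <> a.acctnumber
)
order by a.acctnumber;
```

On these sample rows the query keeps accounts 10107, 10167 and 50011, which are the desired rows for Cases 7, 5 and 9a. It does not produce the alternative two-group output for Case 7, and it has not been checked against real data with many accounts per step.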