I can do this programmatically, but I'm looking for a cleaner solution.
Suppose I have the following table:
Last Name  First Name
Smith      Albert
Smith      Alphonse
Smith      Jason
Johnson    Charles
Roberts    Chris
Roberts    Christian
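I would like to create a unique name using the following rules:
For Albert Smith, I would return Alb. Smith
For Charles Johnson, I would return Johnson
For Christian Roberts, I would return Christ. Roberts
Does anyone have any ideas on how to accomplish this directly in an Oracle SQL statement, or should I stick to doing it in the program?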
A version using recursive subquery factoring (a recursive CTE), which requires 11gR2:
with t (last_name, first_name, orig_rn, part, part_length, remaining) as (
  -- anchor member: one row per original row, starting with an empty prefix
  select last_name, first_name,
    row_number() over (order by last_name, first_name),
    cast (null as varchar2(20)), 0, length(first_name)
  from t42
  union all
  -- recursive member: extend the prefix by one character
  select last_name, first_name, orig_rn,
    part || substr(first_name, part_length + 1, 1),
    part_length + 1,
    remaining - 1
  from t
  where remaining > 0
),
u as (
  -- count how many original rows share each last name and each prefix
  select last_name, first_name, orig_rn, part, part_length,
    count(distinct orig_rn) over (partition by last_name) as last_name_count,
    count(distinct orig_rn) over (partition by last_name, part) as part_count
  from t
),
v as (
  -- keep unique prefixes (or the full first name) and rank them by length
  select last_name, first_name, orig_rn, part, last_name_count,
    row_number() over (partition by orig_rn order by part_length) as rn
  from u
  where (part_count = 1 or part = first_name)
)
select case when last_name_count = 1 then null
    when part = first_name then first_name || ' '
    else part || '. '
  end || last_name as condensed_name
from v
where rn = 1
order by orig_rn;
This gives:
CONDENSED_NAME
----------------------------------------------
Johnson
Chris Roberts
Christ. Roberts
Alb. Smith
Alp. Smith
J. Smith
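The t CTE is recursive. It starts with the original table rows and generates an additional row for each possible contraction of the first name: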
with t (last_name, first_name, orig_rn, part, part_length, remaining) as (
  select last_name, first_name,
    row_number() over (order by last_name, first_name),
    cast (null as varchar2(20)), 0, length(first_name)
  from t42
  union all
  select last_name, first_name, orig_rn,
    part || substr(first_name, part_length + 1, 1),
    part_length + 1,
    remaining - 1
  from t
  where remaining > 0
)
select last_name, first_name, part
from t
where last_name = 'Johnson'
order by orig_rn, part_length;

LAST_NAME            FIRST_NAME           PART
-------------------- -------------------- ------------------------
Johnson              Charles
Johnson              Charles              C
Johnson              Charles              Ch
Johnson              Charles              Cha
Johnson              Charles              Char
Johnson              Charles              Charl
Johnson              Charles              Charle
Johnson              Charles              Charles
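The next CTE, u (yes, sorry about the names, I wasn't feeling inspired), compares the values across all the rows and counts how many times each one appears. Anything with a count of 1 is unique.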
...
u as (
  select last_name, first_name, orig_rn, part, part_length,
    count(distinct orig_rn) over (partition by last_name) as last_name_count,
    count(distinct orig_rn) over (partition by last_name, part) as part_count
  from t
)
select last_name, first_name, part, last_name_count, part_count
from u
where last_name = 'Roberts'
order by orig_rn, part_length;

LAST_NAME            FIRST_NAME           PART                     LAST_NAME_COUNT PART_COUNT
-------------------- -------------------- ------------------------ --------------- ----------
Roberts              Chris                                                       2          2
Roberts              Chris                C                                      2          2
Roberts              Chris                Ch                                     2          2
Roberts              Chris                Chr                                    2          2
Roberts              Chris                Chri                                   2          2
Roberts              Chris                Chris                                  2          2
Roberts              Christian                                                   2          2
Roberts              Christian            C                                      2          2
Roberts              Christian            Ch                                     2          2
Roberts              Christian            Chr                                    2          2
Roberts              Christian            Chri                                   2          2
Roberts              Christian            Chris                                  2          2
Roberts              Christian            Christ                                 2          1
Roberts              Christian            Christi                                2          1
Roberts              Christian            Christia                               2          1
Roberts              Christian            Christian                              2          1
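The third CTE, v, only looks at the unique values (plus the full first name, as a fallback when no shorter contraction is unique), and ranks them by length; so the shortest contraction of a record's first name that is unique across all records is ranked 1.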
...
v as (
  select last_name, first_name, orig_rn, part, last_name_count,
    row_number() over (partition by orig_rn order by part_length) as rn
  from u
  where (part_count = 1 or part = first_name)
)
select last_name, first_name, part, last_name_count
from v
where rn = 1
order by orig_rn;

LAST_NAME            FIRST_NAME           PART                     LAST_NAME_COUNT
-------------------- -------------------- ------------------------ ---------------
Johnson              Charles                                                     1
Roberts              Chris                Chris                                  2
Roberts              Christian            Christ                                 2
Smith                Albert               Alb                                    3
Smith                Alphonse             Alp                                    3
Smith                Jason                J                                      3
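The final query then only pulls out the rows ranked 1, which hold the shortest unique value, and formats them the way you wanted.
If two people have exactly the same name then both are spelled out in full (demo), which seems to be what you want, judging from the comments.
Not sure this really qualifies as 'cleaner', other than that it only hits the original table once.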
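For comparison, if you do end up doing it in the program, the same three steps are short. This is only a minimal sketch in Python, assuming the rows arrive as (last_name, first_name) tuples; the function name condensed_names is made up for illustration and is not part of the query above.

```python
from collections import Counter


def condensed_names(rows):
    """Return condensed names for (last_name, first_name) tuples.

    Hypothetical helper mirroring the t / u / v steps of the query:
    generate every prefix, count them per last name, pick the shortest
    unique one (falling back to the full first name).
    """
    # Same ordering the query uses for orig_rn.
    ordered = sorted(rows)

    # Like last_name_count: how many rows share each last name.
    last_counts = Counter(last for last, _ in ordered)

    # Like part_count: how many rows share each (last name, prefix) pair.
    prefix_counts = Counter(
        (last, first[:n])
        for last, first in ordered
        for n in range(1, len(first) + 1)
    )

    result = []
    for last, first in ordered:
        if last_counts[last] == 1:
            # A unique last name needs no first-name part at all.
            result.append(last)
            continue
        # Shortest prefix owned by this row alone, else the full name.
        part = next(
            (first[:n] for n in range(1, len(first) + 1)
             if prefix_counts[(last, first[:n])] == 1),
            first,
        )
        if part == first:
            result.append(f"{first} {last}")
        else:
            result.append(f"{part}. {last}")
    return result


rows = [
    ("Smith", "Albert"), ("Smith", "Alphonse"), ("Smith", "Jason"),
    ("Johnson", "Charles"), ("Roberts", "Chris"), ("Roberts", "Christian"),
]
print(condensed_names(rows))
```

For the sample rows this should produce the same six values as the query, in the same order. Two rows with an identical name never get a unique prefix, so both fall back to the full first name, as in the SQL version.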