Eve*_*ien 42 sql t-sql sql-server parsing
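How can I parse the first, middle, and last name out of a fullname field with SQL?
I need to match on names that do not match directly on the full name. I would like to take the full name field and break it up into first, middle, and last name.
The data does not include any prefixes or suffixes. The middle name is optional. The data is formatted as "First Middle Last".
I'm interested in practical solutions that get me 90% of the way there. As has been stated, this is a complex problem, so I'll deal with special cases individually.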
Jos*_*ons 132
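Here is a self-contained example with easily manipulated test data.
In this example, if a name has more than three parts, all of the "extra" parts go into the LAST_NAME field. An exception is made for specific strings identified as "titles", such as "DR", "MRS", and "MR".
If the middle name is missing, you get just FIRST_NAME and LAST_NAME (MIDDLE_NAME will be NULL).
You could smash it all into one giant nested blob of SUBSTRINGs, but readability is hard enough as it is when you do this in SQL.
Edit: handles the following special cases:
1 - The NAME field is NULL
2 - The NAME field contains leading/trailing spaces
3 - The NAME field has more than one consecutive space within the name
4 - The NAME field contains ONLY the first name
5 - Include the original full name in the final output as a separate column, for readability
6 - Handle a specific list of prefixes as a separate "title" column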
SELECT
    FIRST_NAME.ORIGINAL_INPUT_DATA
    ,FIRST_NAME.TITLE
    ,FIRST_NAME.FIRST_NAME
    ,CASE WHEN 0 = CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
        THEN NULL --no more spaces? assume rest is the last name
        ELSE SUBSTRING(
            FIRST_NAME.REST_OF_NAME
            ,1
            ,CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)-1
        )
    END AS MIDDLE_NAME
    ,SUBSTRING(
        FIRST_NAME.REST_OF_NAME
        ,1 + CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
        ,LEN(FIRST_NAME.REST_OF_NAME)
    ) AS LAST_NAME
FROM
    (
    SELECT
        TITLE.TITLE
        ,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)
            THEN TITLE.REST_OF_NAME --no space? return the whole thing
            ELSE SUBSTRING(
                TITLE.REST_OF_NAME
                ,1
                ,CHARINDEX(' ',TITLE.REST_OF_NAME)-1
            )
        END AS FIRST_NAME
        ,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)
            THEN NULL --no spaces at all? then the first name is all we have
            ELSE SUBSTRING(
                TITLE.REST_OF_NAME
                ,CHARINDEX(' ',TITLE.REST_OF_NAME)+1
                ,LEN(TITLE.REST_OF_NAME)
            )
        END AS REST_OF_NAME
        ,TITLE.ORIGINAL_INPUT_DATA
    FROM
        (
        SELECT
            --if the first three characters are in this list,
            --then pull it as a "title". otherwise return NULL for title.
            --note: 'MRS' has no trailing space, so any name that starts with MRS matches
            CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
                THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,1,3)))
                ELSE NULL
            END AS TITLE
            --if you change the list, don't forget to change it here, too.
            --so much for the DRY principle...
            ,CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
                THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,4,LEN(TEST_DATA.FULL_NAME))))
                ELSE LTRIM(RTRIM(TEST_DATA.FULL_NAME))
            END AS REST_OF_NAME
            ,TEST_DATA.ORIGINAL_INPUT_DATA
        FROM
            (
            SELECT
                --trim leading & trailing spaces before trying to process
                --disallow extra spaces *within* the name: each pass turns two spaces into one,
                --so two passes cope with runs of up to four spaces
                REPLACE(REPLACE(LTRIM(RTRIM(FULL_NAME)),'  ',' '),'  ',' ') AS FULL_NAME
                ,FULL_NAME AS ORIGINAL_INPUT_DATA
            FROM
                (
                --if you use this, then replace the following
                --block with your actual table
                SELECT 'GEORGE W BUSH' AS FULL_NAME
                UNION SELECT 'SUSAN B ANTHONY' AS FULL_NAME
                UNION SELECT 'ALEXANDER HAMILTON' AS FULL_NAME
                UNION SELECT 'OSAMA BIN LADEN JR' AS FULL_NAME
                UNION SELECT 'MARTIN J VAN BUREN SENIOR III' AS FULL_NAME
                UNION SELECT 'TOMMY' AS FULL_NAME
                UNION SELECT 'BILLY' AS FULL_NAME
                UNION SELECT NULL AS FULL_NAME
                UNION SELECT '   ' AS FULL_NAME
                UNION SELECT ' JOHN  JACOB   SMITH' AS FULL_NAME
                UNION SELECT ' DR SANJAY GUPTA' AS FULL_NAME
                UNION SELECT 'DR JOHN S HOPKINS' AS FULL_NAME
                UNION SELECT ' MRS SUSAN ADAMS' AS FULL_NAME
                UNION SELECT ' MS AUGUSTA ADA KING ' AS FULL_NAME
                ) RAW_DATA
            ) TEST_DATA
        ) TITLE
    ) FIRST_NAME
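If the names can contain longer runs of spaces, a commonly used idiom collapses a run of any length in a single expression. This is only a sketch and not part of the answer above: the sample values and the alias SRC are made up, and it assumes the characters '<' and '>' never appear in the data.

```sql
-- Sketch only: sample values are hypothetical.
-- Assumes '<' and '>' never occur in FULL_NAME.
SELECT
    SRC.FULL_NAME AS ORIGINAL_INPUT_DATA
    ,REPLACE(REPLACE(REPLACE(
        LTRIM(RTRIM(SRC.FULL_NAME))
        ,' ','<>')    --every space becomes the pair <>
        ,'><','')     --adjacent pairs collapse: <><> becomes <>
        ,'<>',' ')    --the surviving pair becomes one space
     AS FULL_NAME
FROM
    (
    VALUES
        ('JOHN     JACOB  SMITH')
        ,('  MS   AUGUSTA ADA   KING ')
        ,(NULL)
    ) AS SRC(FULL_NAME);
```

The expression could stand in for the innermost clean-up step of the query above if the two-pass replacement turns out not to be enough for your data.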
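It's difficult to answer without knowing how the "full name" is formatted.
It could be "Last Name, First Name Middle Name" or "First Name Middle Name Last Name", etc.
Basically you'll have to use the SUBSTRING function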
SUBSTRING ( expression , start , length )
and perhaps the CHARINDEX function
CHARINDEX (substr, expression)
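to work out the start and length of each part you want to extract.
So, say the format is "First Name Last Name", you could do this (untested, but it should be close):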
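SELECT
    -- appending a space guarantees CHARINDEX finds one, so a single-word name cannot yield a negative length
    SUBSTRING(fullname, 1, CHARINDEX(' ', fullname + ' ') - 1) AS FirstName,
    -- everything after the first space; an empty string when there is no space
    SUBSTRING(fullname, CHARINDEX(' ', fullname + ' ') + 1, LEN(fullname)) AS LastName
FROM YourTable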
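The question's format is "First Middle Last" with an optional middle name, so the same two functions can be stretched to three columns by also locating the last space. A rough sketch, assuming the value is already trimmed and single-spaced; the sample rows and the aliases N and P are made up for illustration:

```sql
-- Sketch only: sample rows are hypothetical; input is assumed trimmed and single-spaced.
SELECT
    N.fullname,
    -- text before the first space (the whole value when there is no space)
    LEFT(N.fullname, P.first_space - 1) AS FirstName,
    -- text strictly between the first and the last space, when they differ
    CASE WHEN P.last_space > P.first_space
         THEN SUBSTRING(N.fullname, P.first_space + 1, P.last_space - P.first_space - 1)
    END AS MiddleName,
    -- text after the last space; NULL when there is no space at all
    CASE WHEN P.last_space > 0
         THEN SUBSTRING(N.fullname, P.last_space + 1, LEN(N.fullname))
    END AS LastName
FROM (VALUES ('George W Bush'), ('Alexander Hamilton'), ('Tommy')) AS N(fullname)
CROSS APPLY (
    SELECT
        -- the appended space guarantees a match for single-word names
        CHARINDEX(' ', N.fullname + ' ') AS first_space,
        -- position of the last space, or 0 when the name contains none
        LEN(N.fullname) + 1 - CHARINDEX(' ', REVERSE(' ' + N.fullname)) AS last_space
) AS P;
```

With more than three words, everything between the first and the last space lands in MiddleName (for example, 'Martin J Van Buren' gives 'J Van'), unlike the earlier query, which pushes the extras into LAST_NAME. Which of the two is the better default depends on your data.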
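Reverse the problem: add columns to hold the individual pieces and combine them to get the full name.
The reason this is the best answer is that there is no guaranteed way to work out which part a person registered as their first name and which as their middle name.
For instance, how would you split this?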
Jan Olav Olsen Heggelien
This name, while fictitious, is a legal name in Norway, and it could, but would not have to, be split like this:
First name: Jan Olav
Middle name: Olsen
Last name: Heggelien
or, like this:
First name: Jan Olav
Last name: Olsen Heggelien
or, like this:
First name: Jan
Middle name: Olav
Last name: Olsen Heggelien
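I would imagine similar cases can be found in most languages.
So instead of trying to interpret data that does not carry enough information to get it right, store the correct interpretation and combine the parts to get the full name.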
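One way to picture this is a table that stores the parts and derives the full name from them. The sketch below is only an illustration: the temp table #Person, its columns, and the column sizes are all hypothetical.

```sql
-- Sketch only: table and column names are hypothetical.
CREATE TABLE #Person (
    PersonId   INT IDENTITY(1,1) PRIMARY KEY,
    FirstName  NVARCHAR(100) NOT NULL,
    MiddleName NVARCHAR(100) NULL,
    LastName   NVARCHAR(100) NOT NULL,
    -- ' ' + NULL yields NULL under the default CONCAT_NULL_YIELDS_NULL setting,
    -- so a missing middle name contributes nothing to the full name
    FullName AS (FirstName + ISNULL(' ' + MiddleName, '') + ' ' + LastName)
);

-- the three interpretations of the example name from this answer
INSERT INTO #Person (FirstName, MiddleName, LastName)
VALUES
    (N'Jan Olav', N'Olsen', N'Heggelien'),
    (N'Jan Olav', NULL,     N'Olsen Heggelien'),
    (N'Jan',      N'Olav',  N'Olsen Heggelien');

SELECT FirstName, MiddleName, LastName, FullName FROM #Person;

DROP TABLE #Person;
```

All three rows produce the same FullName, which is exactly why the split cannot be recovered from the combined string alone.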
小智 7
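Unless you have very, very well-behaved data, this is a non-trivial challenge. A naive approach would be to tokenize on whitespace and assume that a three-token result is [first, middle, last] and a two-token result is [first, last], but you will have to deal with multi-word surnames (e.g. "Van Buren") and multiple middle names.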
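As a rough illustration of the naive approach, the token count can be derived from the number of spaces and used to flag rows that do not fit the two- or three-token pattern. The sample rows and the aliases N and T are made up, and the input is assumed to be trimmed, single-spaced, and non-empty.

```sql
-- Sketch only: sample rows are hypothetical; input is assumed trimmed, single-spaced and non-empty.
SELECT
    N.fullname,
    T.token_count,
    CASE T.token_count
        WHEN 2 THEN 'first, last'
        WHEN 3 THEN 'first, middle, last'
        ELSE 'needs manual review'
    END AS interpretation
FROM (VALUES ('George W Bush'), ('Alexander Hamilton'), ('Martin J Van Buren')) AS N(fullname)
CROSS APPLY (
    -- number of spaces plus one
    SELECT LEN(N.fullname) - LEN(REPLACE(N.fullname, ' ', '')) + 1 AS token_count
) AS T;
```

The count only catches names with an unexpected number of tokens. A three-token name such as "Martin Van Buren" would still be read as [first, middle, last], so the multi-word surname problem remains.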
小智 6
Another simple way is to use parsename:
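select full_name,
    -- parsename numbers the parts from the right, so a two-part name only has parts 2 and 1;
    -- fall back to part 2 for the first name when there is no part 3
    coalesce(parsename(replace(full_name, ' ', '.'), 3),
             parsename(replace(full_name, ' ', '.'), 2)) as FirstName,
    case when parsename(replace(full_name, ' ', '.'), 3) is not null
         then parsename(replace(full_name, ' ', '.'), 2) end as MiddleName,
    parsename(replace(full_name, ' ', '.'), 1) as LastName
    -- assumes at most three words and no periods in the name; parsename returns NULL beyond four parts
from YourTableName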