Find records with the same string plus extra characters

Min*_*ert 4 sql-server full-text-search sql-server-2014 string-searching

OK, so I have a Microsoft SQL Server 2014 database with a table, owner, that holds about 90,000 records of owner information, and another table, vehicle, that holds vehicle information.

Owner_Name                   owner_id       V_name     owner_id    exempt
-------------------------------------       ------------------------------
JACOB JAMISON & JESSICA           35        Civic            35        H3
JACOB JAMISON M & JESSICA B       39        Accord           39        H3 
BLACKSON BARRINGTON               56        Bugatti          56        H6
BLACKSON BARRINGTON H             98        SSC              98        H7
BRUSTER MICHAEL                   107       Corvette         107       H9

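I'm trying to find all records that have more than one exemption on a vehicle (H0 means no exemption). The code below works fine as long as the names are exactly the same. However, if there is a variation, such as an extra letter or a name entered backwards, those records aren't returned. I've looked at things like SOUNDEX, but that doesn't work in my scenario.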

SELECT Owner_name
     , COUNT(Owner_name) AS 'xNameAppears'
     , COUNT(v.exempt) AS 'ExemptionCount' 
FROM owner o
INNER JOIN vehicle V ON V.owner_id = o.owner_id
WHERE v.exempt <> 'H0'
GROUP BY O.owner_name
HAVING COUNT(v.exempt) > 1

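Is there a solution that would let me return records like this without knowing in advance which owner_name values might be similar? Basically, I'm trying to get the server to search the owner_name column and, if there are similar values such as JACOB JAMISON & JESSICA and JACOB JAMISON M & JESSICA B, return those records like this: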

Owner_Name                      xNameAppears      ExemptCount
-------------------------------------------------------------      
JACOB JAMISON & JESSICA           2                         2
JACOB JAMISON M & JESSICA B       2                         2
BLACKSON BARRINGTON               2                         2
BLACKSON BARRINGTON H             2                         2

Thanks in advance!

Eri*_*ing 7

The SOUNDEX function can be applied to a column as well.

But since "there are thousands of these", I wouldn't recommend just writing a query that joins on a function to do this.

This might not perform very well on larger tables:

SELECT *
FROM dbo.vehicle AS v
JOIN dbo.vehicle AS v2
ON SOUNDEX(v2.Owner_Name) = SOUNDEX(v.Owner_Name)
AND v2.Owner_Name <> v.Owner_Name;

In the long run, I'd rather do something that makes these easier for you to find.

Here's an example:

CREATE TABLE dbo.vehicle (Owner_Name VARCHAR(50));
INSERT dbo.vehicle ( Owner_Name )
SELECT *
FROM (  
VALUES            
('JACOB JAMISON & JESSICA'),
('JACOB JAMISON M & JESSICA B'),
('BLACKSON BARRINGTON'),          
('BLACKSON BARRINGTON H'),        
('BRUSTER MICHAEL')
) AS x (Owner_Name);

I'll add a computed column based on the function, then add an index to help my query.

ALTER TABLE dbo.vehicle ADD Owner_Soundex AS SOUNDEX(Owner_Name);

CREATE INDEX ix_whatever ON dbo.vehicle (Owner_Soundex, Owner_Name);

Verify that everything works...

SELECT *
FROM dbo.vehicle AS v
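If you want more than a visual check of the rows, a metadata query along these lines can confirm that the new column is computed and can be indexed. This is only a sketch: it assumes the dbo.vehicle table and the Owner_Soundex column created above, and it uses the built-in COLUMNPROPERTY function.

```sql
-- Sketch: inspect the computed column's metadata.
-- Assumes dbo.vehicle and Owner_Soundex exist as created above.
SELECT
    COLUMNPROPERTY(OBJECT_ID('dbo.vehicle'), 'Owner_Soundex', 'IsComputed')  AS IsComputed,
    COLUMNPROPERTY(OBJECT_ID('dbo.vehicle'), 'Owner_Soundex', 'IsIndexable') AS IsIndexable;
```

A value of 1 in each column means yes; NULL usually means the table or column name was not found.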

Use a query like this to find the inexact matches:

SELECT *
FROM dbo.vehicle AS v
JOIN dbo.vehicle AS v2
ON v2.Owner_Soundex = v.Owner_Soundex
AND v2.Owner_Name <> v.Owner_Name;
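To get closer to the output asked for in the question, the same idea can be applied to the exemption count by grouping on the SOUNDEX code instead of the exact name. The following is a rough sketch, not a tested solution: it assumes the question's schema (owner with owner_id and Owner_Name, vehicle with owner_id and exempt) and calls SOUNDEX inline for brevity. On a table with about 90,000 rows, a persisted computed column with an index, as shown above, would likely be the better choice.

```sql
-- Sketch: count exemptions per group of similar-sounding owner names.
-- Assumes the question's tables: owner(owner_id, Owner_Name), vehicle(owner_id, exempt).
WITH exempt_owners AS (
    SELECT o.Owner_Name,
           SOUNDEX(o.Owner_Name) AS Owner_Soundex,  -- inline here; use the computed column at scale
           v.exempt
    FROM owner AS o
    INNER JOIN vehicle AS v
        ON v.owner_id = o.owner_id
    WHERE v.exempt <> 'H0'                          -- H0 means no exemption
),
counted AS (
    SELECT Owner_Name,
           COUNT(Owner_Name) OVER (PARTITION BY Owner_Soundex) AS xNameAppears,
           COUNT(exempt)     OVER (PARTITION BY Owner_Soundex) AS ExemptionCount
    FROM exempt_owners
)
SELECT DISTINCT Owner_Name, xNameAppears, ExemptionCount
FROM counted
WHERE ExemptionCount > 1;
```

Keep in mind that SOUNDEX produces only a four-character code, so unrelated names can land in the same group. Treat the result as a list of candidates to review rather than confirmed duplicates.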