Tell me SQL Server full-text search is crazy, and not me

Ian*_*oyd 5 sql-server full-text-search sql-server-2000 full-text-indexing

I have some customers at a particular address that users are searching for:

123 generic way

There are 5 matching rows in the database:

ResidentialAddress1
=============================
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY

I run an FT query to find these rows. I'll show you each step as I add more criteria to the search:

SELECT ResidentialAddress1 FROM Patrons
WHERE CONTAINS(Patrons.ResidentialAddress1, '"123*"')

ResidentialAddress1
=========================
123 MAPLE STREET
12345 TEST
123 MINE STREET
123 GENERIC WAY
123 FAKE STREET
...

(30 row(s) affected)

Okay, so far so good. Now add the word "generic":

SELECT ResidentialAddress1 FROM Patrons
WHERE  CONTAINS(Patrons.ResidentialAddress1, '"123*"')
AND CONTAINS(Patrons.ResidentialAddress1, '"generic*"')

ResidentialAddress1
=============================
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY

(5 row(s) affected)

Excellent. Now I'll add the final keyword that the user wants to make sure is present:

SELECT ResidentialAddress1 FROM Patrons
WHERE  CONTAINS(Patrons.ResidentialAddress1, '"123*"')
AND CONTAINS(Patrons.ResidentialAddress1, '"generic*"')
AND CONTAINS(Patrons.ResidentialAddress1, '"way*"')


ResidentialAddress1            
------------------------------ 

(0 row(s) affected)

Huh? No rows? What if I query for just "way*":

SELECT ResidentialAddress1 FROM Patrons
WHERE CONTAINS(Patrons.ResidentialAddress1, '"way*"')

ResidentialAddress1            
------------------------------ 

(0 row(s) affected)

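At first I thought it might be because of the *, i.e. that it requires the root way to have more characters after it. But that's not the case: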

  • Searching for "123*" matches "123"
  • Searching for "generic*" matches "generic"
  • Books Online says the asterisk matches zero, one, or more characters
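To make the Books Online rule concrete, here is a toy Python model of a prefix term. It assumes the index is just a set of lower-cased words and that "root*" matches any indexed word starting with root, including the root itself. The word set is an assumption built from the addresses above. This is a sketch of the idea, not how SQL Server implements it.

```python
def prefix_matches(indexed_words, root):
    """Return the indexed words that a prefix term such as "root*" would match.

    Toy model: the trailing asterisk stands for zero or more characters, so a
    word matches when it starts with the root (case-insensitive).
    """
    root = root.lower()
    return sorted(w for w in indexed_words if w.lower().startswith(root))


# Assumed sample index built from the addresses shown above.
indexed = {"123", "12345", "generic", "maple", "mine", "fake", "street", "test", "way"}

print(prefix_matches(indexed, "123"))      # ['123', '12345'] (zero extra characters is fine)
print(prefix_matches(indexed, "generic"))  # ['generic']
print(prefix_matches(indexed, "way"))      # ['way'], but only because "way" is in this set
print(prefix_matches(indexed - {"way"}, "way"))  # [] once "way" is left out of the index
```

The last two lines show that, in this model, the outcome for "way*" depends on whether "way" is in the index. The asterisk itself is not the issue.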

What if I remove the * just for kicks?

SELECT ResidentialAddress1 FROM Patrons
WHERE CONTAINS(Patrons.ResidentialAddress1, '"way"')

Server: Msg 7619, Level 16, State 1, Line 1
A clause of the query contained only ignored words. 
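Here is a toy model of the behavior observed in this question. It assumes, based on the results here rather than on documentation, that a simple term made only of noise words is rejected up front. A prefix term, by contrast, is looked up against the index. `IgnoredWordsError` and the small `NOISE_WORDS` set are hypothetical stand-ins for Msg 7619 and the real noise-word file.

```python
class IgnoredWordsError(Exception):
    """Hypothetical stand-in for SQL Server Msg 7619."""


# Hypothetical subset of the US English noise-word list.
NOISE_WORDS = {"a", "about", "and", "the", "way", "we", "well"}


def check_simple_term(term, noise_words=NOISE_WORDS):
    """Reject a simple (non-prefix) term made only of noise words.

    Toy model of the behavior observed above: '"way"' fails with Msg 7619,
    while '"way*"' is accepted and looked up against the index instead.
    """
    if term.endswith("*"):
        return term  # prefix terms are not rejected in this model
    words = term.lower().split()
    if words and all(w in noise_words for w in words):
        raise IgnoredWordsError("A clause of the query contained only ignored words.")
    return term


print(check_simple_term("way*"))  # way*
try:
    check_simple_term("way")
except IgnoredWordsError as err:
    print(err)  # A clause of the query contained only ignored words.
```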

So one might think you just aren't allowed to search for way at all, either alone or as a root. But that's not true either:

SELECT * FROM Patrons
WHERE CONTAINS(Patrons.*, '"way*"')

AccountNumber FirstName Lastname
------------- --------- --------
33589         JOHN      WAYNE                    
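Modeling the index itself shows why Patrons.* finds WAYNE while ResidentialAddress1 finds nothing. This sketch assumes a per-column index of lower-cased words, with noise words removed at indexing time. The helper names and the noise subset are hypothetical; the rows are taken from the results above.

```python
import re

# Hypothetical subset of the US English noise-word list.
NOISE_WORDS = {"a", "and", "the", "way", "we", "well"}


def index_columns(row, noise_words=NOISE_WORDS):
    """Build a toy per-column index: column name -> set of indexed words.

    Text is lower-cased and split into runs of letters and digits; noise
    words are dropped here, so they never reach the index.
    """
    index = {}
    for column, text in row.items():
        words = re.findall(r"[0-9a-z]+", (text or "").lower())
        index[column] = {w for w in words if w not in noise_words}
    return index


def contains_prefix(index, root, columns=None):
    """True if any indexed word in the chosen columns starts with root.

    columns=None plays the role of Patrons.*; a list such as
    ["ResidentialAddress1"] plays the role of a single named column.
    """
    root = root.lower()
    cols = index.keys() if columns is None else columns
    return any(w.startswith(root) for c in cols for w in index.get(c, ()))


# Sample rows modeled on the results above (the WAYNE row's address is unknown).
generic = index_columns({"Lastname": "SAVE", "ResidentialAddress1": "123 GENERIC WAY"})
wayne = index_columns({"Lastname": "WAYNE", "ResidentialAddress1": None})

print(generic["ResidentialAddress1"] == {"123", "generic"})  # True: WAY was never indexed
print(contains_prefix(generic, "way", ["ResidentialAddress1"]))  # False
print(contains_prefix(wayne, "way"))  # True, via Lastname
```

In this model, `contains_prefix(generic, "wa", ["ResidentialAddress1"])` is also False. That is consistent with the "wa*" result in Update Two.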

To sum up, the user is searching for rows that contain all of the words:

123 generic way

and I have correctly translated that into a WHERE clause:

SELECT * FROM Patrons
WHERE CONTAINS(Patrons.*, '"123*"')
AND CONTAINS(Patrons.*, '"generic*"')
AND CONTAINS(Patrons.*, '"way*"')

It returns no rows. Tell me this can't work, that it's not my fault, and that SQL Server is crazy.

Note: I have emptied the FT index and rebuilt it.
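The ANDed CONTAINS predicates behave like an intersection of the rows each term matches. A single term that matches nothing therefore empties the whole result. Below is a minimal sketch with hypothetical row ids; only the row counts (30 rows for "123*", 5 for the two-term query) come from the outputs above.

```python
def and_of_contains(term_hits):
    """Combine per-term results the way ANDed CONTAINS predicates do.

    term_hits maps each search term to the set of row ids it matched. A row
    survives only if every term matched it, so one empty set empties the result.
    """
    rows = None
    for hits in term_hits.values():
        rows = set(hits) if rows is None else rows & set(hits)
    return rows or set()


# Hypothetical row ids; only the row counts follow the outputs above.
hits = {
    '"123*"': set(range(1, 31)),          # 30 rows
    '"generic*"': {1, 2, 3, 4, 5, 31},   # six GENERIC WAY rows, 31 = 234 GENERIC WAY
    '"way*"': set(),                      # nothing indexed in ResidentialAddress1
}

print(len(and_of_contains(hits)))  # 0
without_way = {t: h for t, h in hits.items() if t != '"way*"'}
print(sorted(and_of_contains(without_way)))  # [1, 2, 3, 4, 5]
```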

Update One

SELECT Lastname, ResidentialAddress1 FROM Patrons
WHERE CONTAINS(Patrons.*, '"gen*"')

Lastname                  ResidentialAddress1            
------------------------- ------------------------------ 
SAVE                      123 GENERIC WAY
Genders                   
SAVE                      123 GENERIC WAY
Patron                    123 GENERIC WAY
SAVE                      123 GENERIC WAY
SAVE                      234 GENERIC WAY
SAVE                      123 GENERIC WAY

(7 row(s) affected)

Update Two

Pretend the user typed:

123 generic wa

SELECT ResidentialAddress1 FROM Patrons
WHERE  CONTAINS(Patrons.ResidentialAddress1, '"123*"')
AND CONTAINS(Patrons.ResidentialAddress1, '"generic*"')
AND CONTAINS(Patrons.ResidentialAddress1, '"wa*"')

ResidentialAddress1            
------------------------------ 

(0 row(s) affected)

The real problem is that users type in something perfectly valid and expect to see what anyone would expect to see.


Update Three

Someone asked for all of this: it's not my fault!

CREATE TABLE [dbo].[Patrons] (
    [PatronGUID]  uniqueidentifier ROWGUIDCOL  NOT NULL ,
    [AccountNumber] [bigint] NULL ,
    [FirstName] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [MiddleInitial] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [Lastname] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [EyeColor] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [HairColor] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [Gender] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [Birthday] [datetime] NULL ,
    [Height] [int] NULL ,
    [Weight] [int] NULL ,
    [FacialHair] [tinyint] NULL ,
    [Nationality] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [IdentifyingMarks] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [DriversLicenseNumber] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [DriversLicenseRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [DriversLicenseCountry] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [DriversLicenseExpires] [datetime] NULL ,
    [DriversLicenseDateVerified] [datetime] NULL ,
    [PassportNumber] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PassportRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PassportCountry] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PassportExpires] [datetime] NULL ,
    [PassportDateVerified] [datetime] NULL ,
    [OtherIdentificationNumber] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [OtherIdentificationRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [OtherIdentificationCountry] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [OtherIdentificationType] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [OtherIdentificationExpires] [datetime] NULL ,
    [OtherIdentificationDateVerified] [datetime] NULL ,
    [ResidentialAddress1] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialAddress2] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialAddress3] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialCity] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialZipCode] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialCountry] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialPhoneNumber] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [CountryOfResidence] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessAddress1] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessAddress2] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessAddress3] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessCity] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessZipCode] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessCountry] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessName] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessPhone] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PositionWithFirm] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [EmployerTelephone] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [MemberCardType] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PlayerStatusCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [AccountType] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [AccountStatus1] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [AccountStatus2] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [IsVIPExchangeRate] [tinyint] NULL ,
    [ChangedUserGUID_Depricated] [uniqueidentifier] NULL ,
    [ChangedUser] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ChangedDate] [datetime] NULL ,
    [ChangedWorkstation] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PendingUpdates_Depricated] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL 
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[Patrons] ADD 
    CONSTRAINT [DF_Patrons_PatronGUID] DEFAULT (newid()) FOR [PatronGUID],
    CONSTRAINT [PK_Patrons] PRIMARY KEY  NONCLUSTERED 
    (
        [PatronGUID]
    ) WITH  FILLFACTOR = 90  ON [PRIMARY] 
GO

if (select DATABASEPROPERTY(DB_NAME(), N'IsFullTextEnabled')) <> 1 
exec sp_fulltext_database N'enable' 

GO

if not exists (select * from dbo.sysfulltextcatalogs where name = N'TheFullTextCatalog')
exec sp_fulltext_catalog N'TheFullTextCatalog', N'create' 

GO

exec sp_fulltext_table N'[dbo].[Patrons]', N'create', N'TheFullTextCatalog', N'PK_Patrons'
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'FirstName', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'MiddleInitial', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'Lastname', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'EyeColor', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'IdentifyingMarks', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialAddress1', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialAddress2', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialAddress3', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialCity', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialZipCode', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialRegion', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialCountry', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialPhoneNumber', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'CountryOfResidence', N'add', 1033  
GO

exec sp_fulltext_table N'[dbo].[Patrons]', N'activate'  
GO

Here are screenshots for the person who doesn't believe me:

The query that should work but doesn't: http://i49.tinypic.com/dbo8w9.png

The query that works but isn't valid: http://i49.tinypic.com/30mptm9.png

A query that works but is useless, proving the content is there: http://i49.tinypic.com/2q04nmc.png


Update Four

The query cannot be written as

CONTAINS(Patrons.*, 'words...')

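because some of the things a user can search on are not logically or physically covered by the FT index. For example, the user query: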

6/4/2010 ian boyd 619

supplies four keywords:

  • 6/4/2010
  • ian
  • boyd
  • 619

which means they want all of these conditions to be true. In pseudocode:

WHERE 6/4/2010 is in the row
AND ian is in the row
AND boyd is in the row
AND 619 is in the row
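A minimal sketch of the first step: split the search-box text on whitespace and guess what kind of value each keyword is. It assumes US month/day/year dates, which is consistent with 6/4/2010 becoming '20100604' in the partial query below. The function names are hypothetical.

```python
from datetime import datetime


def classify_keyword(keyword):
    """Classify one keyword as ("date", date), ("number", int) or ("text", str).

    Assumes US month/day/year input, so "6/4/2010" means June 4, 2010.
    """
    try:
        return ("date", datetime.strptime(keyword, "%m/%d/%Y").date())
    except ValueError:
        pass  # not a date; fall through
    if keyword.isdecimal():
        return ("number", int(keyword))
    return ("text", keyword)


def parse_search(text):
    """Split the search box text on whitespace and classify every keyword."""
    return [classify_keyword(k) for k in text.split()]


for kind, value in parse_search("6/4/2010 ian boyd 619"):
    print(kind, value)
# date 2010-06-04
# text ian
# text boyd
# number 619
```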

which gets translated into the partial query:

WHERE --Keyword 1: 6/4/2010
(
   ((ChangedDate >= '20100604') AND (ChangedDate < '20100605'))
   OR 
   ((LastTransactionDate >= '20100604') AND (LastTransactionDate < '20100605'))
   OR 
   (CONTAINS(Patrons.*, '"6/4/2010*"'))
)
AND --Keyword 2: ian
(
    CONTAINS(Patrons.*, '"ian*"')
)
AND --Keyword 3: boyd
(
    CONTAINS(Patrons.*, '"boyd*"')
)
AND --Keyword 4: 619
(
    (AccountNumber IN (SELECT CAST(619 AS bigint)))
    OR
    (CONTAINS(Patrons.*, '"619*"'))
)
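A sketch of how the partial query above could be generated from already-classified keywords. The column names come from the question; the function names are hypothetical. The sketch doubles single quotes for the T-SQL literal and strips double quotes from the CONTAINS term. As a simplification, it uses a plain `AccountNumber = 619` in place of the `IN (SELECT CAST(...))` form.

```python
from datetime import date, timedelta


def quote_sql(s):
    """Wrap s in a T-SQL string literal, doubling embedded single quotes."""
    return "'" + s.replace("'", "''") + "'"


def prefix_term(word):
    """Build a CONTAINS prefix term such as "ian*", stripping double quotes."""
    return '"' + word.replace('"', "") + '*"'


def keyword_clause(kind, value, raw):
    """Return the OR-ed predicates for one keyword, mirroring the query above."""
    parts = []
    if kind == "date":
        lo = value.strftime("%Y%m%d")
        hi = (value + timedelta(days=1)).strftime("%Y%m%d")  # half-open day range
        for col in ("ChangedDate", "LastTransactionDate"):
            parts.append(f"(({col} >= '{lo}') AND ({col} < '{hi}'))")
    elif kind == "number":
        parts.append(f"(AccountNumber = {int(value)})")
    # Every keyword may also match any full-text indexed column as a prefix.
    parts.append(f"(CONTAINS(Patrons.*, {quote_sql(prefix_term(raw))}))")
    return "(\n    " + "\n    OR\n    ".join(parts) + "\n)"


def build_where(keywords):
    """keywords: list of (kind, value, raw_text); every keyword must match."""
    return "WHERE " + "\nAND ".join(keyword_clause(*k) for k in keywords)


print(build_where([
    ("date", date(2010, 6, 4), "6/4/2010"),
    ("text", "ian", "ian"),
    ("text", "boyd", "boyd"),
    ("number", 619, "619"),
]))
```

Each keyword becomes one parenthesized group that must be true. A single keyword that can match nothing, such as a noise word in the only column that holds it, therefore still sinks the whole query.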

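One of the answerers was looking at the simplified example posed in the original question, not at the real-world case. It is not correct to say that the multiple ANDed clauses are the problem.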

Ste*_*dit 6

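That message is telling you that "way" is a noise word, meaning it is ignored and not indexed. That's why you can find "wayne" but not "way": the prefix term "way*" still matches the indexed word WAYNE in Lastname, but in ResidentialAddress1 the only candidate, WAY, was never put into the index, so there is nothing for "way*" (or "wa*") to match there.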

So no, it's not crazy, and neither are you. It's just a simple misunderstanding.
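Following the answer, here is a minimal sketch of one workaround: drop noise words from the user's input before building the ANDed CONTAINS predicates. A word that was never indexed can only hurt the AND. The sketch assumes you can read the server's noise-word file (noise.enu for language 1033 in SQL Server 2000) and that its words are whitespace-separated. The function names and the excerpt are hypothetical. If you control the server, the other route is to remove the word from the noise file and then rebuild and repopulate the catalog.

```python
def load_noise_words(file_text):
    """Parse noise-word file contents into a lower-cased set of words.

    Assumption: the file is plain text with whitespace-separated words.
    """
    return {w.lower() for w in file_text.split()}


def searchable_keywords(user_text, noise_words):
    """Keep only the keywords that can appear in the full-text index.

    A noise word was never indexed, so a search that requires it can only
    succeed through some longer indexed word sharing its prefix (WAYNE for
    "way*"). Dropping it widens the search instead.
    """
    return [w for w in user_text.split() if w.lower() not in noise_words]


# Hypothetical excerpt; in practice read the server's noise-word file.
noise = load_noise_words("about\nand\nthe\nway\nwe\nwell\n")

print(searchable_keywords("123 generic way", noise))  # ['123', 'generic']
print(searchable_keywords("123 generic wa", noise))   # ['123', 'generic', 'wa']
```

This does not rescue Update Two's "wa". It is not a noise word, so it is kept, and "wa*" still has no indexed word to match in ResidentialAddress1.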