Autocomplete too slow: possible optimizations?

Asked by A_V*_*A_V (score 3) · Tags: performance, sql-server, sql-server-2008-r2, string-searching, query-performance

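The autocomplete search on my website searches a varchar column (nvarchar(75) in the schema below) that holds the model numbers of the products we sell.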

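The column holds strings of 1 to 75 characters, and the table has 400,000 rows. I came up with a query that matches only from the beginning of the string. It runs in roughly 150-250 ms, which is acceptable. Now my manager wants the query to match any substring, and that makes it 3-10 times slower (roughly 1000-2000 ms).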

I built a SQL Fiddle with sample data and both queries:

http://sqlfiddle.com/#!6/9efa3/2/0

The table already has some indexes. What is the best practice for speeding up this autocomplete search? (The database is SQL Server 2008 R2.)

Here is a short sample of the data I'm working with:

CREATE TABLE [Products](
    [productid] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
    [model] [nvarchar](75) NOT NULL,
    CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED
    (
        [productid] ASC
    )
);

insert into products values ('UMPX1AA0011 danish e-315 woot');
insert into products values ('P27y719VC');
insert into products values ('VG2y439m-LED');
insert into products values ('UMUyX165AAB01');
insert into products values ('U28y79VF');
insert into products values ('U52417HJ');
insert into products values ('VA25746M-LED WITH FLYING CORNERS');
insert into products values ('S19F350HNN 1pc california storage');
insert into products values ('VA211917A');
insert into products values ('PM2500X2');
insert into products values ('E22470SWHE');
insert into products values ('V22465WLYDP');
insert into products values ('I129LMH1HKC');
insert into products values ('OM5EN X 35 the new version');
insert into products values ('DLS3060WDB');
insert into products values ('PVW');
insert into products values ('LI23721S');
insert into products values ('V173516LBM');
insert into products values ('VX2376-SMHD-A');
insert into products values ('GUM5FX1AA1001');
insert into products values ('GPM300X11');
insert into products values ('GUM-WH6AA002');
insert into products values ('2435V5LSB');
insert into products values ('P2418HZ');
insert into products values ('Stylish sectional one of a kind y-5151');

These are the two queries I'm comparing:

--runs acceptably fast, about 100-250ms
select * from products where model like 'y-5151'+'%';
--takes too long, around 1000-2500ms
select * from products where model like '%' + 'y-5151' + '%';
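To see why the leading wildcard hurts, here is a small Python model. It illustrates the idea only and is not how SQL Server's engine is implemented. An index on model behaves roughly like a sorted list, so LIKE 'term%' can binary-search to the first candidate and stop at the first non-match (an index seek), while LIKE '%term%' gives the sort order nothing to work with and must inspect every row (a scan). The function names are hypothetical, and the model compares strings case-sensitively by code point, whereas SQL Server compares using the column's collation, which is usually case-insensitive.

```python
import bisect


def prefix_search(sorted_models, term):
    """Models that start with `term` (the analogue of LIKE 'term%').

    Binary search jumps to the first candidate, then we walk forward only
    while the prefix still matches: the analogue of an index seek.
    """
    hits = []
    i = bisect.bisect_left(sorted_models, term)
    while i < len(sorted_models) and sorted_models[i].startswith(term):
        hits.append(sorted_models[i])
        i += 1
    return hits


def substring_search(models, term):
    """Models that contain `term` anywhere (the analogue of LIKE '%term%').

    Sort order cannot help, so every row is inspected: the analogue of a scan.
    """
    return [m for m in models if term in m]


# A few rows from the sample data, sorted by code point (our stand-in "index").
models = sorted([
    "P27y719VC", "VG2y439m-LED", "PVW", "PM2500X2", "P2418HZ",
    "Stylish sectional one of a kind y-5151",
])

print(prefix_search(models, "P2"))         # ['P2418HZ', 'P27y719VC']
print(prefix_search(models, "y-5151"))     # []
print(substring_search(models, "y-5151"))  # ['Stylish sectional one of a kind y-5151']
```

Note that on these sample rows the prefix query finds nothing for y-5151 (no model starts with it) while the substring query finds the "Stylish sectional" row, so the two queries differ in results as well as in cost.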

Answer by A_V*_*A_V (score 5)

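The solution I went with was to build a "semi-trigram" table of pre-computed model-number fragments (every suffix of each model number), as Aaron Bertrand suggests in these two blog posts: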

https://sqlperformance.com/2017/02/sql-indexes/seek-leading-wildcard-sql-server

https://sqlperformance.com/2017/02/sql-performance/follow-up-1-leading-wildcard-seeks

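I created a new table and search it for fragments that start with the search string. Each model is stored as all of its suffixes. For example, for a product with Product_ID 51 and model 7500 Twin Bed: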

ID|Model
----------------
51|7500 Twin Bed
51|500 Twin Bed
51|00 Twin Bed
51|0 Twin Bed
51| Twin Bed
51|Twin Bed
51|win Bed
51|in Bed
51|n Bed
51| Bed
51|Bed
51|ed
51|d
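The suffix list above is easy to generate. The Python function below is a hypothetical stand-in for dbo.CreateStringFragments, whose T-SQL definition is not shown in this post. It returns one fragment per character position, so it should print exactly the 13 rows listed for 7500 Twin Bed, including the fragments with a leading space.

```python
def create_string_fragments(model):
    """Return every suffix of `model`, longest first.

    A string of length n yields n fragments, one per starting position;
    leading spaces are kept, matching the table above.
    """
    return [model[i:] for i in range(len(model))]


for fragment in create_string_fragments("7500 Twin Bed"):
    print(f"51|{fragment}")
```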

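This way you don't need a leading-wildcard search: the simple query select distinct id_product from products_dictionary where model like 'Twin%' returns the desired results. DISTINCT is needed because one product can have several fragments that start with the search term. The query now takes less than 100 ms.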
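The lookup can be modeled the same way. In this Python sketch (hypothetical names; products 52 and 53 are made-up rows), the dictionary is a sorted list of (fragment, product_id) tuples standing in for the clustered index on (model, Id_product). A prefix search on fragments then finds a term anywhere in the original model, and a set plays the role of DISTINCT. Like the earlier model, it is case-sensitive, unlike SQL Server's usual case-insensitive collations.

```python
import bisect


def build_dictionary(products):
    """Build (fragment, product_id) rows from (product_id, model) pairs.

    Sorting the rows models the clustered index on ([model], [Id_product]).
    """
    rows = [
        (model[i:], product_id)
        for product_id, model in products
        for i in range(len(model))
    ]
    rows.sort()
    return rows


def search(dictionary, term):
    """Model of: SELECT DISTINCT Id_product FROM products_dictionary
    WHERE model LIKE term + '%'.

    (term,) sorts before every (fragment, id) tuple whose fragment starts
    with `term`, so bisect lands on the first candidate, and the loop stops
    at the first fragment that no longer matches.
    """
    ids = set()  # DISTINCT: one product may have several matching fragments
    i = bisect.bisect_left(dictionary, (term,))
    while i < len(dictionary) and dictionary[i][0].startswith(term):
        ids.add(dictionary[i][1])
        i += 1
    return sorted(ids)


products = [(51, "7500 Twin Bed"), (52, "Bunk Bed"), (53, "Twin XL Twin")]
dictionary = build_dictionary(products)
print(search(dictionary, "Twin"))  # [51, 53]
print(search(dictionary, "Bed"))   # [51, 52]
```

Product 53 matches "Twin" through two fragments ("Twin XL Twin" and "Twin") but is reported once, which is exactly what DISTINCT does in the SQL query.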

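Here is the code I used to create and populate the table. Again, this is described in detail in Aaron's blog posts. It relies on a dbo.CreateStringFragments table-valued function (not shown here) that returns every suffix of its input, as in the table above: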

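CREATE TABLE [dbo].[products_dictionary](
    [Id_product] [int],
    [model] [nvarchar](75) NOT NULL
);

-- one row per (product, suffix) pair; productid is the key in the question's Products table
insert into [dbo].[products_dictionary] ([Id_product], [model])
select p.productid, f.fragment
from [dbo].[Products] p
cross apply dbo.CreateStringFragments(p.model) AS f;

-- clustered on the fragment first, so LIKE 'term%' becomes an index seek
create clustered index index_idprod_substrmodel
    on [dbo].[products_dictionary]([model], [Id_product]);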
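The price of this approach is storage: the dictionary holds one row per character of every model. The small Python helper below (hypothetical) estimates the row count. Its min_fragment_len parameter is an assumption not taken from the original. It only makes sense if the UI waits for at least that many typed characters before searching, because a fragment shorter than the search term can never start with it.

```python
def dictionary_row_count(models, min_fragment_len=1):
    """Rows the fragment table needs: one per suffix at least min_fragment_len long.

    A string of length n has n suffixes, of which n - k + 1 are at least
    k characters long (none if n < k).
    """
    return sum(max(0, len(m) - min_fragment_len + 1) for m in models)


print(dictionary_row_count(["7500 Twin Bed"]))     # 13
print(dictionary_row_count(["7500 Twin Bed"], 3))  # 11
```

With 400,000 models of at most 75 characters, the table is bounded by 30,000,000 rows; the actual size depends on the average model length.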