I have a table with about 7 million records. The table has first_name and last_name columns, and I want to search it using the levenshtein() distance function.
select levenshtein('JOHN', first_name) as fn_distance,
levenshtein('DOE', last_name) as ln_distance,
id,
first_name as "firstName",
last_name as "lastName"
from person
where first_name is not null
and last_name is not null
and levenshtein('JOHN', first_name) <= 2
and levenshtein('DOE', last_name) <= 2
order by 1, 2
limit 50;
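The search above is slow (4 to 5 seconds). What can I do to improve performance? Should I create an index on both columns, or do something else?
After adding the following indexes: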
create index first_name_idx on person using gin (first_name gin_trgm_ops);
create index last_name_idx on person using gin(last_name gin_trgm_ops);
The query now takes about 11 seconds. :(
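A likely reason the indexes did not help: a trigram GIN index is only usable through pg_trgm operators such as % (or LIKE/ILIKE and regex matches), not through levenshtein(), so the levenshtein() filter still has to be evaluated row by row. The sketch below is one possible way to check this, not a confirmed fix. It assumes the pg_trgm and fuzzystrmatch extensions are installed and uses % as a prefilter in front of the original levenshtein() conditions. With the default similarity threshold, very short strings such as 'DOE' may lose some rows that are within distance 2, so the result set is not guaranteed to match the original query.

```sql
-- Assumption: both extensions are available in this database.
create extension if not exists pg_trgm;
create extension if not exists fuzzystrmatch;

-- explain (analyze, buffers) shows whether first_name_idx / last_name_idx are used.
explain (analyze, buffers)
select levenshtein('JOHN', first_name) as fn_distance,
       levenshtein('DOE', last_name) as ln_distance,
       id,
       first_name as "firstName",
       last_name as "lastName"
from person
where first_name % 'JOHN'   -- trigram prefilter, can use the GIN index
  and last_name % 'DOE'     -- trigram prefilter, can use the GIN index
  and levenshtein('JOHN', first_name) <= 2
  and levenshtein('DOE', last_name) <= 2
order by 1, 2
limit 50;
```

Note that pg_trgm compares strings case-insensitively while levenshtein() is case-sensitive, so the two filters can disagree on mixed-case data.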
New query:
select similarity('JOHN', first_name) as fnsimilarity, …