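So I have this table with 6.2 million records, and I have to run similarity-based search queries against one of its columns. A query can look like this: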
SELECT "lca_test".* FROM "lca_test"
WHERE (similarity(job_title, 'sales executive') > 0.6)
AND worksite_city = 'los angeles'
ORDER BY salary ASC LIMIT 50 OFFSET 0
More conditions can be added to the WHERE clause (year = X, worksite_state = N, status = 'certified', visa_class = Z).
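For illustration, a fuller variant of the query with those extra conditions might look like the sketch below. The literal values for year, worksite_state, and visa_class are made-up placeholders (and year is assumed to be an integer column); only the column names come from the conditions listed above.

```sql
-- Sketch only: same query as above with the optional filters added.
-- 2014, 'ca' and 'h-1b' are hypothetical placeholder values.
SELECT "lca_test".* FROM "lca_test"
WHERE (similarity(job_title, 'sales executive') > 0.6)
AND worksite_city = 'los angeles'
AND year = 2014                -- placeholder for X, assumes an integer column
AND worksite_state = 'ca'      -- placeholder for N
AND status = 'certified'
AND visa_class = 'h-1b'        -- placeholder for Z
ORDER BY salary ASC LIMIT 50 OFFSET 0;
```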
Some of these queries can take a very long time to run: over 30 seconds, and sometimes more than a minute.
EXPLAIN ANALYZE of the query shown earlier gives me this:
Limit  (cost=0.43..42523.04 rows=50 width=254) (actual time=9070.268..33487.734 rows=2 loops=1)
  ->  Index Scan using index_lca_test_on_salary on lca_test  (cost=0.43..23922368.16 rows=28129 width=254) (actual time=9070.265..33487.727 rows=2 loops=1)
>>>>    Filter: (((worksite_city)::text = 'los angeles'::text) AND (similarity((job_title)::text, 'sales executive'::text) > 0.6::double precision))
…
postgresql index full-text-search pattern-matching postgresql-9.3