Adj*_*con 5 postgresql index full-text-search
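From what I have gathered so far, if you need to run full-text search against a table with a large number of entries (on the order of 1.2M+ rows) in a PostgreSQL database (psql 9.6.2, server 9.6.5), the recommended approach is to create an index on that table (in our case a GIN index), which should let you run queries like this: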
SELECT * FROM speech WHERE speech_tsv @@ plainto_tsquery('a text string')
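Apart from the fact that the results sometimes do not contain anything related to the search string, this query typically takes 8 to 10 seconds.
The database is deployed on a fairly large multi-core EC2 instance, so I am wondering whether there is anything else we can do to the database to make these queries run faster.
Or, given the sheer number of documents and the amount of text we are asking it to search (even through an index), is this execution time roughly what we should expect?
The table looks like this: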
Table "public.speech"
Column | Type | Modifiers
---------------+-----------------------------+-----------------------------------------------------
speech_id | integer | not null default nextval('speech_id_seq'::regclass)
speechtype_id | smallint | not null
title | character varying | not null default ''::character varying
speechdate | date | default now()
location | character varying | not null default ''::character varying
source | character varying | not null default ''::character varying
speechtext | text | not null
url | character varying | not null default ''::character varying
release_id | smallint |
created | timestamp without time zone |
modified | timestamp without time zone |
speech_tsv | tsvector |
key | boolean |
summary | text |
quote | text |
Indexes:
"speech_pk" PRIMARY KEY, btree (speech_id)
"speech__release_id" btree (release_id)
"speech__speech_tsv" gin (speech_tsv)
"speech__speechdate" btree (speechdate)
"speech__speechtype_id" btree (speechtype_id)
Foreign-key constraints:
"speech__release_id_fk" FOREIGN KEY (release_id) REFERENCES release(release_id) MATCH FULL ON DELETE RESTRICT DEFERRABLE INITIALLY DEFERRED
"speech__speechtype_id_fk" FOREIGN KEY (speechtype_id) REFERENCES speechtype(speechtype_id) MATCH FULL DEFERRABLE INITIALLY DEFERRED
Referenced by:
TABLE "factcheck_speech" CONSTRAINT "factcheck_speech_speech_id_fkey" FOREIGN KEY (speech_id) REFERENCES speech(speech_id) MATCH FULL ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
TABLE "speech_candidate" CONSTRAINT "speech_candidate__speech_id_fk" FOREIGN KEY (speech_id) REFERENCES speech(speech_id) MATCH FULL ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
TABLE "speech_category" CONSTRAINT "speech_category__speech_id_fk" FOREIGN KEY (speech_id) REFERENCES speech(speech_id) MATCH FULL ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
TABLE "speech_tag" CONSTRAINT "speech_tag__speech_fk" FOREIGN KEY (speech_id) REFERENCES speech(speech_id) MATCH FULL ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
TABLE "speechlocking" CONSTRAINT "speechlocking__fkey" FOREIGN KEY (speech_id) REFERENCES speech(speech_id) MATCH FULL ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
Triggers:
speech_updated BEFORE INSERT OR UPDATE ON speech FOR EACH ROW EXECUTE PROCEDURE pvs_speech_updated()
update_speech_created BEFORE INSERT ON speech FOR EACH ROW EXECUTE PROCEDURE update_created_column()
update_speech_modified BEFORE UPDATE ON speech FOR EACH ROW EXECUTE PROCEDURE update_modified_column()
(The speechtext column obviously contains all of the text to be searched.)
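Below is the EXPLAIN (ANALYZE, BUFFERS) output for a sample query executed directly on the server (these queries are actually run from a Python application, so this one runs somewhat faster here, without network latency and so on):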
QUERY PLAN
-------------------------------------------------------------------------
Bitmap Heap Scan on speech (cost=294.85..7931.12 rows=6142 width=1058) (actual time=400.623..67768.222 rows=27267 loops=1)
Recheck Cond: (speech_tsv @@ plainto_tsquery('gun'::text))
Heap Blocks: exact=23582
Buffers: shared hit=2413 read=21424
-> Bitmap Index Scan on speech__speech_tsv (cost=0.00..293.31 rows=6142 width=0) (actual time=279.709..279.709 rows=30535 loops=1)
Index Cond: (speech_tsv @@ plainto_tsquery('gun'::text))
Buffers: shared hit=241 read=14
Planning time: 0.187 ms
Execution time: 67778.684 ms
(9 rows)
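If you look at the explain output, the actual index scan is not slow, at about 280 ms. The slow part is the second step, the Bitmap Heap Scan, which fetches all the data you requested.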
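To make the split concrete, here is a small back-of-the-envelope calculation using only the two timings printed in the plan above. The variable names are illustrative; nothing here talks to the database.

```python
# Timings copied from the EXPLAIN (ANALYZE, BUFFERS) output above, in milliseconds.
index_scan_ms = 279.709    # "Bitmap Index Scan": actual total time
execution_ms = 67778.684   # "Execution time" of the whole query

# Time spent on everything other than the index scan (heap fetch, recheck, etc.).
after_index_ms = execution_ms - index_scan_ms
index_share = index_scan_ms / execution_ms

print(f"index scan share of execution time: {index_share:.2%}")
print(f"time spent after the index scan: {after_index_ms / 1000:.1f} s")
```

The index scan accounts for well under one percent of the total, so tuning the GIN index itself is unlikely to help much for this query.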
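You are doing a SELECT * here, which asks for every column in the table. Judging from the explain output (width=1058), this is a fairly wide table with many or large columns, and your query is fetching about 27,000 of these large rows.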
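The "read" and "hit" parts of the Buffers line tell you that 21424 blocks had to be read from the hard drive or SSD because they were not cached in RAM, while only 2413 were found in the cache. Reading that much data from disk takes time.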
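As a rough sketch of how much data those counters represent, the snippet below parses the Buffers line from the plan and converts blocks to megabytes. It assumes PostgreSQL's default block size of 8 kB, which the question does not confirm; the helper name parse_shared_buffers is made up for this illustration.

```python
import re

BLOCK_SIZE_BYTES = 8192  # assumption: default PostgreSQL block size (8 kB)


def parse_shared_buffers(line):
    """Return (hit, read) block counts from an EXPLAIN 'Buffers:' line."""
    match = re.search(r"shared hit=(\d+) read=(\d+)", line)
    if match is None:
        raise ValueError(f"no shared hit/read counters in: {line!r}")
    return int(match.group(1)), int(match.group(2))


# Top-level Buffers line of the plan (the Bitmap Heap Scan node).
hit, read = parse_shared_buffers("Buffers: shared hit=2413 read=21424")

read_mib = read * BLOCK_SIZE_BYTES / 2**20  # data that was not in the cache
hit_ratio = hit / (hit + read)              # fraction of blocks found in the cache

print(f"blocks read from outside the cache: {read} (~{read_mib:.0f} MiB)")
print(f"cache hit ratio: {hit_ratio:.1%}")
```

Under that block-size assumption, the query pulled in roughly 167 MiB that was not cached, with only about one block in ten served from the cache.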
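Another factor is that all of the requested data has to be transferred to the client, which also takes time.
You are asking the database for a lot of data, but I suspect you do not need all of it. So you should be more specific in your query: select only the columns you actually need, and add a LIMIT clause unless you really want to fetch all 27267 rows.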
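Since the question says the queries are issued from a Python application, here is a minimal sketch of building such a narrowed query. The column names come from the table definition in the question; the function build_search_query, the chosen columns, and the limit of 50 are hypothetical. The %s placeholders assume a DB-API driver with that parameter style (psycopg2 uses it, but the question does not say which driver is in use).

```python
# Columns of public.speech, taken from the table definition in the question.
SPEECH_COLUMNS = {
    "speech_id", "speechtype_id", "title", "speechdate", "location",
    "source", "speechtext", "url", "release_id", "created", "modified",
    "speech_tsv", "key", "summary", "quote",
}


def build_search_query(columns, search_text, limit):
    """Build a full-text search query that returns only the given columns.

    Column names cannot be passed as bound parameters, so they are checked
    against the known column set before being placed into the SQL text.
    """
    unknown = [c for c in columns if c not in SPEECH_COLUMNS]
    if not columns or unknown:
        raise ValueError(f"invalid column list: {columns!r}")
    if not isinstance(limit, int) or limit < 1:
        raise ValueError(f"limit must be a positive integer, got {limit!r}")
    sql = (
        f"SELECT {', '.join(columns)} FROM speech "
        "WHERE speech_tsv @@ plainto_tsquery(%s) LIMIT %s"
    )
    return sql, (search_text, limit)


sql, params = build_search_query(["speech_id", "title", "speechdate"], "gun", 50)
print(sql)
# With an open cursor from the assumed driver: cursor.execute(sql, params)
```

Without an ORDER BY, the rows that a LIMIT returns are whichever matches the executor reaches first, so this only suits cases where any subset of the matches is acceptable.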