How can I make this query faster?

jku*_*lak 2 sql postgresql full-text-search query-optimization trigram

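My tracks table holds about 3 million records (growing by about 500 per day) and has roughly 30 columns, of which only 15 appear in the WHERE clause. The query takes 4800 ms on average, with no other users or processes using the database. How can I make it faster? I would like to see results in close to 100 ms.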

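People looking for a song (track) fill in a form:

  • string -> represents "song title or artist name"
  • string -> represents "genre"
  • date -> represents "release date"
  • a few integers for the min/max of the remaining 12 parameters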

99% of the use cases are a SELECT query:

SELECT
    "public"."tracks"."sys_id",
    "public"."tracks"."all_artists",
    "public"."tracks"."name",
    "public"."tracks"."genres",
    "public"."tracks"."release_date",
    "public"."tracks"."tempo",
    "public"."tracks"."popularity",
    "public"."tracks"."danceability",
    "public"."tracks"."energy",
    "public"."tracks"."speechiness",
    "public"."tracks"."acousticness",
    "public"."tracks"."instrumentalness",
    "public"."tracks"."liveness",
    "public"."tracks"."valence",
    "public"."tracks"."main_artist_popularity",
    "public"."tracks"."main_artist_followers",
    "public"."tracks"."key",
    "public"."tracks"."preview_url"
FROM
    "public"."tracks"
WHERE
    (
    "public"."tracks"."name" LIKE '%oultec%'
    OR "public"."tracks"."all_artists_string" LIKE '%oultec%'
    )
    AND ("public"."tracks"."genres_string" LIKE '%rum%')
    AND "public"."tracks"."tempo" >= '80'
    AND "public"."tracks"."tempo" <= '210'
    AND "public"."tracks"."popularity" >= '0'
    AND "public"."tracks"."popularity" <= '100'
    AND "public"."tracks"."main_artist_popularity" >= '1'
    AND "public"."tracks"."main_artist_popularity" <= '100'
    AND "public"."tracks"."main_artist_followers" >= '1'
    AND "public"."tracks"."main_artist_followers" <= '50000000'
    AND "public"."tracks"."danceability" >= '0'
    AND "public"."tracks"."danceability" <= '1000'
    AND "public"."tracks"."energy" >= '0'
    AND "public"."tracks"."energy" <= '1000'
    AND "public"."tracks"."speechiness" >= '0'
    AND "public"."tracks"."speechiness" <= '1000'
    AND "public"."tracks"."acousticness" >= '0'
    AND "public"."tracks"."acousticness" <= '1000'
    AND "public"."tracks"."instrumentalness" >= '0'
    AND "public"."tracks"."instrumentalness" <= '1000'
    AND "public"."tracks"."liveness" >= '0'
    AND "public"."tracks"."liveness" <= '1000'
    AND "public"."tracks"."valence" >= '0'
    AND "public"."tracks"."valence" <= '1000'
    AND "public"."tracks"."release_date" >= '2020-01-01'
    AND "public"."tracks"."key" = '10'
ORDER BY
    "public"."tracks"."release_date" DESC,
    "public"."tracks"."popularity" DESC,
    "public"."tracks"."sys_id" ASC
LIMIT 5 OFFSET 0;

Indexes:

PRIMARY sys_id
UNIQUE  main_artist, name, duration_ms
INDEX   energy
INDEX   tempo, popularity, main_artist_popularity, main_artist_followers, danceability, energy, speechiness, acousticness, instrumentalness, liveness, valence, name, all_artists_string, genres_string, release_date, key

EXPLAIN / ANALYZE:

[plan output not preserved]

PostgreSQL runs from the "official" postgres:14.1-alpine image on:

  • Ubuntu-20.04-x86_64
  • 2 CPUs
  • 2 GB RAM
  • 20 GB SSD drive

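[image: top output on the system]

Table structure:

[image: tracks table structure]

The website that runs the query (via an API/backend; there are more min/max integer fields than are shown here):

[image: form view used to run the query]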

O. *_*nes 5

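Your query combines LIKE '%something%' substring (full-text style) searches with range scans on dates and numbers. A BTREE index (the default type) can efficiently narrow its scan with only one range condition, and it cannot help with LIKE '%something%' at all. So every query ends up as a full table scan. Given that, 4.8 seconds for three million rows is not bad.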
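One way to confirm this diagnosis is to look at the plan for a cut-down version of the query. The sketch below assumes the table and column names from the question and keeps only a few of the predicates. The exact plan shape and timings will depend on the data and the server configuration.

```sql
-- Sketch: inspect the plan for a reduced form of the question's query.
-- ANALYZE really executes the statement; BUFFERS adds page-access counts.
-- Literals are quoted as in the question because the column types are not shown.
EXPLAIN (ANALYZE, BUFFERS)
SELECT sys_id, name, all_artists, release_date, popularity
FROM tracks
WHERE (name LIKE '%oultec%' OR all_artists_string LIKE '%oultec%')
  AND genres_string LIKE '%rum%'
  AND tempo >= '80' AND tempo <= '210'
  AND release_date >= '2020-01-01'
  AND "key" = '10'
ORDER BY release_date DESC, popularity DESC, sys_id ASC
LIMIT 5;
```

A "Seq Scan on tracks" node with a large "Rows Removed by Filter" count would indicate that the whole table is being read for each search.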

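For your column LIKE '%something%' searches, you can try trigram indexes, a PostgreSQL feature provided by the pg_trgm extension. The code below creates trigram indexes on name, all_artists_string and genres_string. They may narrow the selection so that less data has to be scanned.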

CREATE EXTENSION IF NOT EXISTS pg_trgm;  -- no-op if the extension is already installed
CREATE INDEX CONCURRENTLY tracks_name
     ON tracks
  USING GIN (name gin_trgm_ops);
CREATE INDEX CONCURRENTLY tracks_all_artists_string
     ON tracks
  USING GIN (all_artists_string gin_trgm_ops);
CREATE INDEX CONCURRENTLY tracks_genres_string
     ON tracks
  USING GIN (genres_string gin_trgm_ops);

But you will still have to scan all the matching tracks.
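To check whether the planner actually picks up the new indexes, you can inspect the trigram decomposition and the plan directly. This is only a sketch, assuming pg_trgm is installed and the three indexes above have been built. Whether the planner prefers them depends on how selective the pattern is.

```sql
-- Show the trigrams pg_trgm extracts from a string
-- (show_trgm is a function provided by the pg_trgm extension).
SELECT show_trgm('oultec');

-- Re-check the plan for the text predicates alone. With the GIN indexes in
-- place, the hoped-for outcome is a Bitmap Index Scan on tracks_name and
-- tracks_all_artists_string (combined by a BitmapOr) instead of a Seq Scan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT sys_id
FROM tracks
WHERE name LIKE '%oultec%'
   OR all_artists_string LIKE '%oultec%';
```

A pattern such as '%ab%' has no extractable trigrams, so a search like that degenerates to scanning the whole index and gains nothing from it.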

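If you create these indexes and then rewrite the first few conditions of the WHERE clause as set operations like this, you may (or may not) get better performance.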

WHERE sys_id IN (
     (SELECT sys_id FROM tracks WHERE name LIKE '%oultec%'
       UNION 
      SELECT sys_id FROM tracks WHERE all_artists_string LIKE '%oultec%'
     )
     INTERSECT
     SELECT sys_id FROM tracks WHERE genres_string LIKE '%rum%'
    )
  AND tempo >= '80' ... 
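For reference, a complete statement built around that subquery might look like the sketch below. It assumes the genre pattern '%rum%' from the question and keeps only a few of the range filters for brevity. It is not guaranteed to beat the original form, so compare the two with EXPLAIN ANALYZE before adopting it.

```sql
-- Sketch: set-based form of the text filters, followed by a subset of the
-- range filters from the question. Literals are quoted as in the question.
-- The parentheses around the UNION matter: INTERSECT binds tighter than UNION.
SELECT t.sys_id, t.name, t.all_artists, t.genres, t.release_date, t.popularity
FROM tracks AS t
WHERE t.sys_id IN (
        (SELECT sys_id FROM tracks WHERE name LIKE '%oultec%'
         UNION
         SELECT sys_id FROM tracks WHERE all_artists_string LIKE '%oultec%')
        INTERSECT
        SELECT sys_id FROM tracks WHERE genres_string LIKE '%rum%'
      )
  AND t.tempo >= '80' AND t.tempo <= '210'
  AND t.release_date >= '2020-01-01'
  AND t."key" = '10'
ORDER BY t.release_date DESC, t.popularity DESC, t.sys_id ASC
LIMIT 5 OFFSET 0;
```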

But the truth is that SQL is not well suited to all these range scans.

  • Thanks for the feedback! From 4.8 s to 50 ms, pretty good! (2 upvotes)