PostgreSql 在运行时更改相同的查询计划

jet*_*157 5 postgresql execution-plan postgresql-performance

我这里有一个非常奇怪的问题。我们的搜索查询接收多个参数并通过不同表中的多个联接进行搜索。查询在我们运行的前几次运行良好,然后有时会严重减慢,例如:从 200 毫秒到 15000/20000 毫秒。在同一个查询上。

最初我认为这是后端 ORM 问题,但在彻底检查并打开慢速查询日志、打开 auto_explain 功能后,我意识到查询计划在几次调用后发生了变化。我们在后端放置一个循环,在同一线程上多次执行查询(用于测试),我发现几乎总是在执行 8 次后数据库开始记录慢速查询。最后结果是查询计划发生了变化。最重要的变化是(我认为)TimeSlots 表上的变化,从 seq 扫描到位图索引扫描,它估计的行数比表中的行数少得多。估计:566 实际:100 000 左右。

我之前尝试过的事情:

  • 将硬件从 2 核 6GB 升级到 8 核 26GB 内存(100GB SSD)
  • 禁用任何其他应用程序(从而停止任何其他负载)访问它。
  • 在表上执行手动(也有自动运行的作业)VACUUM、ANALYZE、REINDEX。
  • 设置各个列的统计信息,例如SlotDateTime(至10000)
  • 创建不明确的统计数据。
  • 增加了 work_mem、maintenance_work_mem、shared_buffers 大小。
  • 将 seq_page_cost 减少到 1。

这是选择语句:

SELECT DISTINCT u."UserId"     AS                                     UserId,
                             cast(cuj.company_id as bigint) AS                                     CompanyId,
                         cast(ts.SlotsCount as bigint) as SlotsCount,
                         case when cuj.vip is null then false else cuj.vip end vip,
                         u.is_vip       as                                     userVip,

                         case
                             when ts2.EarliestDate isnull then '7777-12-01 21:00:00.000000'
                             else ts2.EarliestDate end                         EarliestDate
         from "Users" u
             left JOIN "UserSpecialty" usp ON usp."UserId" = u."UserId"
             left join "company_users_join" cuj ON cuj.user_id = u."UserId"  and cuj.company_user_role =2 and cuj.status in(1,3)
             LEFT JOIN "company_users_join_providers" cujp
              ON cuj.id = cujp.company_users_join_id
             left join ( SELECT T."DoctorId", dse.company_user_join_id, COALESCE(COUNT(*),0)  as SlotsCount
                          FROM "TimeSlots" T
                                  inner JOIN
                               "DoctorSchedule" dse
                               ON T."DoctorScheduleId" = dse."DoctorScheduleId"
                          WHERE (T."SlotDateTime" > :dateStart
                              AND T."SlotDateTime" < :dateEnd)
                            and (:specialtyId = 0 or t."SpecialityId" = :specialtyId) 
                            and (:specialtyId = 0 or dse."SpecialityId" = :specialtyId)
                            AND (dse."EndDate" < T."SlotDateTime"
                              OR dse."EndDate" IS NULL)
                            AND T."DeleteDate" IS NULL
                            AND T."IsActive" = TRUE
                            and dse."DeleteDate" is null
                          GROUP BY T."DoctorId",dse.company_user_join_id) as ts
                 on ts."DoctorId" = u."UserId" and  ((ts.company_user_join_id is null and cuj.company_id is null) or ts.company_user_join_id = cuj.id) 
         left join (SELECT t1."DoctorId", dse.company_user_join_id, min(t1."SlotDateTime") as EarliestDate
               FROM "TimeSlots" t1
                       inner JOIN
                    "DoctorSchedule" dse
                    ON t1."DoctorScheduleId" = dse."DoctorScheduleId"
               WHERE (
                       t1."SlotDateTime" > :dateStart
                       AND t1."SlotDateTime" < :dateEnd
                   )
                 and (:specialtyId =0 or t1."SpecialityId" = :specialtyId) 
                 AND (
                       dse."EndDate" < t1."SlotDateTime"
                       OR dse."EndDate" IS NULL
                   )
                 and(:serviceTypeId = 0 or t1."ServiceTypeId" = :serviceTypeId ) 
                 AND t1."DeleteDate" IS NULL
                 AND dse."DeleteDate" is null
                 AND t1."IsActive" = TRUE
               GROUP BY t1."DoctorId", dse.company_user_join_id) ts2
              on ts2."DoctorId" = u."UserId" and  ((ts2.company_user_join_id is null and cuj.company_id is null) or ts2.company_user_join_id = cuj.id) 
            and(:companyId =0 or cuj.company_id = :companyId) 
           and(:providerId =0 or cujp.provider_id = :providerId)
           and(:specialtyId =0 or usp."SpecialtyId" = :specialtyId) 
           and (:onlyForeignDoctors = false or u.is_foreign_doctor = true)
           and (:onlyForeignDoctors = true or LOWER(u."FullName") = :fullName or (u.is_foreign_doctor is null or u.is_foreign_doctor = false))
           and  LOWER(u."FullName") LIKE concat('%', :fullName, '%')
           and ((length(:fullName) =0 and ts.SlotsCount >0) or (length(:fullName) >0 ))
        and ts.SlotsCount >=0
       ORDER BY  userVip desc, vip desc, EarliestDate asc, SlotsCount desc   offset 0 limit 20
Run Code Online (Sandbox Code Playgroud)

这是一开始的查询计划(快速):

Limit  (cost=9689.40..9689.42 rows=1 width=30) (actual time=127.650..127.682 rows=20 loops=1)
      Buffers: shared hit=2229 read=2013
      ->  Unique  (cost=9689.40..9689.42 rows=1 width=30) (actual time=127.639..127.668 rows=20 loops=1)
            Buffers: shared hit=2229 read=2013
            ->  Sort  (cost=9689.40..9689.40 rows=1 width=30) (actual time=127.637..127.653 rows=51 loops=1)
                  Sort Key: u.is_vip DESC, (CASE WHEN (cuj.vip IS NULL) THEN false ELSE cuj.vip END) DESC, (CASE WHEN ((min(t1."SlotDateTime")) IS NULL) THEN '7777-12-01 21:00:00'::timestamp without time zone ELSE (min(t1."SlotDateTime")) END), (COALESCE(count(*), '0'::bigint)) DESC, u."UserId", cuj.company_id
                  Sort Method: quicksort  Memory: 57kB
                  Buffers: shared hit=2229 read=2013
                  ->  Nested Loop Left Join  (cost=9297.95..9689.39 rows=1 width=30) (actual time=122.171..127.367 rows=419 loops=1)
                        Join Filter: (cuj.id = cujp.company_users_join_id)
                        Rows Removed by Join Filter: 26015
                        Buffers: shared hit=2215 read=2013
                        ->  Hash Right Join  (cost=9297.95..9686.68 rows=1 width=38) (actual time=122.107..122.421 rows=391 loops=1)
                              Hash Cond: (t1."DoctorId" = u."UserId")
                              Join Filter: (((dse.company_user_join_id IS NULL) AND (cuj.company_id IS NULL)) OR (dse.company_user_join_id = cuj.id))
                              Rows Removed by Join Filter: 26
                              Buffers: shared hit=1825 read=2012
                              ->  HashAggregate  (cost=4100.24..4263.91 rows=16367 width=20) (actual time=37.779..37.953 rows=133 loops=1)
                                    Group Key: t1."DoctorId", dse.company_user_join_id
                                    Batches: 1  Memory Usage: 801kB
                                    Buffers: shared hit=1268 read=53
                                    ->  Hash Join  (cost=853.30..3977.49 rows=16367 width=20) (actual time=3.710..28.964 rows=45298 loops=1)
                                          Hash Cond: (t1."DoctorScheduleId" = dse."DoctorScheduleId")
                                          Join Filter: ((dse."EndDate" < t1."SlotDateTime") OR (dse."EndDate" IS NULL))
                                          Buffers: shared hit=1268 read=53
                                          ->  Bitmap Heap Scan on "TimeSlots" t1  (cost=624.79..3638.82 rows=41926 width=16) (actual time=2.127..16.859 rows=45300 loops=1)
                                                Recheck Cond: ("ServiceTypeId" = 1)
                                                Filter: (("DeleteDate" IS NULL) AND "IsActive" AND ("SlotDateTime" > '2021-03-29 23:22:36.152'::timestamp without time zone) AND ("SlotDateTime" < '2021-04-28 23:22:36.152'::timestamp without time zone))
                                                Rows Removed by Filter: 8302
                                                Heap Blocks: exact=1157
                                                Buffers: shared hit=1157 read=53
                                                ->  Bitmap Index Scan on timeslots4  (cost=0.00..614.31 rows=53602 width=0) (actual time=1.960..1.961 rows=53602 loops=1)
                                                      Index Cond: ("ServiceTypeId" = 1)
                                                      Buffers: shared read=53
                                          ->  Hash  (cost=197.21..197.21 rows=2504 width=20) (actual time=1.537..1.538 rows=2504 loops=1)
                                                Buckets: 4096  Batches: 1  Memory Usage: 132kB
                                                Buffers: shared hit=111
                                                ->  Seq Scan on "DoctorSchedule" dse  (cost=0.00..197.21 rows=2504 width=20) (actual time=0.009..1.110 rows=2504 loops=1)
                                                      Filter: ("DeleteDate" IS NULL)
                                                      Rows Removed by Filter: 3897
                                                      Buffers: shared hit=111
                              ->  Hash  (cost=5197.70..5197.70 rows=1 width=30) (actual time=84.315..84.322 rows=391 loops=1)
                                    Buckets: 1024  Batches: 1  Memory Usage: 35kB
                                    Buffers: shared hit=557 read=1959
                                    ->  Nested Loop Left Join  (cost=4728.99..5197.70 rows=1 width=30) (actual time=83.475..84.219 rows=391 loops=1)
                                          Buffers: shared hit=557 read=1959
                                          ->  Nested Loop Left Join  (cost=4728.72..5197.30 rows=1 width=30) (actual time=83.461..83.934 rows=138 loops=1)
                                                Filter: (((dse_1.company_user_join_id IS NULL) AND (cuj.company_id IS NULL)) OR (dse_1.company_user_join_id = cuj.id))
                                                Rows Removed by Filter: 8
                                                Buffers: shared hit=280 read=1959
                                                ->  Hash Join  (cost=4728.57..5186.85 rows=39 width=21) (actual time=83.411..83.619 rows=139 loops=1)
                                                      Hash Cond: (t."DoctorId" = u."UserId")
                                                      Buffers: shared hit=2 read=1956
                                                      ->  HashAggregate  (cost=4668.48..5087.57 rows=3104 width=20) (actual time=81.842..82.001 rows=147 loops=1)
                                                            Group Key: t."DoctorId", dse_1.company_user_join_id
                                                            Filter: ((COALESCE(count(*), '0'::bigint) > 0) AND (COALESCE(count(*), '0'::bigint) >= 0))
                                                            Batches: 1  Memory Usage: 801kB
                                                            Buffers: shared hit=2 read=1839
                                                            ->  Hash Join  (cost=228.51..4236.18 rows=34584 width=12) (actual time=2.341..63.449 rows=95524 loops=1)
                                                                  Hash Cond: (t."DoctorScheduleId" = dse_1."DoctorScheduleId")
                                                                  Join Filter: ((dse_1."EndDate" < t."SlotDateTime") OR (dse_1."EndDate" IS NULL))
                                                                  Buffers: shared hit=2 read=1839
                                                                  ->  Seq Scan on "TimeSlots" t  (cost=0.00..3774.93 rows=88591 width=16) (actual time=0.018..38.614 rows=95531 loops=1)
                                                                        Filter: (("DeleteDate" IS NULL) AND "IsActive" AND ("SlotDateTime" > '2021-03-29 23:22:36.152'::timestamp without time zone) AND ("SlotDateTime" < '2021-04-28 23:22:36.152'::timestamp without time zone))
                                                                        Rows Removed by Filter: 17731
                                                                        Buffers: shared hit=2 read=1728
                                                                  ->  Hash  (cost=197.21..197.21 rows=2504 width=20) (actual time=2.293..2.294 rows=2504 loops=1)
                                                                        Buckets: 4096  Batches: 1  Memory Usage: 132kB
                                                                        Buffers: shared read=111
                                                                        ->  Seq Scan on "DoctorSchedule" dse_1  (cost=0.00..197.21 rows=2504 width=20) (actual time=0.012..1.782 rows=2504 loops=1)
                                                                              Filter: ("DeleteDate" IS NULL)
                                                                              Rows Removed by Filter: 3897
                                                                              Buffers: shared read=111
                                                      ->  Hash  (cost=57.25..57.25 rows=227 width=5) (actual time=1.560..1.561 rows=218 loops=1)
                                                            Buckets: 1024  Batches: 1  Memory Usage: 16kB
                                                            Buffers: shared read=117
                                                            ->  Index Scan using users__index_is_published on "Users" u  (cost=0.29..57.25 rows=227 width=5) (actual time=0.180..1.507 rows=218 loops=1)
                                                                  Index Cond: ("IsPublished" = true)
                                                                  Filter: (((lower(("FullName")::text) = ''::text) OR (is_foreign_doctor IS NULL) OR (NOT is_foreign_doctor)) AND (lower(("FullName")::text) ~~ concat('%', ''::character varying, '%')))
                                                                  Rows Removed by Filter: 9
                                                                  Buffers: shared read=117
                                                ->  Index Scan using company_users_join__index_sch1 on company_users_join cuj  (cost=0.14..0.26 rows=1 width=21) (actual time=0.002..0.002 rows=1 loops=139)
                                                      Index Cond: ((user_id = u."UserId") AND (company_user_role = 2))
                                                      Filter: (status = ANY ('{1,3}'::integer[]))
                                                      Buffers: shared hit=278 read=3
                                          ->  Index Only Scan using us1 on "UserSpecialty" usp  (cost=0.28..0.38 rows=2 width=4) (actual time=0.001..0.001 rows=3 loops=138)
                                                Index Cond: ("UserId" = u."UserId")
                                                Heap Fetches: 0
                                                Buffers: shared hit=277
                        ->  Seq Scan on company_users_join_providers cujp  (cost=0.00..1.87 rows=67 width=8) (actual time=0.001..0.006 rows=67 loops=391)
                              Buffers: shared hit=390 read=1
Run Code Online (Sandbox Code Playgroud)

这是慢的一个

Limit  (cost=2664.61..2664.63 rows=1 width=30) (actual time=377.322..377.348 rows=20 loops=1)
      Buffers: shared hit=383194
      ->  Unique  (cost=2664.61..2664.63 rows=1 width=30) (actual time=377.320..377.343 rows=20 loops=1)
            Buffers: shared hit=383194
            ->  Sort  (cost=2664.61..2664.62 rows=1 width=30) (actual time=377.319..377.330 rows=51 loops=1)
                  Sort Key: u.is_vip DESC, (CASE WHEN (cuj.vip IS NULL) THEN false ELSE cuj.vip END) DESC, (CASE WHEN ((min(t1."SlotDateTime")) IS NULL) THEN '7777-12-01 21:00:00'::timestamp without time zone ELSE (min(t1."SlotDateTime")) END), (COALESCE(count(*), '0'::bigint)) DESC, u."UserId", cuj.company_id
                  Sort Method: quicksort  Memory: 57kB
                  Buffers: shared hit=383194
                  ->  Merge Left Join  (cost=2629.35..2664.60 rows=1 width=30) (actual time=318.926..377.019 rows=419 loops=1)
                        Merge Cond: (u."UserId" = t1."DoctorId")
                        Join Filter: (((dse_1.company_user_join_id IS NULL) AND (cuj.company_id IS NULL)) OR (dse_1.company_user_join_id = cuj.id))
                        Rows Removed by Join Filter: 26
                        Buffers: shared hit=383194
                        ->  Nested Loop Left Join  (cost=1363.46..1398.63 rows=1 width=30) (actual time=206.475..254.659 rows=419 loops=1)
                              Join Filter: (cuj.id = cujp.company_users_join_id)
                              Rows Removed by Join Filter: 26015
                              Filter: (($15 = 0) OR (cujp.provider_id = $16))
                              Buffers: shared hit=290878
                              ->  Nested Loop Left Join  (cost=1363.46..1395.58 rows=1 width=30) (actual time=206.460..249.574 rows=391 loops=1)
                                    Join Filter: (cuj.user_id = u."UserId")
                                    Rows Removed by Join Filter: 89798
                                    Filter: ((((dse.company_user_join_id IS NULL) AND (cuj.company_id IS NULL)) OR (dse.company_user_join_id = cuj.id)) AND (($13 = 0) OR (cuj.company_id = $14)))
                                    Rows Removed by Filter: 37
                                    Buffers: shared hit=290487
                                    ->  Nested Loop Left Join  (cost=1363.46..1384.83 rows=1 width=21) (actual time=206.445..226.437 rows=394 loops=1)
                                          Filter: (($17 = 0) OR (usp."SpecialtyId" = $18))
                                          Buffers: shared hit=289305
                                          ->  Nested Loop  (cost=1359.17..1375.28 rows=1 width=21) (actual time=206.418..225.252 rows=139 loops=1)
                                                Buffers: shared hit=288749
                                                ->  GroupAggregate  (cost=1358.88..1358.93 rows=1 width=20) (actual time=206.366..224.117 rows=147 loops=1)
                                                      Group Key: t."DoctorId", dse.company_user_join_id
                                                      Filter: ((COALESCE(count(*), '0'::bigint) >= 0) AND (((length(($23)::text) = 0) AND (COALESCE(count(*), '0'::bigint) > 0)) OR (length(($24)::text) > 0)))
                                                      Buffers: shared hit=288308
                                                      ->  Sort  (cost=1358.88..1358.89 rows=1 width=12) (actual time=205.791..212.762 rows=95524 loops=1)
                                                            Sort Key: t."DoctorId", dse.company_user_join_id
                                                            Sort Method: quicksort  Memory: 7550kB
                                                            Buffers: shared hit=288308
                                                            ->  Nested Loop  (cost=10.24..1358.87 rows=1 width=12) (actual time=3.708..181.475 rows=95524 loops=1)
                                                                  Buffers: shared hit=288308
                                                                  ->  Bitmap Heap Scan on "TimeSlots" t  (cost=9.96..1245.79 rows=16 width=16) (actual time=3.687..31.267 rows=95531 loops=1)
                                                                        Recheck Cond: (("SlotDateTime" > $1) AND ("SlotDateTime" < $2))
                                                                        Filter: (("DeleteDate" IS NULL) AND "IsActive" AND (($3 = 0) OR ("SpecialityId" = $4)))
                                                                        Rows Removed by Filter: 389
                                

Lau*_*lbe 3

问题是,在您的情况下,PostgreSQL 在前 5 次执行后切换到通用计划,并且通用计划的性能更差。特别是,对索引扫描的估计slotdatetime在通用计划中严重偏离。

PostgreSQL 有一个内置的启发式表单语句,其计划可以缓存:前 5 次它使用实际参数值来计划它们,然后如果该计划保证性能不会更差,则切换到通用计划。不幸的是,在这种情况下,通用计划并没有兑现承诺。

由于没有好的方法来修复通用计划,所以最好的办法就是避免它。您应该升级到 PostgreSQL v12 或更高版本,并在运行此查询时将参数设置plan_cache_mode为。force_custom_plan如果查询是在 PL/pgSQL 函数中执行的,您可以简单地

ALTER FUNCTION myfunc(argtype1,argtype2) SET plan_cache_mode = force_custom_plan;
Run Code Online (Sandbox Code Playgroud)

另一种方法是使用动态 SQL EXECUTE,它将始终使用自定义计划。