Gar*_*ett 51 postgresql performance index optimization postgresql-performance
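This query gets a list of posts created by people you follow. You can follow an unlimited number of people, but most people follow fewer than 1000.
With this style of query, the obvious optimization would be to cache the "Post" ids, but unfortunately I don't have time for that right now.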
EXPLAIN ANALYZE SELECT
"Post"."id",
"Post"."actionId",
"Post"."commentCount",
...
FROM
"Posts" AS "Post"
INNER JOIN "Users" AS "user" ON "Post"."userId" = "user"."id"
LEFT OUTER JOIN "ActivityLogs" AS "activityLog" ON "Post"."activityLogId" = "activityLog"."id"
LEFT OUTER JOIN "WeightLogs" AS "weightLog" ON "Post"."weightLogId" = "weightLog"."id"
LEFT OUTER JOIN "Workouts" AS "workout" ON "Post"."workoutId" = "workout"."id"
LEFT OUTER JOIN "WorkoutLogs" AS "workoutLog" ON "Post"."workoutLogId" = "workoutLog"."id"
LEFT OUTER JOIN "Workouts" AS "workoutLog.workout" ON "workoutLog"."workoutId" = "workoutLog.workout"."id"
WHERE
"Post"."userId" IN (
201486,
1825186,
998608,
340844,
271909,
308218,
341986,
216893,
1917226,
... -- many more
)
AND "Post"."private" IS NULL
ORDER BY
"Post"."createdAt" DESC
LIMIT 10;
Yields:
Limit (cost=3.01..4555.20 rows=10 width=2601) (actual time=7923.011..7973.138 rows=10 loops=1)
-> Nested Loop Left Join (cost=3.01..9019264.02 rows=19813 width=2601) (actual time=7923.010..7973.133 rows=10 loops=1)
-> Nested Loop Left Join (cost=2.58..8935617.96 rows=19813 width=2376) (actual time=7922.995..7973.063 rows=10 loops=1)
-> Nested Loop Left Join (cost=2.15..8821537.89 rows=19813 width=2315) (actual time=7922.984..7961.868 rows=10 loops=1)
-> Nested Loop Left Join (cost=1.71..8700662.11 rows=19813 width=2090) (actual time=7922.981..7961.846 rows=10 loops=1)
-> Nested Loop Left Join (cost=1.29..8610743.68 rows=19813 width=2021) (actual time=7922.977..7961.816 rows=10 loops=1)
-> Nested Loop (cost=0.86..8498351.81 rows=19813 width=1964) (actual time=7922.972..7960.723 rows=10 loops=1)
-> Index Scan using posts_createdat_public_index on "Posts" "Post" (cost=0.43..8366309.39 rows=20327 width=261) (actual time=7922.869..7960.509 rows=10 loops=1)
Filter: ("userId" = ANY ('{201486,1825186,998608,340844,271909,308218,341986,216893,1917226, ... many more ...}'::integer[]))
Rows Removed by Filter: 218360
-> Index Scan using "Users_pkey" on "Users" "user" (cost=0.43..6.49 rows=1 width=1703) (actual time=0.005..0.006 rows=1 loops=10)
Index Cond: (id = "Post"."userId")
-> Index Scan using "ActivityLogs_pkey" on "ActivityLogs" "activityLog" (cost=0.43..5.66 rows=1 width=57) (actual time=0.107..0.107 rows=0 loops=10)
Index Cond: ("Post"."activityLogId" = id)
-> Index Scan using "WeightLogs_pkey" on "WeightLogs" "weightLog" (cost=0.42..4.53 rows=1 width=69) (actual time=0.001..0.001 rows=0 loops=10)
Index Cond: ("Post"."weightLogId" = id)
-> Index Scan using "Workouts_pkey" on "Workouts" workout (cost=0.43..6.09 rows=1 width=225) (actual time=0.001..0.001 rows=0 loops=10)
Index Cond: ("Post"."workoutId" = id)
-> Index Scan using "WorkoutLogs_pkey" on "WorkoutLogs" "workoutLog" (cost=0.43..5.75 rows=1 width=61) (actual time=1.118..1.118 rows=0 loops=10)
Index Cond: ("Post"."workoutLogId" = id)
-> Index Scan using "Workouts_pkey" on "Workouts" "workoutLog.workout" (cost=0.43..4.21 rows=1 width=225) (actual time=0.004..0.004 rows=0 loops=10)
Index Cond: ("workoutLog"."workoutId" = id)
Total runtime: 7974.524 ms
How can I optimize this in the meantime?
I have the following relevant indexes:
-- Gets used
CREATE INDEX "posts_createdat_public_index" ON "public"."Posts" USING btree("createdAt" DESC) WHERE "private" IS null;
-- Don't get used
CREATE INDEX "posts_userid_fk_index" ON "public"."Posts" USING btree("userId");
CREATE INDEX "posts_following_index" ON "public"."Posts" USING btree("userId", "createdAt" DESC) WHERE "private" IS null;
Maybe this requires a big partial composite index on createdAt and userId where private IS NULL?
Cra*_*ger 49
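Instead of using a huge IN-list, join on a VALUES expression, or if the list is large enough, use a temp table, index it, then join on it.
It would be nice if PostgreSQL could do this internally and automatically, but at this point the planner doesn't know how.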
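As a minimal sketch of the VALUES-join variant, using table and column names from the question: the alias `followed` is an assumption introduced here for illustration, the id list is truncated to the first three values shown in the question, and the join columns are omitted for brevity.

```sql
-- Join "Posts" against an inline VALUES list instead of filtering with IN (...).
-- "followed" is a hypothetical alias; extend the list with the remaining ids.
SELECT "Post"."id", "Post"."createdAt"
FROM (VALUES (201486), (1825186), (998608)) AS followed("userId")
JOIN "Posts" AS "Post" ON "Post"."userId" = followed."userId"
WHERE "Post"."private" IS NULL
ORDER BY "Post"."createdAt" DESC
LIMIT 10;
```

Whether the planner then picks an index on "userId" depends on the data and statistics, so it is worth re-running EXPLAIN ANALYZE on the rewritten query.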
Erw*_*ter 40
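There are actually two different variants of the IN construct in Postgres. One works with a subquery expression (returning a set), the other with a list of values, which is just shorthand for: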
expression = value1
OR
expression = value2
OR
...
You are using the second form, which is fine for a short list but much slower for long lists. Provide your list of values as a subquery expression instead. I was recently made aware of this variant:
WHERE "Post"."userId" IN (VALUES (201486), (1825186), (998608), ... )
I like to pass an array, unnest it, and join to it. Similar performance, but the syntax is shorter:
...
FROM unnest('{201486,1825186,998608, ...}'::int[]) "userId"
JOIN "Posts" "Post" USING ("userId")
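The two are equivalent as long as there are no duplicates in the provided set/array. Otherwise, the second form with a JOIN returns duplicate rows, while the first with IN returns only a single instance. This subtle difference also leads to different query plans.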
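The duplicate behavior can be checked with a small self-contained sketch. The CTE `posts` and its two rows are made-up stand-ins for the real table, not data from the question.

```sql
-- IN with a subquery: a post matches at most once, even if the id is listed twice.
WITH posts(id, user_id) AS (VALUES (1, 10), (2, 20))
SELECT count(*) FROM posts
WHERE user_id IN (VALUES (10), (10));          -- count = 1

-- JOIN against the unnested array: each duplicate id repeats the matching row.
WITH posts(id, user_id) AS (VALUES (1, 10), (2, 20))
SELECT count(*)
FROM unnest('{10,10}'::int[]) AS u(user_id)
JOIN posts USING (user_id);                    -- count = 2

-- De-duplicating the array first makes the JOIN form return one row again.
WITH posts(id, user_id) AS (VALUES (1, 10), (2, 20))
SELECT count(*)
FROM (SELECT DISTINCT user_id
      FROM unnest('{10,10}'::int[]) AS u(user_id)) AS d
JOIN posts USING (user_id);                    -- count = 1
```

If the id list comes from application code and may contain repeats, de-duplicating it before building the query avoids the issue entirely.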
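Obviously, you need an index on "Posts"."userId".
For very long lists (thousands), use an indexed temp table as @Craig suggested. This allows combined bitmap index scans over both tables, which is typically faster as soon as there are multiple tuples per data page to fetch from disk.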
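A minimal sketch of the temp-table approach, assuming a hypothetical table name `followed_ids`; the three ids are the first ones from the question's list, and the joined columns are trimmed for brevity.

```sql
-- Temp table holding the followed user ids. The PRIMARY KEY creates the index
-- and also rejects duplicate ids, so the join cannot multiply rows.
CREATE TEMP TABLE followed_ids (user_id integer PRIMARY KEY);

INSERT INTO followed_ids (user_id)
VALUES (201486), (1825186), (998608);   -- extend with the remaining ids

-- Autovacuum does not process temp tables, so gather statistics manually.
ANALYZE followed_ids;

SELECT "Post"."id", "Post"."createdAt"
FROM "Posts" AS "Post"
JOIN followed_ids AS f ON f.user_id = "Post"."userId"
WHERE "Post"."private" IS NULL
ORDER BY "Post"."createdAt" DESC
LIMIT 10;
```

The temp table lives until the session ends. Adding `ON COMMIT DROP` to the CREATE statement is an option when the whole sequence runs inside one transaction.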
Aside: your naming convention is not very helpful; it makes your code verbose and hard to read. Use legal, lower-case, unquoted identifiers instead.