D-N*_*ice · 6 · sql, database, postgresql, activerecord, ruby-on-rails
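I have an ActiveRecord query that chains two queries together with the OR operator. The results come back correctly, but the combined query runs roughly 10 times slower than either of the two queries on its own.
We have an Event model and an Invitation model. A User can be invited to an Event either by being targeted through an invitation filter, or individually through an Invitation record.
So, when determining how many users are invited to a particular event, we have to look at all Users with Invitations as well as all Users that match the filter. We do that here: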
@invited_count = @invited_by_individual.or(@invited_by_filter).distinct.count(:id)
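Note that both the @invited_by_individual and @invited_by_filter relations contain references and includes statements.
The problem is that this query takes about 1200 ms to execute. If we run the queries separately, each takes only about 80 ms: @invited_by_filter.distinct.count and @invited_by_individual.distinct.count both return in about 80 ms, but neither result is complete on its own.
Is there any way to speed up the query with the OR operator? And why does this happen?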
Here is the SQL generated by the ActiveRecord queries:
Fast, single query:
(79.7ms)
SELECT COUNT(DISTINCT "users"."id")
FROM "users"
LEFT OUTER JOIN "invitations"
ON "invitations"."user_id" = "users"."id"
WHERE "invitations"."event_id" = $1 [["event_id", 732]]
Slow, combined query:
(1220.7ms)
SELECT COUNT(DISTINCT "users"."id")
FROM "users"
LEFT OUTER JOIN "invitations"
ON "invitations"."user_id" = "users"."id"
WHERE ("invitations"."event_id" = $1 OR "users"."organization_id" = $2) [["event_id", 732], ["organization_id", 13]]
Update: here is the EXPLAIN output:
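To make the semantics of the combined query concrete, here is a minimal pure-Ruby sketch using hypothetical in-memory rows (not the real schema, and it says nothing about timing). It emulates the LEFT OUTER JOIN with the OR filter and COUNT(DISTINCT), and compares it with the size of the set union of the two separately computed id lists. The two numbers agree, which is why the OR can in principle be split into two single-condition queries.

```ruby
require "set"

# Hypothetical rows: each user has an organization_id; each invitation
# links a user to an event.
UserRow = Struct.new(:id, :organization_id)
InvitationRow = Struct.new(:user_id, :event_id)

USERS = [UserRow.new(1, 13), UserRow.new(2, 13), UserRow.new(3, 7), UserRow.new(4, 7)]
INVITATIONS = [InvitationRow.new(1, 732), InvitationRow.new(3, 732), InvitationRow.new(3, 800)]

# Emulates: SELECT COUNT(DISTINCT users.id) FROM users
#           LEFT OUTER JOIN invitations ON invitations.user_id = users.id
#           WHERE (invitations.event_id = ? OR users.organization_id = ?)
def or_count(event_id, org_id)
  joined = USERS.flat_map do |u|
    matches = INVITATIONS.select { |i| i.user_id == u.id }
    # A LEFT OUTER JOIN keeps users without invitations, padded with nil (NULL).
    matches.empty? ? [[u, nil]] : matches.map { |i| [u, i] }
  end
  joined.select { |u, i| (i && i.event_id == event_id) || u.organization_id == org_id }
        .map { |u, _| u.id }
        .uniq
        .size
end

# Same count computed as the union of two independent id sets, which is
# what a SQL UNION of the two single-condition queries returns.
def union_count(event_id, org_id)
  by_invitation = INVITATIONS.select { |i| i.event_id == event_id }.map(&:user_id).to_set
  by_filter = USERS.select { |u| u.organization_id == org_id }.map(&:id).to_set
  (by_invitation | by_filter).size
end

puts or_count(732, 13)    # => 3
puts union_count(732, 13) # => 3
```

User 1 matches both conditions but is counted once in each version; that deduplication is the part a UNION has to preserve.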
(1418.2ms) SELECT COUNT(DISTINCT "users"."id") FROM "users" LEFT OUTER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE ("users"."root_organization_id" = $1 OR "invitations"."event_id" = $2) [["root_organization_id", -1], ["event_id", 749]]
=>
EXPLAIN for: SELECT COUNT(DISTINCT "users"."id") FROM "users" LEFT OUTER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE ("users"."root_organization_id" = $1 OR "invitations"."event_id" = $2) [["root_organization_id", -1], ["event_id", 749]]
#=> QUERY PLAN
Aggregate (cost=121781.56..121781.57 rows=1 width=8)
-> Hash Right Join (cost=113248.88..121778.64 rows=1165 width=8)
Hash Cond: (invitations.user_id = users.id)
Filter: ((users.root_organization_id = '-1'::integer) OR (invitations.event_id = 749))
-> Seq Scan on invitations (cost=0.00..1299.70 rows=63470 width=8)
-> Hash (cost=93513.28..93513.28 rows=1135328 width=12)
-> Seq Scan on users (cost=0.00..93513.28 rows=1135328 width=12)
(7 rows)
Update 2: EXPLAIN output for each query run on its own; these do use indexes:
(91.5ms) SELECT COUNT(*) FROM "users" INNER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE "users"."root_organization_id" = $1 [["root_organization_id", -1]]
=>
EXPLAIN for: SELECT COUNT(*) FROM "users" INNER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE "users"."root_organization_id" = $1 [["root_organization_id", -1]]
#=> QUERY PLAN
Aggregate (cost=19.05..19.06 rows=1 width=8)
-> Nested Loop (cost=0.72..19.05 rows=1 width=0)
-> Index Scan using index_users_on_root_organization_id on users (cost=0.43..4.45 rows=1 width=8)
Index Cond: (root_organization_id = '-1'::integer)
-> Index Only Scan using index_invitations_on_user_id on invitations (cost=0.29..14.57 rows=3 width=4)
Index Cond: (user_id = users.id)
(6 rows)
and
EXPLAIN for: SELECT COUNT(DISTINCT "users"."id") FROM "users" LEFT OUTER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE "invitations"."event_id" = $1 [["event_id", 749]]
#=> QUERY PLAN
Aggregate (cost=536.34..536.35 rows=1 width=8)
-> Nested Loop (cost=0.72..536.19 rows=62 width=8)
-> Index Scan using index_invitations_on_event_id on invitations (cost=0.29..11.98 rows=62 width=4)
Index Cond: (event_id = 749)
-> Index Only Scan using users_pkey on users (cost=0.43..8.45 rows=1 width=8)
Index Cond: (id = invitations.user_id)
(6 rows)
A UNION lets you take advantage of both indexes while still preventing duplicates.
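# Select only users.id on each side so UNION deduplicates by user id, and alias
# the derived table: PostgreSQL before 16 rejects a subquery in FROM without one.
# Assumption: if includes/references still add eager-load columns to to_sql,
# swap them for joins/left_joins in the relations used for this count.
User.from(
  "(#{@invited_by_individual.select(:id).to_sql}
  UNION
  #{@invited_by_filter.select(:id).to_sql}) AS users"
).count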
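To see the shape of the SQL the UNION approach sends to PostgreSQL, here is a minimal string-building sketch. `union_count_sql` is a hypothetical helper, and the two inner SELECTs are hand-written stand-ins for the single-condition queries from the question (with the ids from the EXPLAIN output), not literally what ActiveRecord generates. Each branch has only one condition, so each can use its own index; plain UNION (not UNION ALL) is what drops users matched by both branches.

```ruby
# Hypothetical helper: wraps two single-column SELECTs in a UNION and counts
# the deduplicated rows. The derived table is aliased because PostgreSQL
# versions before 16 reject a subquery in FROM without an alias.
def union_count_sql(left_sql, right_sql, alias_name: "users")
  "SELECT COUNT(*) FROM (#{left_sql} UNION #{right_sql}) AS #{alias_name}"
end

# Stand-in for the individually invited users (index on invitations.event_id).
by_individual = 'SELECT "users"."id" FROM "users" ' \
                'INNER JOIN "invitations" ON "invitations"."user_id" = "users"."id" ' \
                'WHERE "invitations"."event_id" = 749'

# Stand-in for the users matched by the filter (index on users.root_organization_id).
by_filter = 'SELECT "users"."id" FROM "users" WHERE "users"."root_organization_id" = -1'

puts union_count_sql(by_individual, by_filter)
```

Keeping each branch to a single column matters: UNION compares whole rows, so extra columns from one side would stop it from collapsing duplicate users.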