Mla*_*vić 14 sql postgresql indexing performance union-all
I have a DB view which basically consists of two SELECT queries combined with UNION ALL, like this:
CREATE VIEW v AS
SELECT time, etc. FROM t1 -- #1 ...
UNION ALL
SELECT time, etc. FROM t2 -- #2 ...
The problem is that selects of the form
SELECT ... FROM v WHERE time >= ... AND time < ...
perform really slowly.
Both SELECT #1 and #2 are decently fast, properly indexed and so on. When I create views v1 and v2 like this:
CREATE VIEW v1 AS
SELECT time, etc. FROM t1 -- #1 ...

CREATE VIEW v2 AS
SELECT time, etc. FROM t2 -- #2 ...
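then the same SELECT, with the same WHERE condition as above, works just fine on each of them individually.
Any ideas about where the problem might be and how to solve it?
(Just to mention, it is one of the recent Postgres versions.)
Edit: adding anonymized query plans (thanks to @filiprem for the link to an awesome tool):
v1: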
Aggregate (cost=9825.510..9825.520 rows=1 width=53) (actual time=59.995..59.995 rows=1 loops=1)
  ->  Index Scan using delta on echo alpha (cost=0.000..9815.880 rows=3850 width=53) (actual time=0.039..53.418 rows=33122 loops=1)
        Index Cond: (("juliet" >= 'seven'::uniform bravo_victor oscar whiskey) AND ("juliet" <= 'november'::uniform bravo_victor oscar whiskey))
        Filter: ((NOT victor) AND ((bravo_sierra five NULL) OR ((bravo_sierra)::golf <> 'india'::golf)))
v2:
Aggregate (cost=15.470..15.480 rows=1 width=33) (actual time=0.231..0.231 rows=1 loops=1)
  ->  Index Scan using yankee on six charlie (cost=0.000..15.220 rows=99 width=33) (actual time=0.035..0.186 rows=140 loops=1)
        Index Cond: (("juliet" >= 'seven'::uniform bravo oscar whiskey) AND ("juliet" <= 'november'::uniform bravo oscar whiskey))
        Filter: (NOT victor)
v:
Aggregate (cost=47181.850..47181.860 rows=1 width=0) (actual time=37317.291..37317.291 rows=1 loops=1)
  ->  Append (cost=42.170..47132.480 rows=3949 width=97) (actual time=1.277..37304.453 rows=33262 loops=1)
        ->  Nested Loop Left Join (cost=42.170..47052.250 rows=3850 width=99) (actual time=1.275..37288.465 rows=33122 loops=1)
              ->  Hash Left Join (cost=42.170..9910.990 rows=3850 width=115) (actual time=1.123..117.797 rows=33122 loops=1)
                    Hash Cond: ((alpha_seven.two)::golf = (quebec_three.two)::golf)
                    ->  Index Scan using delta on echo alpha_seven (cost=0.000..9815.880 rows=3850 width=132) (actual time=0.038..77.866 rows=33122 loops=1)
                          Index Cond: (("juliet" >= 'seven'::uniform bravo_victor oscar whiskey_two) AND ("juliet" <= 'november'::uniform bravo_victor oscar whiskey_two))
                          Filter: ((NOT victor) AND ((bravo_sierra five NULL) OR ((bravo_sierra)::golf <> 'india'::golf)))
                    ->  Hash (cost=30.410..30.410 rows=941 width=49) (actual time=1.068..1.068 rows=941 loops=1)
                          Buckets: 1024 Batches: 1 Memory Usage: 75kB
                          ->  Seq Scan on alpha_india quebec_three (cost=0.000..30.410 rows=941 width=49) (actual time=0.010..0.486 rows=941 loops=1)
              ->  Index Scan using mike on hotel quebec_sierra (cost=0.000..9.630 rows=1 width=24) (actual time=1.112..1.119 rows=1 loops=33122)
                    Index Cond: ((alpha_seven.zulu)::golf = (quebec_sierra.zulu)::golf)
        ->  Subquery Scan on "*SELECT* 2" (cost=34.080..41.730 rows=99 width=38) (actual time=1.081..1.951 rows=140 loops=1)
              ->  Merge Right Join (cost=34.080..40.740 rows=99 width=38) (actual time=1.080..1.872 rows=140 loops=1)
                    Merge Cond: ((quebec_three.two)::golf = (charlie.two)::golf)
                    ->  Index Scan using whiskey_golf on alpha_india quebec_three (cost=0.000..174.220 rows=941 width=49) (actual time=0.017..0.122 rows=105 loops=1)
                    ->  Sort (cost=18.500..18.750 rows=99 width=55) (actual time=0.915..0.952 rows=140 loops=1)
                          Sort Key: charlie.two
                          Sort Method: quicksort Memory: 44kB
                          ->  Index Scan using yankee on six charlie (cost=0.000..15.220 rows=99 width=55) (actual time=0.022..0.175 rows=140 loops=1)
                                Index Cond: (("juliet" >= 'seven'::uniform bravo_victor oscar whiskey_two) AND ("juliet" <= 'november'::uniform bravo_victor oscar whiskey_two))
                                Filter: (NOT victor)
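juliet is time.
This seems to be a case of pilot error. The "v" query plan selects from at least 5 different tables.
Now, are you sure you are connected to the right database? Maybe there are some funky search_path settings? Maybe t1 and t2 are actually views (possibly in a different schema)? Maybe you are somehow selecting from the wrong view?
Edited after clarification:
You are using a quite new feature called "join removal": http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.0#Join_Removal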
http://rhaas.blogspot.com/2010/06/why-join-removal-is-cool.html
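It appears that the feature does not kick in when UNION ALL is involved. You will probably have to rewrite the view using only the two tables it actually needs.
Another edit: you appear to be using an aggregate (e.g. "select count(*) from v" vs. "select * from v"), which could get vastly different plans in light of join removal. I guess we won't get very far unless you post the actual queries, the view and table definitions, and the plans used...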
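To illustrate the join removal mentioned above, here is a minimal sketch. The tables events and lookup and the view events_labeled are hypothetical and not taken from the question; the comments describe what PostgreSQL 9.0 or later is expected to do, so compare the EXPLAIN output on your own version.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE lookup (id integer PRIMARY KEY, label text);
CREATE TABLE events (time timestamp NOT NULL, lookup_id integer);

CREATE VIEW events_labeled AS
SELECT e.time, l.label
FROM events e
LEFT JOIN lookup l ON l.id = e.lookup_id;

-- No column of lookup is referenced and lookup.id is unique, so the
-- planner may drop the LEFT JOIN and scan events alone.
EXPLAIN SELECT count(*) FROM events_labeled;

-- label is referenced here, so the join has to stay in the plan.
EXPLAIN SELECT time, label FROM events_labeled;
```

If the first plan still shows the join once the view is wrapped in a UNION ALL, that would match the behaviour described in this answer.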
I believe your query is being executed similar to:
SELECT * FROM (
  ( SELECT time, etc. FROM t1 /* #1 ... */ )
  UNION ALL
  ( SELECT time, etc. FROM t2 /* #2 ... */ )
) AS u
WHERE time >= ... AND time < ...
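This is something the optimizer has difficulty optimizing: it performs the UNION ALL first and only then applies the WHERE clause, whereas you want the WHERE clause applied to each branch before the UNION ALL.
Couldn't you put your WHERE clause in the CREATE VIEW?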
CREATE VIEW v AS
( SELECT time, etc. FROM t1 WHERE time >= ... AND time < ... )
UNION ALL
( SELECT time, etc. FROM t2 WHERE time >= ... AND time < ... )
Alternatively, if the view cannot contain the WHERE clause, then perhaps you can keep the two views and do the UNION ALL with the WHERE clause when needed:
CREATE VIEW v1 AS
SELECT time, etc. FROM t1 -- #1 ...

CREATE VIEW v2 AS
SELECT time, etc. FROM t2 -- #2 ...

( SELECT * FROM v1 WHERE time >= ... AND time < ... )
UNION ALL
( SELECT * FROM v2 WHERE time >= ... AND time < ... )
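If writing out the two filtered branches every time is too repetitive, one possible way to package them is a set-returning SQL function. This is only a sketch under assumptions: the function name v_between and its parameter names are made up, the time column is assumed to be of type timestamp, and v2 is assumed to return the same column types as v1.

```sql
-- Hypothetical wrapper around the two views v1 and v2 defined above.
-- Assumes "time" is a timestamp column and v2 is column-compatible with v1.
CREATE FUNCTION v_between(from_time timestamp, to_time timestamp)
RETURNS SETOF v1
LANGUAGE sql STABLE
AS $$
    -- The filter is written inside each branch, so each view is
    -- restricted on its own before the results are appended.
    SELECT * FROM v1 WHERE time >= $1 AND time < $2
    UNION ALL
    SELECT * FROM v2 WHERE time >= $1 AND time < $2;
$$;
```

It would then be called like SELECT count(*) FROM v_between('2011-01-01', '2011-02-01'); where the two dates are placeholders.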
bja*_*jan -3
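I don't think I have enough points to post this as a comment, so I am posting it as an answer.
I don't know how PostgreSQL works behind the scenes, but I think I would know if it were Oracle, so here is how Oracle would handle it.
Your UNION ALL view is slower because, behind the scenes, the records from SELECT #1 and #2 are first combined into a temporary table that is created on the fly, and then your SELECT ... FROM v WHERE time >= ... AND time < ... is executed against that temporary table. Since both #1 and #2 are indexed, they run fast individually, as expected, but the temporary table is not indexed (of course), and the final records are selected from it, which results in the slower response.
Right now, at least, I don't see any way to have all three at once: faster, a view, and non-materialized.
Apart from explicitly running SELECT #1 and #2 and UNION-ing them, one way to make it faster would be a stored procedure, or a function in your application's programming language (if there is one), in which you query each indexed table separately and then combine the results. That is not as simple as SELECT ... FROM v WHERE time >= ... AND time < ... :(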