SQL function very slow compared to the same query without a function wrapper

Mme*_*yer 8 postgresql function sql-execution-plan postgresql-performance

I have this PostgreSQL 9.4 query that runs very fast (~12 ms):

SELECT 
  auth_web_events.id, 
  auth_web_events.time_stamp, 
  auth_web_events.description, 
  auth_web_events.origin,  
  auth_user.email, 
  customers.name,
  auth_web_events.client_ip
FROM 
  public.auth_web_events, 
  public.auth_user, 
  public.customers
WHERE 
  auth_web_events.user_id_fk = auth_user.id AND
  auth_user.customer_id_fk = customers.id AND
  auth_web_events.user_id_fk = 2
ORDER BY
  auth_web_events.id DESC;

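But if I embed it in a function, the query runs very slowly; it seems to go through every record in the table. What am I missing? I have ~1M rows of data, and I want to simplify my database layer by storing the large queries in functions and views.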

CREATE OR REPLACE FUNCTION get_web_events_by_userid(int) RETURNS TABLE(
    id int,
    time_stamp timestamp with time zone,
    description text,
    origin text,
    userlogin text,
    customer text,
    client_ip inet
     ) AS
$func$
SELECT 
  auth_web_events.id, 
  auth_web_events.time_stamp, 
  auth_web_events.description, 
  auth_web_events.origin,  
  auth_user.email AS user, 
  customers.name AS customer,
  auth_web_events.client_ip
FROM 
  public.auth_web_events, 
  public.auth_user, 
  public.customers
WHERE 
  auth_web_events.user_id_fk = auth_user.id AND
  auth_user.customer_id_fk = customers.id AND
  auth_web_events.user_id_fk = $1
ORDER BY
  auth_web_events.id DESC;
  $func$ LANGUAGE SQL;

The query plan is:

"Sort  (cost=20.94..20.94 rows=1 width=791) (actual time=61.905..61.906 rows=2 loops=1)"
"  Sort Key: auth_web_events.id"
"  Sort Method: quicksort  Memory: 25kB"
"  ->  Nested Loop  (cost=0.85..20.93 rows=1 width=791) (actual time=61.884..61.893 rows=2 loops=1)"
"        ->  Nested Loop  (cost=0.71..12.75 rows=1 width=577) (actual time=61.874..61.879 rows=2 loops=1)"
"              ->  Index Scan using auth_web_events_fk1 on auth_web_events  (cost=0.57..4.58 rows=1 width=61) (actual time=61.860..61.860 rows=2 loops=1)"
"                    Index Cond: (user_id_fk = 2)"
"              ->  Index Scan using auth_user_pkey on auth_user  (cost=0.14..8.16 rows=1 width=524) (actual time=0.005..0.005 rows=1 loops=2)"
"                    Index Cond: (id = 2)"
"        ->  Index Scan using customers_id_idx on customers  (cost=0.14..8.16 rows=1 width=222) (actual time=0.004..0.005 rows=1 loops=2)"
"              Index Cond: (id = auth_user.customer_id_fk)"
"Planning time: 0.369 ms"
"Execution time: 61.965 ms"

I call the function like this:

SELECT * from get_web_events_by_userid(2)  

The query plan for the function call:

"Function Scan on get_web_events_by_userid  (cost=0.25..10.25 rows=1000 width=172) (actual time=279107.142..279107.144 rows=2 loops=1)"
"Planning time: 0.038 ms"
"Execution time: 279107.175 ms"

EDIT: I just changed the parameter, and the problem persists.
EDIT2: Query plan for the query from Erwin's answer:

"Sort  (cost=20.94..20.94 rows=1 width=791) (actual time=0.048..0.049 rows=2 loops=1)"
"  Sort Key: w.id"
"  Sort Method: quicksort  Memory: 25kB"
"  ->  Nested Loop  (cost=0.85..20.93 rows=1 width=791) (actual time=0.030..0.037 rows=2 loops=1)"
"        ->  Nested Loop  (cost=0.71..12.75 rows=1 width=577) (actual time=0.023..0.025 rows=2 loops=1)"
"              ->  Index Scan using auth_user_pkey on auth_user u  (cost=0.14..8.16 rows=1 width=524) (actual time=0.011..0.012 rows=1 loops=1)"
"                    Index Cond: (id = 2)"
"              ->  Index Scan using auth_web_events_fk1 on auth_web_events w  (cost=0.57..4.58 rows=1 width=61) (actual time=0.008..0.008 rows=2 loops=1)"
"                    Index Cond: (user_id_fk = 2)"
"        ->  Index Scan using customers_id_idx on customers c  (cost=0.14..8.16 rows=1 width=222) (actual time=0.003..0.004 rows=1 loops=2)"
"              Index Cond: (id = u.customer_id_fk)"
"Planning time: 0.541 ms"
"Execution time: 0.101 ms"

Erw*_*ter 7

user

While rewriting your function, I realized that you added column aliases here:

SELECT 
  ...
  auth_user.email AS user, 
  customers.name AS customer,

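.. which would not do anything to begin with, since those aliases are invisible outside the function and are not referenced inside it. So they would be ignored. For documentation purposes, better use a comment.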

But it also makes your query invalid, because user is a fully reserved word and cannot be used as a column alias unless it is double-quoted.

Oddly, in my tests the function seems to work with the invalid alias. Probably because it is ignored (?). But I am not sure this cannot have side effects.
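To illustrate the two points about aliases, here is a minimal sketch using the tables from the question. The first statement shows the double-quoted form of the alias; the second shows that callers address the result by the column names declared in RETURNS TABLE (userlogin, customer), not by any alias used inside the function body. This is an illustration under those assumptions, not a required change.

```sql
-- Sketch only: "user" is double-quoted, so it is parsed as an ordinary identifier.
SELECT u.email AS "user"
     , c.name  AS customer
FROM   public.auth_user u
JOIN   public.customers c ON c.id = u.customer_id_fk;

-- Callers see the names from RETURNS TABLE, not the inner aliases.
SELECT userlogin, customer
FROM   get_web_events_by_userid(2);
```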

Your function rewritten (otherwise equivalent):

CREATE OR REPLACE FUNCTION get_web_events_by_userid(int)
  RETURNS TABLE(
     id int
   , time_stamp timestamptz
   , description text
   , origin text
   , userlogin text
   , customer text
   , client_ip inet
  ) AS
$func$
SELECT w.id
     , w.time_stamp
     , w.description 
     , w.origin  
     , u.email     -- AS user   -- make this a comment!
     , c.name      -- AS customer
     , w.client_ip
FROM   public.auth_user       u
JOIN   public.auth_web_events w ON w.user_id_fk = u.id
JOIN   public.customers       c ON c.id = u.customer_id_fk 
WHERE  u.id = $1   -- reverted the logic here
ORDER  BY w.id DESC
$func$ LANGUAGE sql STABLE;

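Evidently, the STABLE keyword changed the outcome. Function volatility should not be an issue in the test situation you describe; the setting does not normally benefit a single, isolated function call. Read the details in the manual. Also, standard EXPLAIN does not show query plans for what happens inside a function. You can use the additional module auto_explain for that.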
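A minimal sketch of how auto_explain could be enabled for a single session to see the plan of the statement inside the function. It assumes the module is installed and that the session role is allowed to LOAD it (typically a superuser); the plans are written to the server log, not returned to the client.

```sql
-- Assumption: the current role may LOAD the module (typically a superuser).
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log the plan of every statement
SET auto_explain.log_nested_statements = on;  -- include statements run inside functions
SET auto_explain.log_analyze = on;            -- log actual times and row counts, not just estimates

-- The plan of the query inside the function should now appear in the server log.
SELECT * FROM get_web_events_by_userid(2);
```

Logging every statement with log_analyze adds overhead, so these settings are best kept to a test session.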

You have a very odd data distribution:

auth_web_events table has 100000000 records, auth_user -> 2 records, customers -> 1 record

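Since you did not define otherwise, the function assumes an estimate of 1000 rows to be returned. But your function actually returns only 2 rows. If all your calls return only (in the vicinity of) 2 rows, just declare that with an added ROWS 2. That might also change the query plan for the VOLATILE variant (even though STABLE is the right choice here anyway).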
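As a sketch of the ROWS suggestion, the estimate can be attached to the existing function without recreating it. The value 2 comes from the row count reported in the question and is only appropriate if typical calls really return about that many rows.

```sql
-- Declare volatility and the expected row count on the existing function.
ALTER FUNCTION get_web_events_by_userid(int) STABLE ROWS 2;

-- Re-check the outer plan; the Function Scan estimate should now reflect ROWS 2
-- instead of the default of 1000.
EXPLAIN ANALYZE
SELECT * FROM get_web_events_by_userid(2);
```

The same clauses can instead be written in the CREATE OR REPLACE FUNCTION statement, after LANGUAGE sql.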

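  • @ErwinBrandstetter Just an FYI, but I found this question and answer while researching a very similar problem. I had a query that ran in about 91 ms, and when I put it in a function it jumped to over 4,900 ms. Adding `STABLE` made it perform like the original SQL. (2 upvotes)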