Why are set-returning functions (SRF) slower when run in the FROM clause?

Question

Why are set-returning functions (SRF) slower when run in the FROM clause?

Eva*_*oll 8 postgresql performance database-internals functions set-returning-functions

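This is a database-internals question. I am using PostgreSQL 9.5, and I'm wondering why set-returning functions (SRF), also known as table-valued functions (TVF), run slower when they are in a FROM clause. For example, when I run these commands,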

CREATE TABLE foo AS SELECT * FROM generate_series(1,1e7);
SELECT 10000000
Time: 5573.574 ms


it is always substantially slower than

CREATE TABLE foo AS SELECT generate_series(1,1e7);
SELECT 10000000
Time: 4622.567 ms


Is there a general rule to be drawn here, such that we should always run set-returning functions outside of the FROM clause?

Answer 1

小智 13

Let's start by comparing the execution plans:

tinker=> EXPLAIN ANALYZE SELECT * FROM generate_series(1,1e7);
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=32) (actual time=2382.582..4291.136 rows=10000000 loops=1)
 Planning time: 0.022 ms
 Execution time: 5539.522 ms
(3 rows)

tinker=> EXPLAIN ANALYZE SELECT generate_series(1,1e7);
                                           QUERY PLAN                                            
-------------------------------------------------------------------------------------------------
 Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.008..2622.365 rows=10000000 loops=1)
 Planning time: 0.045 ms
 Execution time: 3858.661 ms
(3 rows)


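OK, now we know that SELECT * FROM generate_series() is executed using a Function Scan node, while SELECT generate_series() is executed using a Result node. Whatever makes these queries perform differently comes down to the difference between these two nodes, and we know exactly where to look.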

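Another interesting thing in the EXPLAIN ANALYZE output: note the timings. SELECT generate_series() shows actual time=0.008..2622.365, while SELECT * FROM generate_series() shows actual time=2382.582..4291.136. The Function Scan node starts returning records at about the time the Result node finishes returning them.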
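
To make that comparison concrete: in `actual time=A..B`, the first number is the node's startup time (time until its first row is returned) and the second is its total time, both in milliseconds. The following is a small Python sketch, not part of the original answer, that pulls those two numbers out of the plan lines quoted above. The helper name `parse_actual_time` is made up for illustration.

```python
import re

# Matches the "actual time=<startup>..<total>" part of an EXPLAIN ANALYZE line.
ACTUAL_TIME = re.compile(r"actual time=(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?)")


def parse_actual_time(plan_line):
    """Return (startup_ms, total_ms) from one EXPLAIN ANALYZE plan line."""
    match = ACTUAL_TIME.search(plan_line)
    if match is None:
        raise ValueError("no 'actual time' found in: %r" % plan_line)
    return float(match.group(1)), float(match.group(2))


# Plan lines copied from the EXPLAIN ANALYZE output above.
function_scan = (
    "Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=32) "
    "(actual time=2382.582..4291.136 rows=10000000 loops=1)"
)
result = (
    "Result  (cost=0.00..5.01 rows=1000 width=0) "
    "(actual time=0.008..2622.365 rows=10000000 loops=1)"
)

fs_startup, fs_total = parse_actual_time(function_scan)
r_startup, r_total = parse_actual_time(result)

print("Function Scan: first row after %.3f ms" % fs_startup)
print("Result:        first row after %.3f ms" % r_startup)
```

The gap between the two startup times is what the rest of the answer tries to explain.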

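What was PostgreSQL doing between t=0 and t=2382 in the Function Scan plan? That is about how long it takes to run generate_series(), so I would wager that is exactly what it was doing. The answer begins to take shape: it seems that Result returns rows immediately, whereas Function Scan materializes the results first and then scans them.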
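
The hypothesis can be illustrated outside the database with a rough Python analogy. This is a sketch of the two strategies only, not PostgreSQL's actual code; the function names `result_style` and `function_scan_style` are hypothetical. It counts how many values the generator has produced by the time the consumer receives its first row.

```python
def generate_series(start, stop, log):
    """Yield start..stop inclusive, recording each value as it is produced."""
    for value in range(start, stop + 1):
        log.append(value)
        yield value


def result_style(start, stop, log):
    """Stream rows: hand each row to the caller as soon as it is produced."""
    return generate_series(start, stop, log)


def function_scan_style(start, stop, log):
    """Materialize first: run the function to completion, then scan the buffer."""
    buffered = list(generate_series(start, stop, log))  # phase 1: fill the store
    return iter(buffered)                               # phase 2: read it back


streamed_log = []
first_streamed = next(result_style(1, 1000, streamed_log))
produced_before_first_streamed = len(streamed_log)

materialized_log = []
first_materialized = next(function_scan_style(1, 1000, materialized_log))
produced_before_first_materialized = len(materialized_log)

print(produced_before_first_streamed)      # 1
print(produced_before_first_materialized)  # 1000
```

Both variants return the same rows; the difference is how much work happens before the first row is available, which is what the startup time in the plan measures.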

With EXPLAIN out of the way, let's check the implementation. The Result node lives in nodeResult.c, which says:

 * DESCRIPTION
 *
 *      Result nodes are used in queries where no relations are scanned.


The code is simple enough.

Function Scan lives in nodeFunctionScan.c, and it does indeed appear to use a two-phase execution strategy:

/*
 * If first time through, read all tuples from function and put them
 * in a tuplestore. Subsequent calls just fetch tuples from
 * tuplestore.
 */


For clarity, let's see what a tuplestore is:

 * tuplestore.h
 *    Generalized routines for temporary tuple storage.
 *
 * This module handles temporary storage of tuples for purposes such
 * as Materialize nodes, hashjoin batch files, etc.  It is essentially
 * a dumbed-down version of tuplesort.c; it does no sorting of tuples
 * but can only store and regurgitate a sequence of tuples.  However,
 * because no sort is required, it is allowed to start reading the sequence
 * before it has all been written.  This is particularly useful for cursors,
 * because it allows random access within the already-scanned portion of
 * a query without having to process the underlying scan to completion.
 * Also, it is possible to support multiple independent read pointers.
 *
 * A temporary file is used to handle the data if it exceeds the
 * space limit specified by the caller.


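Hypothesis confirmed. Function Scan executes up front, materializing the function's results, which for large result sets means spilling to disk. Result does not materialize anything, but it also only supports trivial operations.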
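
To show what "store and regurgitate, spilling to a temporary file past a limit" can look like, here is a toy Python sketch. It is an illustration under simplifying assumptions, not a port of tuplestore.c: the class name `TinyTupleStore` is invented, the limit is a row count rather than the caller-specified space limit, and it expects all writes to finish before a scan starts (the real tuplestore allows reading before writing has finished).

```python
import pickle
import tempfile


class TinyTupleStore:
    """Append-only row buffer that spills to a temp file past a row limit.

    Assumption for illustration: the budget is a number of rows. The real
    tuplestore is bounded by a space limit given by its caller.
    """

    def __init__(self, max_rows_in_memory):
        self.max_rows_in_memory = max_rows_in_memory
        self.rows = []          # in-memory rows, used until we spill
        self.spill_file = None  # temp file, created only when needed
        self.count = 0          # total rows stored so far

    @property
    def spilled(self):
        return self.spill_file is not None

    def put(self, row):
        if not self.spilled and len(self.rows) >= self.max_rows_in_memory:
            # Over budget: move everything buffered so far to disk.
            self.spill_file = tempfile.TemporaryFile()
            for buffered in self.rows:
                pickle.dump(buffered, self.spill_file)
            self.rows = []
        if self.spilled:
            self.spill_file.seek(0, 2)  # always append at the end of the file
            pickle.dump(row, self.spill_file)
        else:
            self.rows.append(row)
        self.count += 1

    def scan(self):
        """Yield every stored row in insertion order (call after all puts)."""
        if not self.spilled:
            yield from self.rows
            return
        self.spill_file.seek(0)
        for _ in range(self.count):
            yield pickle.load(self.spill_file)


# Phase 1: run the "function" to completion and store every row.
store = TinyTupleStore(max_rows_in_memory=1000)
for n in range(1, 5001):
    store.put((n,))

# Phase 2: only now does the consumer see the first row.
print(store.spilled)
print(sum(row[0] for row in store.scan()))
```

With a small result set the store never touches disk; with a large one, every row is written out and read back, which is extra work the streaming Result node never does.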

Asked:	7 years, 6 months ago
Viewed:	483 times
Last active:	7 years, 2 months ago