Asked by turnip. Tags: postgresql, performance, cte, query-performance
I have a somewhat complex query that splits a string and outputs each word as a record.
I ran a quick test, one version with a CTE and one with a subquery, and was surprised to see the CTE version take twice as long to execute.
Here is the gist of what the query does:
-- 1. translate matches characters from comment to given list (of symbols) and replaces them with commas.
-- 2. string_to_array splits string by comma and puts in an array
-- 3. unnest unpacks the array into rows
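The three steps above can be sketched outside the database. Here is a minimal Python approximation of the translate / string_to_array / unnest pipeline (the symbol list is abbreviated from the SQL original, and dropping empty tokens stands in for the `word IS NOT NULL` filter):

```python
# Approximate the SQL pipeline: map every symbol to a comma,
# split on commas, then drop empty tokens.
SYMBOLS = ' ,.<>?/;:@#~[{]}=+-_)("*&^%$!`\\|'

def split_words(comment):
    # translate(): replace each symbol with a comma
    translated = comment.translate({ord(c): ',' for c in SYMBOLS})
    # string_to_array(..., ',', '') + unnest: split on commas;
    # empty strings become NULL in SQL, so filter them out here
    return [w for w in translated.split(',') if w]

print(split_words("Great service, fast delivery!"))
# ['Great', 'service', 'fast', 'delivery']
```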
SELECT
sub_query.word,
sub_query._created_at
FROM
( SELECT unnest(string_to_array(translate(nps_reports.comment::text, ' ,.<>?/;:@#~[{]}=+-_)("*&^%$£!`\|}'::text, ',,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,'::text), ','::text, ''::text)) AS word,
nps_reports.comment,
nps_reports._id,
nps_reports._created_at
FROM nps_reports
WHERE nps_reports.comment::text <> 'undefined'::text
) sub_query
WHERE sub_query.word IS NOT NULL AND NOT (sub_query.word IN ( SELECT stop_words.stop_word FROM stop_words))
ORDER BY sub_query._created_at DESC;
WITH split AS
(
SELECT unnest(string_to_array(translate(nps_reports.comment::text, ' ,.<>?/;:@#~[{]}=+-_)("*&^%$£!`\|}'::text, ',,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,'::text), ','::text, ''::text)) AS word,
nps_reports.comment,
nps_reports._id,
nps_reports._created_at
FROM nps_reports
WHERE nps_reports.comment::text <> 'undefined'::text
)
SELECT
split.word,
split._created_at
FROM split
WHERE split.word IS NOT NULL AND NOT (split.word IN ( SELECT stop_words.stop_word FROM stop_words))
ORDER BY split._created_at DESC;
Here is the EXPLAIN ANALYZE output for each:
Sort (cost=15921589.76..16082302.91 rows=64285258 width=40) (actual time=16299.150..17697.914 rows=4394788 loops=1)
Sort Key: sub_query._created_at DESC
Sort Method: external merge Disk: 116112kB
Buffers: shared hit=22915 read=7627, temp read=34281 written=34281
-> Subquery Scan on sub_query (cost=2.49..2311035.10 rows=64285258 width=40) (actual time=0.177..13274.895 rows=4394788 loops=1)
Filter: ((sub_query.word IS NOT NULL) AND (NOT (hashed SubPlan 1)))
Rows Removed by Filter: 3676303
Buffers: shared hit=22915 read=7627
-> Seq Scan on nps_reports (cost=0.00..695825.11 rows=129216600 width=88) (actual time=0.073..9781.244 rows=8071091 loops=1)
Filter: ((comment)::text <> 'undefined'::text)
Rows Removed by Filter: 844360
Buffers: shared hit=22914 read=7627
SubPlan 1
-> Seq Scan on stop_words (cost=0.00..2.19 rows=119 width=4) (actual time=0.016..0.034 rows=119 loops=1)
Buffers: shared hit=1
Planning time: 0.115 ms
Execution time: 18451.245 ms
Sort (cost=17213755.76..17374468.91 rows=64285258 width=40) (actual time=44008.467..45508.786 rows=4394788 loops=1)
Sort Key: split._created_at DESC
Sort Method: external merge Disk: 116112kB
Buffers: shared hit=23031 read=7531, temp read=34281 written=353942
CTE split
-> Seq Scan on nps_reports (cost=0.00..695825.11 rows=129216600 width=135) (actual time=0.057..10451.951 rows=8071091 loops=1)
Filter: ((comment)::text <> 'undefined'::text)
Rows Removed by Filter: 844360
Buffers: shared hit=23027 read=7531
-> CTE Scan on split (cost=2.49..2907375.99 rows=64285258 width=40) (actual time=0.162..37888.364 rows=4394788 loops=1)
Filter: ((word IS NOT NULL) AND (NOT (hashed SubPlan 2)))
Rows Removed by Filter: 3676303
Buffers: shared hit=23028 read=7531, temp written=319661
SubPlan 2
-> Seq Scan on stop_words (cost=0.00..2.19 rows=119 width=4) (actual time=0.009..0.030 rows=119 loops=1)
Buffers: shared hit=1
Planning time: 0.649 ms
Execution time: 46297.825 ms
A CTE in PostgreSQL is an optimization fence. That means the query planner will not push optimizations across the CTE boundary: the CTE is materialized in full, and filters from the outer query are not pushed down into it, which is exactly what the second plan shows (the extra temp writes on the CTE Scan).
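(A side note, not from the original answer: since PostgreSQL 12, a non-recursive CTE that is referenced only once is inlined by default, and the fence must be requested explicitly with `MATERIALIZED`. A minimal sketch of the syntax:)

```sql
-- PostgreSQL 12+: this CTE would be inlined unless materialization is forced.
WITH split AS MATERIALIZED (
    SELECT unnest(string_to_array(comment, ' ')) AS word
    FROM nps_reports
)
SELECT word FROM split;
```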
That said, I think much of this is overcomplicated, and you could write it like this instead. Here we use CROSS JOIN LATERAL rather than the convoluted wrapping, and NOT EXISTS rather than NOT IN:
SELECT word,
_created_at
FROM nps_reports
CROSS JOIN LATERAL unnest(regexp_split_to_array(
nps_reports.comment,
'[^a-zA-Z0-9]+'
)) AS word
WHERE nps_reports.comment <> 'undefined'
AND nps_reports.comment IS NOT NULL
AND NOT EXISTS (
SELECT 1
FROM stop_words
WHERE stop_words.stop_word = word
)
ORDER BY _created_at DESC;
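The regexp split in that query behaves much like Python's `re.split`. A small sketch of the same tokenization (note that `regexp_split_to_array` keeps empty edge tokens when the string starts or ends with a separator; they are filtered out here):

```python
import re

def split_words_regexp(comment):
    # Analogue of regexp_split_to_array(comment, '[^a-zA-Z0-9]+'):
    # split on runs of non-alphanumeric characters, dropping empty tokens.
    return [w for w in re.split(r"[^a-zA-Z0-9]+", comment) if w]

print(split_words_regexp("Great service, fast delivery!"))
# ['Great', 'service', 'fast', 'delivery']
```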
All that said, whatever you are doing here looks like reinventing full-text search (FTS), so the whole approach is probably a bad idea anyway.
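A rough sketch of that direction, assuming the built-in `english` text search configuration fits the data: `to_tsvector` handles tokenization, stemming, and stop-word removal in one step, and `ts_stat` aggregates word frequencies over a query.

```sql
-- Word frequencies across all comments; stop words and stemming
-- are handled by the 'english' configuration.
SELECT word, nentry
FROM ts_stat($$
    SELECT to_tsvector('english', comment)
    FROM nps_reports
    WHERE comment <> 'undefined'
$$)
ORDER BY nentry DESC;
```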
Views: 5222