I have the following table and index definitions:
CREATE TABLE ticket
(
wid bigint NOT NULL DEFAULT nextval('tickets_id_seq'::regclass),
eid bigint,
created timestamp with time zone NOT NULL DEFAULT now(),
status integer NOT NULL DEFAULT 0,
argsxml text,
moduleid character varying(255),
source_id bigint,
file_type_id bigint,
file_name character varying(255),
status_reason character varying(255),
...
)
I created an index on the created timestamp as follows:
CREATE INDEX ticket_1_idx
ON ticket
USING btree
(created );
Here is my query:
select * from ticket
where created between '2012-12-19 00:00:00' and '2012-12-20 00:00:00'
This worked fine until the number of records started to grow (to about 5 million); now it takes forever to return.
EXPLAIN ANALYZE reveals this:
"Index Scan using ticket_1_idx on ticket (cost=0.00..10202.64 rows=52543 …
postgresql indexing query-optimization database-partitioning postgresql-performance
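I have 3.5 million rows in the table acs_objects, and I need to retrieve the distinct values of the creation_date column formatted as a year only.
My first attempt: 180~200 Sec (15 Rows Fetched)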
SELECT DISTINCT to_char(creation_date,'YYYY') FROM acs_objects
My second attempt: 35~40 Sec (15 Rows Fetched)
SELECT DISTINCT to_char(creation_date,'YYYY')
FROM (SELECT DISTINCT creation_date FROM acs_objects) AS distinct_date
Is there any way to make it faster? - "I need to use this on an ADP website"
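One direction that might be worth trying is to stop scanning every row and instead jump from year to year through an index. The sketch below assumes a plain btree index on creation_date exists (the question does not say so; the index name acs_objects_creation_date_idx is hypothetical). Each step asks for the smallest creation_date at or after the start of the next year, which PostgreSQL can usually answer with a single index probe.

```sql
-- Assumption: an index like this exists (hypothetical name).
-- CREATE INDEX acs_objects_creation_date_idx ON acs_objects (creation_date);

WITH RECURSIVE years AS (
    -- Start at the year of the earliest row.
    SELECT date_trunc('year', min(creation_date)) AS yr
    FROM   acs_objects
    UNION ALL
    -- Jump to the year of the first row on or after the next January 1st.
    SELECT (
        SELECT date_trunc('year', min(creation_date))
        FROM   acs_objects
        WHERE  creation_date >= y.yr + interval '1 year'
    )
    FROM   years y
    WHERE  y.yr IS NOT NULL
)
SELECT to_char(yr, 'YYYY') AS creation_year
FROM   years
WHERE  yr IS NOT NULL;   -- drop the terminating NULL row
```

With about 15 distinct years this issues roughly 16 small index lookups instead of reading all 3.5 million rows; whether it actually wins should be confirmed with EXPLAIN ANALYZE on the real data.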
table1 looks like the following:
+--------+-------+-------+------------+-------+
| flight | orig | dest | passenger | bags |
+--------+-------+-------+------------+-------+
| 1111 | sfo | chi | david | 3 |
| 1112 | sfo | dal | david | 7 |
| 1112 | sfo | dal | kim | 10 |
| 1113 | lax | san | ameera | 5 |
| 1114 | lax | lfr | tim | 6 |
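| 1114 | lax | lfr | jake | 8 | …
I want to retrieve all points within a given range of another set of points. For example, find all shops within 500 meters of any subway station.
I wrote this query, which is slow, and I would like to optimize it:
SELECT DISTINCT ON(locations.id) locations.id FROM locations, pois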
WHERE pois.poi_kind = 'subway'
AND ST_DWithin(locations.coordinates, pois.coordinates, 500, false);
I am using the latest versions of Postgres and PostGIS (Postgres 9.5, PostGIS 2.2.1).
Here is the table metadata:
Table "public.locations"
Column | Type | Modifiers
--------------------+-----------------------------+--------------------------------------------------------
id | integer | not null default nextval('locations_id_seq'::regclass)
coordinates | geometry |
Indexes:
"locations_coordinates_index" gist (coordinates)
Table "public.pois"
Column | Type | Modifiers
-------------+-----------------------------+---------------------------------------------------
id | integer | not null default nextval('pois_id_seq'::regclass)
coordinates | geometry |
poi_kind_id | integer |
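Indexes:
"pois_pkey" PRIMARY KEY, btree (id)
"pois_coordinates_index" gist (coordinates)
"pois_poi_kind_id_index" …
I was troubleshooting some slow SQL queries today and don't quite understand the performance difference below:
When trying to extract max(timestamp) from the data table based on some condition, using MAX() is slower than ORDER BY timestamp LIMIT 1 if a matching row exists, but considerably faster if no matching row is found.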
SELECT timestamp
FROM data JOIN sensors ON ( sensors.id = data.sensor_id )
WHERE sensors.station_id = 4
ORDER BY timestamp DESC
LIMIT 1;
(0 rows)
Time: 1314.544 ms
SELECT timestamp
FROM data JOIN sensors ON ( sensors.id = data.sensor_id )
WHERE sensors.station_id = 5
ORDER BY timestamp DESC
LIMIT 1;
(1 row)
Time: 10.890 ms
SELECT MAX(timestamp)
FROM data JOIN sensors ON ( sensors.id = data.sensor_id )
WHERE sensors.station_id …
Here is my table schema:
CREATE TABLE tickers (
product_id TEXT NOT NULL,
trade_id INT NOT NULL,
sequence BIGINT NOT NULL,
time TIMESTAMPTZ,
price NUMERIC NOT NULL,
side TEXT NOT NULL,
last_size NUMERIC NOT NULL,
best_bid NUMERIC NOT NULL,
best_ask NUMERIC NOT NULL,
PRIMARY KEY (product_id, trade_id)
);
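My application subscribes to Coinbase Pro's websocket on the "ticker" channel and inserts a row into the tickers table whenever it receives a message.
The table now has nearly 2 million rows.
I thought running SELECT DISTINCT product_id FROM tickers would be fast, but it takes around 500 to 600 milliseconds. Here is the output of EXPLAIN ANALYZE: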
HashAggregate (cost=47938.97..47939.38 rows=40 width=8) (actual time=583.105..583.110 rows=40 loops=1)
Group Key: product_id
-> Seq Scan …
sql postgresql query-optimization database-performance postgresql-performance
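For a column with very few distinct values (40 here) in a large table, a commonly used workaround is to emulate a "loose index scan" with a recursive CTE. The sketch below relies on the primary key index on (product_id, trade_id) from the schema above; the CTE name distinct_products is made up for illustration. Each recursion step fetches the next larger product_id with one index probe instead of scanning the whole table.

```sql
-- Sketch only: walks the distinct product_id values one index probe at a time.
WITH RECURSIVE distinct_products AS (
    (
        -- Smallest product_id in the table.
        SELECT product_id
        FROM   tickers
        ORDER  BY product_id
        LIMIT  1
    )
    UNION ALL
    -- Next product_id strictly greater than the previous one (NULL when none is left).
    SELECT (
        SELECT t.product_id
        FROM   tickers t
        WHERE  t.product_id > d.product_id
        ORDER  BY t.product_id
        LIMIT  1
    )
    FROM   distinct_products d
    WHERE  d.product_id IS NOT NULL
)
SELECT product_id
FROM   distinct_products
WHERE  product_id IS NOT NULL;   -- drop the terminating NULL row
```

This returns the same set as SELECT DISTINCT product_id FROM tickers. It only pays off while the number of distinct values stays small relative to the row count, so it is worth comparing both plans with EXPLAIN ANALYZE.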
select *
from records
where id in ( select max(id) from records group by option_id )
This query works fine even on millions of rows. However, as can be seen from the output of EXPLAIN ANALYZE:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=30218.84..31781.62 rows=620158 width=44) (actual time=1439.251..1443.458 rows=1057 loops=1)
-> HashAggregate (cost=30218.41..30220.41 rows=200 width=4) (actual time=1439.203..1439.503 rows=1057 loops=1)
-> HashAggregate (cost=30196.72..30206.36 rows=964 width=8) (actual time=1438.523..1438.807 rows=1057 loops=1)
-> Seq Scan on records records_1 (cost=0.00..23995.15 rows=1240315 width=8) (actual time=0.103..527.914 rows=1240315 loops=1)
-> Index Scan using records_pkey on records (cost=0.43..7.80 rows=1 width=44) (actual time=0.002..0.003 rows=1 loops=1057)
Index Cond: (id = (max(records_1.id)))
Total …
sql postgresql query-optimization greatest-n-per-group groupwise-maximum
I'm trying to do a simple join between a table (players) and a view (player_main_colors):
SELECT P.*, C.main_color FROM players P
LEFT OUTER JOIN player_main_colors C USING (player_id)
WHERE P.user_id=1;
This query takes about 40 ms.
Here I use a nested SELECT on the VIEW instead of a JOIN:
SELECT player_id, main_color FROM player_main_colors
WHERE player_id IN (
SELECT player_id FROM players WHERE user_id=1);
This query also takes about 40 ms.
When I split the query into 2 parts, it becomes as fast as I expected:
SELECT player_id FROM players WHERE user_id=1;
SELECT player_id, main_color FROM player_main_colors
where player_id in (584, 9337, 11669, 12096, 13651,
13852, 9575, 23388, 14339, 500, 24963, 25630,
8974, 13048, 11904, 10537, 20362, 9216, 4747, 25045);
These queries take about 0.5 ms each.
So why are the queries above, with the JOIN or the sub-SELECT, so horribly slow, and how can I fix that?
Here are some details about my tables and the view:
CREATE TABLE users (
user_id INTEGER PRIMARY KEY, …
postgresql performance query-optimization greatest-n-per-group postgresql-performance
I have a pretty simple table
CREATE TABLE approved_posts (
project_id INTEGER,
feed_id INTEGER,
post_id INTEGER,
approved_time TIMESTAMP NOT NULL,
post_time TIMESTAMP NOT NULL,
PRIMARY KEY (project_id, feed_id, post_id)
)
And I'm trying to optimize this query:
SELECT *
FROM approved_posts
WHERE feed_id IN (?, ?, ?)
AND project_id = ?
ORDER BY approved_time DESC, post_time DESC
LIMIT 1;
The query optimizer is fetching every single approved_post that matches the predicate, sorting all 100k results, and returning the top one …
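A possible way to avoid sorting every matching row is to fetch the newest post per feed from an index and then pick the newest of those few candidates. This is only a sketch under assumptions: the index name approved_posts_feed_time_idx is hypothetical, and the literals 42 and 1, 2, 3 stand in for the `?` placeholders in the query above.

```sql
-- Hypothetical index: equality columns first, then the sort columns.
CREATE INDEX approved_posts_feed_time_idx
    ON approved_posts (project_id, feed_id, approved_time DESC, post_time DESC);

-- One cheap "top 1" lookup per feed, then the overall top 1.
SELECT p.*
FROM   unnest(ARRAY[1, 2, 3]) AS f(feed_id)     -- placeholder feed ids
CROSS  JOIN LATERAL (
    SELECT a.*
    FROM   approved_posts a
    WHERE  a.project_id = 42                    -- placeholder project id
    AND    a.feed_id    = f.feed_id
    ORDER  BY a.approved_time DESC, a.post_time DESC
    LIMIT  1
) AS p
ORDER  BY p.approved_time DESC, p.post_time DESC
LIMIT  1;
```

LATERAL requires PostgreSQL 9.3 or later. The outer sort then handles at most one row per feed rather than all 100k matches, though the actual plan should be checked with EXPLAIN ANALYZE.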
I have a table like this:
Name activity time
user1 A1 12:00
user1 E3 12:01
user1 A2 12:02
user2 A1 10:05
user2 A2 10:06
user2 A3 10:07
user2 M6 10:07
user2 B1 10:08
user3 A1 14:15
user3 B2 14:20
user3 D1 14:25
user3 D2 14:30
Now, I need a result like this:
Name activity next_activity
user1 A2 NULL
user2 A3 B1
user3 A1 B2
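For each user, I want to find the last activity from group A and the type of the group B activity that follows it (group B activities always take place after the group A activities). Other types of activities are of no interest to me. I tried using the lead() function, but it didn't work.
How can I solve my problem?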
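One reason lead() may not have worked is that it looks at the very next row, including uninteresting activities such as E3 or M6. A minimal sketch, assuming the table is called activities (a hypothetical name, the question does not give one) and that the group is identified by the first letter of the activity code: filter down to groups A and B first, then apply lead().

```sql
SELECT name, activity, next_activity
FROM (
    SELECT name,
           activity,
           -- Next A or B activity of the same user, in time order.
           lead(activity) OVER (PARTITION BY name ORDER BY time) AS next_activity
    FROM   activities                      -- hypothetical table name
    WHERE  activity LIKE 'A%'
       OR  activity LIKE 'B%'              -- ignore all other groups
) AS ab
-- Keep only the last A row: the one followed by a B activity or by nothing.
WHERE  activity LIKE 'A%'
AND    (next_activity LIKE 'B%' OR next_activity IS NULL);
```

Because the WHERE clause inside the subquery is applied before the window function is evaluated, lead() only ever sees A and B rows. This depends on the stated rule that group B activities always come after group A activities.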