Storing 'rank' for contests in Postgres

Bro*_*ood 5 sql postgresql correlated-subquery sql-update postgresql-performance

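I'm trying to determine whether there is a "cheap" optimization for the following query. We've implemented a system where "tickets" earn "points" and can therefore be ranked. To support analytics-type queries, we store each ticket's rank (tickets can tie) on the ticket itself.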

I've found that, at scale, updating this rank is very slow. I'm trying to run the scenario below on a set of about 20k tickets.

I'm hoping someone can help identify the cause and suggest a fix.

We're on Postgres 9.3.6.

Here is a simplified schema of the ticket table:

ogs_1=> \d api_ticket
                                             Table "public.api_ticket"
            Column            |           Type           |                        Modifiers                        
------------------------------+--------------------------+---------------------------------------------------------
 id                           | integer                  | not null default nextval('api_ticket_id_seq'::regclass)
 status                       | character varying(3)     | not null
 points_earned                | integer                  | not null
 rank                         | integer                  | not null
 event_id                     | integer                  | not null
 user_id                      | integer                  | not null
Indexes:
    "api_ticket_pkey" PRIMARY KEY, btree (id)
    "api_ticket_4437cfac" btree (event_id)
    "api_ticket_e8701ad4" btree (user_id)
    "api_ticket_points_earned_idx" btree (points_earned)
    "api_ticket_rank_idx" btree ("rank")
Foreign-key constraints:
    "api_ticket_event_id_598c97289edc0e3e_fk_api_event_id" FOREIGN KEY (event_id) REFERENCES api_event(id) DEFERRABLE INITIALLY DEFERRED
(user_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED

Here is the query I'm running:

UPDATE api_ticket t SET rank = (
  SELECT rank
  FROM (SELECT Rank() over (
      Partition BY event_id ORDER BY points_earned DESC
    ) as rank, id
    FROM api_ticket tt
    WHERE event_id = t.event_id
      AND tt.status != 'x'
  ) as r
  WHERE r.id = t.id
)
WHERE event_id = <EVENT_ID> AND t.status != 'x';

Here is the EXPLAIN ANALYZE output for a set of about 10k rows:

Update on api_ticket t  (cost=0.00..1852176.70 rows=9646 width=88) (actual time=1254035.623..1254035.623 rows=0 loops=1)
   ->  Seq Scan on api_ticket t  (cost=0.00..1852176.70 rows=9646 width=88) (actual time=121.611..1253148.416 rows=9748 loops=1)
         Filter: (((status)::text <> 'x'::text) AND (event_id = 207))
         Rows Removed by Filter: 10
         SubPlan 1
           ->  Subquery Scan on r  (cost=159.78..191.97 rows=1 width=8) (actual time=87.466..128.537 rows=1 loops=9748)
                 Filter: (r.id = t.id)
                 Rows Removed by Filter: 9747
                 ->  WindowAgg  (cost=159.78..178.55 rows=1073 width=12) (actual time=46.389..108.954 rows=9748 loops=9748)
                       ->  Sort  (cost=159.78..162.46 rows=1073 width=12) (actual time=46.370..66.163 rows=9748 loops=9748)
                             Sort Key: tt.points_earned
                             Sort Method: quicksort  Memory: 799kB
                             ->  Index Scan using api_ticket_4437cfac on api_ticket tt  (cost=0.29..105.77 rows=1073 width=12) (actual time=2.698..26.448 rows=9748 loops=9748)
                                   Index Cond: (event_id = t.event_id)
                                   Filter: ((status)::text <> 'x'::text)
 Total runtime: 1254036.583 ms

Erw*_*ter 5

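A correlated subquery has to be executed once for every row (about 20k times in your example). In your plan, each of the loops=9748 executions scans, sorts and ranks all ~9.7k tickets of the event, then throws away all but one row (Rows Removed by Filter: 9747), so the work grows roughly quadratically with the size of the event. That only makes sense for a small number of rows, or where the computation really requires it.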

Instead, use a derived table, which is computed once before we join to it:

UPDATE api_ticket t
SET    rank = tt.rnk
FROM  (
   SELECT tt.id
        , rank() OVER (PARTITION BY tt.event_id
                       ORDER BY tt.points_earned DESC) AS rnk
   FROM   api_ticket tt
   WHERE  tt.status <> 'x'
   AND    tt.event_id = <EVENT_ID>
   ) tt
WHERE t.id = tt.id
AND   t.rank <> tt.rnk;  -- avoid empty updates

Should be a bit faster. :)
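One way to measure the difference without committing anything is to run the rewritten statement under EXPLAIN ANALYZE inside a transaction and roll it back, since EXPLAIN ANALYZE really executes the UPDATE. This is a sketch that plugs in event_id 207, the event from the plan in the question; substitute your own value. In the resulting plan you would expect the WindowAgg node to run once rather than once per target row.

```sql
BEGIN;

-- EXPLAIN ANALYZE actually performs the UPDATE, hence the surrounding transaction
EXPLAIN (ANALYZE, BUFFERS)
UPDATE api_ticket t
SET    rank = tt.rnk
FROM  (
   SELECT tt.id
        , rank() OVER (PARTITION BY tt.event_id
                       ORDER BY tt.points_earned DESC) AS rnk
   FROM   api_ticket tt
   WHERE  tt.status <> 'x'
   AND    tt.event_id = 207   -- event from the question's plan; substitute your own
   ) tt
WHERE  t.id = tt.id
AND    t.rank <> tt.rnk;

-- discard the changes made while measuring
ROLLBACK;
```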

Other improvements

The last predicate excludes empty updates:

This only makes sense if the new rank can be the same as the old one, at least occasionally. Otherwise, remove it.
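To decide whether the predicate is worth keeping, you could compare stored and freshly computed ranks before updating. In Postgres, every updated row gets a new row version even if its values don't change, so skipping unchanged rows saves real work when most ranks are stable. This is a read-only sketch, again assuming event_id 207 as an example; it uses sum(CASE ...) because the aggregate FILTER clause only arrived in Postgres 9.4.

```sql
-- Read-only check: how many ranked tickets of one event would actually change?
SELECT count(*)                                          AS ranked_tickets
     , sum(CASE WHEN t.rank <> tt.rnk THEN 1 ELSE 0 END) AS would_change
FROM   api_ticket t
JOIN  (
   SELECT tt.id
        , rank() OVER (PARTITION BY tt.event_id
                       ORDER BY tt.points_earned DESC) AS rnk
   FROM   api_ticket tt
   WHERE  tt.status <> 'x'
   AND    tt.event_id = 207   -- example event id; substitute your own
   ) tt ON tt.id = t.id;
```

If would_change is usually close to ranked_tickets, the extra comparison buys little and the predicate can go.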

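We don't need to repeat AND t.status != 'x' in the outer query: we join on the PK column id, so every target row is the very same row that already passed the status filter in the subquery.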
The standard SQL inequality operator is <>, even though Postgres also supports !=.

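Push the predicate event_id = <EVENT_ID> down into the subquery as well. There is no need to compute ranks for any other event_id. In your original query this filter sat in the outer query; in the rewritten query it is best applied entirely in the subquery. Since we use PARTITION BY tt.event_id, this doesn't distort the ranks.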
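The same PARTITION BY is what would let you refresh every event in a single statement: the ranking restarts for each event_id, so tickets are only ever compared with tickets of the same event. A sketch, assuming you sometimes want a full refresh (for example from a batch job, which is an assumption and not part of the question); it sorts the whole table, so for a single event the filtered version above stays cheaper.

```sql
-- Recompute ranks for all events at once.
-- PARTITION BY event_id restarts rank() for each event.
UPDATE api_ticket t
SET    rank = tt.rnk
FROM  (
   SELECT tt.id
        , rank() OVER (PARTITION BY tt.event_id
                       ORDER BY tt.points_earned DESC) AS rnk
   FROM   api_ticket tt
   WHERE  tt.status <> 'x'   -- no event_id filter: every event is ranked
   ) tt
WHERE  t.id = tt.id
AND    t.rank <> tt.rnk;     -- skip rows whose rank is already correct
```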