Why isn't row-level security (RLS) using indexes?

The*_*off · score 5 · tags: postgresql, performance, row-level-security

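I have an application with patients and therapists. Both are stored in the same users table. Patients should be able to see their therapists, and therapists should be able to see their patients.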

I have set up a materialized view (user_access_pairs) containing pairs of user IDs. If two users have a row in the view, they should have access to each other.

database> \d user_access_pairs
+----------+---------+-------------+
| Column   | Type    | Modifiers   |
|----------+---------+-------------|
| id1      | integer |             |
| id2      | integer |             |
+----------+---------+-------------+
Indexes:
    "index_user_access_pairs" UNIQUE, btree (id1, id2)
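The question does not show how user_access_pairs is built. Below is a minimal sketch that assumes a hypothetical source table therapist_patients(therapist_id, patient_id). It stores every relationship in both directions, so a single (id1, id2) lookup answers "may id1 see id2?" for patients and therapists alike:

```sql
-- Hypothetical source table; the real derivation of the pairs is not shown in the question.
create table therapist_patients (
  therapist_id integer not null,
  patient_id   integer not null
);

-- Each relationship appears twice (therapist -> patient and patient -> therapist).
-- UNION removes duplicates, so the unique index below can be built.
create materialized view user_access_pairs as
  select therapist_id as id1, patient_id as id2 from therapist_patients
  union
  select patient_id, therapist_id from therapist_patients;

-- Same unique index as in the \d output above.
create unique index index_user_access_pairs on user_access_pairs (id1, id2);

-- Re-run whenever therapist_patients changes.
refresh materialized view concurrently user_access_pairs;
```

REFRESH MATERIALIZED VIEW CONCURRENTLY requires a unique index on the view, which index_user_access_pairs provides. It lets other sessions keep reading the view while it is rebuilt.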

Here is the definition of the users table. It has many more columns, which are irrelevant to this question.

database> \d users
+-----------------------------+-----------------------------+-----------------------------------------------------+
| Column                      | Type                        | Modifiers                                           |
|-----------------------------+-----------------------------+-----------------------------------------------------|
| id                          | integer                     |  not null default nextval('users_id_seq'::regclass) |
| first_name                  | character varying(255)      |                                                     |
| last_name                   | character varying(255)      |                                                     |
+-----------------------------+-----------------------------+-----------------------------------------------------+
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)


I created an RLS policy that restricts which rows of users can be read, based on the user ID in the caller's JWT.

create policy select_users_policy
  on public.users
  for select using (
    (current_setting('jwt.claims.user_id'::text, true)::integer, id) in (
      select id1, id2 from user_access_pairs
    )
  );
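The policy only applies if RLS is enabled on users and the query runs as a role that does not bypass it. Superusers and roles with BYPASSRLS always skip policies, and so does the table owner unless FORCE ROW LEVEL SECURITY is set. A minimal sketch of such a setup follows. The role name app_user is hypothetical. Using set_config(..., true) is one way to scope the claim to a single transaction rather than the whole session.

```sql
-- Policies have no effect until RLS is enabled on the table.
alter table public.users enable row level security;

-- Hypothetical application role: not a superuser, no BYPASSRLS,
-- so select_users_policy is applied to its queries.
create role app_user nologin;
grant select on public.users to app_user;
-- Policy expressions run with the privileges of the querying role,
-- so the role must also be able to read the pairs view.
grant select on public.user_access_pairs to app_user;

-- Run as a superuser or a member of app_user.
begin;
set local role app_user;
-- Third argument true: the setting reverts when the transaction ends.
select set_config('jwt.claims.user_id', '2222', true);
select first_name, last_name from public.users;
commit;
```

If jwt.claims.user_id is not set, current_setting(..., true) returns NULL. The policy condition is then never true, so no rows are visible.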

This seems logical, but performance is terrible. The query planner does a sequential scan on user_access_pairs even though there is an index on it.

database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
    select first_name, last_name
    from users
+------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                         |
|------------------------------------------------------------------------------------------------------------------------------------|
| Seq Scan on public.users  (cost=231.84..547.19 rows=2386 width=14) (actual time=5.481..6.418 rows=2 loops=1)                       |
|   Output: users.first_name, users.last_name                                                                                        |
|   Filter: (hashed SubPlan 1)                                                                                                       |
|   Rows Removed by Filter: 4769                                                                                                     |
|   SubPlan 1                                                                                                                        |
|     ->  Seq Scan on public.user_access_pairs  (cost=0.00..197.67 rows=13667 width=8) (actual time=0.005..1.107 rows=13667 loops=1) |
|           Output: user_access_pairs.id1, user_access_pairs.id2                                                                     |
| Planning Time: 0.072 ms                                                                                                            |
| Execution Time: 6.521 ms                                                                                                           |
+------------------------------------------------------------------------------------------------------------------------------------+

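However, if I switch to a superuser role that bypasses RLS and apply the same filter manually, I get much better performance. Shouldn't the two be the same?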

database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
   select first_name, last_name
   from users
   where (current_setting('jwt.claims.user_id'::text, true)::integer, id) in (
     select id1, id2 from user_access_pairs
   )
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| QUERY PLAN
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Nested Loop  (cost=4.59..27.86 rows=2 width=14) (actual time=0.041..0.057 rows=2 loops=1)
|   Output: users.first_name, users.last_name
|   Inner Unique: true
|   ->  Bitmap Heap Scan on public.user_access_pairs  (cost=4.31..11.26 rows=2 width=4) (actual time=0.029..0.036 rows=2 loops=1)
|         Output: user_access_pairs.id1, user_access_pairs.id2
|         Filter: ((current_setting('jwt.claims.user_id'::text, true))::integer = user_access_pairs.id1)
|         Heap Blocks: exact=2
|         ->  Bitmap Index Scan on index_user_access_pairs  (cost=0.00..4.31 rows=2 width=0) (actual time=0.018..0.018 rows=2 loops=1)
|               Index Cond: (user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)
|   ->  Index Scan using users_pkey on public.users  (cost=0.28..8.30 rows=1 width=18) (actual time=0.008..0.008 rows=1 loops=2)
|         Output: users.id, users.email, users.encrypted_password, users.first_name, users.last_name, users.roles_mask, users.reset_password_token, users.reset_password_sent_at, users.remember_created_at, users.sign_in_count, users.current_sign_in_at, users.last_sign_in_at,
|         Index Cond: (users.id = user_access_pairs.id2)
| Planning Time: 0.526 ms
| Execution Time: 0.116 ms
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Why doesn't the query use the index when RLS is applied?

PS: I'm using PostgreSQL 12.4.

database> select version()
+-------------------------------------------------------------------------------------------------------------------------------+
| version                                                                                                                       |
|-------------------------------------------------------------------------------------------------------------------------------|
| PostgreSQL 12.4 (Ubuntu 12.4-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, 64-bit |
+-------------------------------------------------------------------------------------------------------------------------------+

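Edit

Thanks, Laurenz, for your answer. It improved performance a lot, but I'm still getting some sequential scans.

Here is the updated policy Laurenz suggested.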

create policy select_users_policy
  on public.users
  for select using (
    exists (
      select 1
      from user_access_pairs
      where
        id1 = current_setting('jwt.claims.user_id'::text, true)::integer
        and id2 = users.id
    )
  );

Even though the exists query in the policy uses the index, querying the users table with RLS still performs a seq scan on users.

database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
  select first_name, last_name
  from users
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                            |
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
| Seq Scan on public.users  (cost=0.00..40048.81 rows=2394 width=14) (actual time=0.637..1.216 rows=2 loops=1)                                          |
|   Output: users.first_name, users.last_name                                                                                                           |
|   Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)                                                                                               |
|   Rows Removed by Filter: 4785                                                                                                                        |
|   SubPlan 1                                                                                                                                           |
|     ->  Index Only Scan using index_user_access_pairs on public.user_access_pairs  (cost=0.29..8.31 rows=1 width=0) (never executed)                  |
|           Index Cond: ((user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer) AND (user_access_pairs.id2 = users.id)) |
|           Heap Fetches: 0                                                                                                                             |
|   SubPlan 2                                                                                                                                           |
|     ->  Bitmap Heap Scan on public.user_access_pairs user_access_pairs_1  (cost=4.31..11.26 rows=2 width=4) (actual time=0.075..0.083 rows=2 loops=1) |
|           Output: user_access_pairs_1.id2                                                                                                             |
|           Recheck Cond: (user_access_pairs_1.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)                                      |
|           Heap Blocks: exact=2                                                                                                                        |
|           ->  Bitmap Index Scan on index_user_access_pairs_on_id1  (cost=0.00..4.31 rows=2 width=0) (actual time=0.064..0.064 rows=2 loops=1)         |
|                 Index Cond: (user_access_pairs_1.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)                                  |
| Planning Time: 0.572 ms                                                                                                                               |
| Execution Time: 1.295 ms                                                                                                                              |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+

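Here is the same query done "manually" without RLS, for comparison. This time there is no seq scan, and performance is significantly better, especially on larger datasets.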

database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
    select first_name, last_name
    from users
    where exists (
       select 1
       from user_access_pairs
       where
         id1 = current_setting('jwt.claims.user_id'::text, true)::integer
         and id2 = users.id
     )

+---------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                  |
|---------------------------------------------------------------------------------------------------------------------------------------------|
| Nested Loop  (cost=4.59..27.86 rows=2 width=14) (actual time=0.020..0.033 rows=2 loops=1)                                                   |
|   Output: users.first_name, users.last_name                                                                                                 |
|   Inner Unique: true                                                                                                                        |
|   ->  Bitmap Heap Scan on public.user_access_pairs  (cost=4.31..11.26 rows=2 width=4) (actual time=0.013..0.016 rows=2 loops=1)             |
|         Output: user_access_pairs.id1, user_access_pairs.id2                                                                                |
|         Recheck Cond: (user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)                                |
|         Heap Blocks: exact=2                                                                                                                |
|         ->  Bitmap Index Scan on index_user_access_pairs_on_id1  (cost=0.00..4.31 rows=2 width=0) (actual time=0.010..0.010 rows=2 loops=1) |
|               Index Cond: (user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)                            |
|   ->  Index Scan using users_pkey on public.users  (cost=0.28..8.30 rows=1 width=18) (actual time=0.006..0.006 rows=1 loops=2)              |
|         Output: users.id, users.email, users.encrypted_password, users.first_name, users.last_name, users.roles_mask                        |
|         Index Cond: (users.id = user_access_pairs.id2)                                                                                      |
| Planning Time: 0.464 ms                                                                                                                     |
| Execution Time: 0.075 ms                                                                                                                    |
+---------------------------------------------------------------------------------------------------------------------------------------------+

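I would have expected the query planner to treat these two queries the same. Why are they different, and what can be done to avoid the seq scan?

Answer (score 6)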

You are not seeing the same plan as the seemingly equivalent query without the RLS policy because subquery pullup happens before RLS policies are taken into account. Subquery pullup is the step that turns the EXISTS into a join. As a result, the EXISTS in the policy is never pulled up. It stays a (hashed) SubPlan used as a filter on a sequential scan of users. This is a planner quirk.

In short, RLS policies and subqueries unfortunately do not combine well performance-wise.
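One workaround often suggested for this situation is to make the policy's subquery uncorrelated and collect its result into an array. This is not part of the answer above and has not been checked against the question's data. An uncorrelated subquery is planned as an InitPlan that runs once per query. A condition of the form id = ANY(array) can then serve as an index condition on users_pkey, instead of a SubPlan evaluated against every row of users. Confirm the resulting plan with explain analyze before relying on it.

```sql
-- Sketch: replace the correlated EXISTS with an uncorrelated array lookup.
drop policy if exists select_users_policy on public.users;

create policy select_users_policy
  on public.users
  for select using (
    id = any (array(
      -- No reference to users in here, so the planner can evaluate it
      -- once (as an InitPlan) rather than once per row of users.
      select id2
      from user_access_pairs
      where id1 = current_setting('jwt.claims.user_id'::text, true)::integer
    ))
  );
```

The array holds every ID the current user may see. That suits the case shown in the plans above, where each user has only a couple of counterparts. With very long access lists the trade-off may differ.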

A similar effect can be seen when comparing the following two queries:

SELECT ... FROM my_table WHERE                     EXISTS(SELECT ...);
SELECT ... FROM my_table WHERE CASE WHEN true THEN EXISTS(SELECT ...) END;

Although both queries are equivalent, the second one produces a (hashed) SubPlan for the subquery. The redundant CASE WHEN true is only folded away after subquery pullup has already run.
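Both queries can be run through EXPLAIN to see this effect on the question's own tables. This is a sketch: the literal 2222 stands in for the JWT claim. The plan shapes in the comments are what the explanation above predicts, not captured output.

```sql
-- Expected: the EXISTS is pulled up into a join, like the manual (non-RLS) plan.
explain
select first_name, last_name
from users
where exists (
  select 1 from user_access_pairs
  where id1 = 2222 and id2 = users.id
);

-- Expected: the EXISTS stays a SubPlan in a Filter on a seq scan of users,
-- like the RLS plan, because the CASE blocks pullup.
explain
select first_name, last_name
from users
where case when true then exists (
  select 1 from user_access_pairs
  where id1 = 2222 and id2 = users.id
) end;
```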

Disclaimer: I got this information from RhodiumToad on the #postgresql IRC channel, and I have explained and simplified it in my own words.