Efficiently return two aggregated arrays from an m:n table

Nec*_*oft 2 postgresql performance view many-to-many postgresql-10

I have a table implementing a many-to-many relationship on the users table to represent follow relationships between users:

CREATE TABLE users (
    id text PRIMARY KEY,
    username text NOT NULL
);

CREATE TABLE followers (
    userid text,
    followid text,
    PRIMARY KEY (userid, followid),
    CONSTRAINT followers_userid_fk   FOREIGN KEY (userid)   REFERENCES users (id),
    CONSTRAINT followers_followid_fk FOREIGN KEY (followid) REFERENCES users (id)
);

CREATE INDEX followers_followid_idx ON followers (followid);

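When I want to build a JSON response with user-related data, I have two cases:

  • a request for a single user by id,
  • a request for an array of user objects by a list of ids.

The user data object should contain two arrays of user IDs: one with the users they follow, the other with the users who follow them. To create these two fields I use the following SELECT statement.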

DECLARE follows RECORD;
SELECT  array (select followid FROM followers where userid = Puserid) AS following, 
    array (select userid FROM followers where followid = Puserid) AS followers 
INTO follows;

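When the request is for a list of users, I want to create these two fields for every user object returned in the JSON user list.

I chose to implement the follow relationship as a many-to-many table so that I don't have to search for and remove ids from arrays stored in the users (or user profile) table, and because I may later add metadata about the follow relationship (notification settings, blocked users, and so on).

However, I am starting to doubt the efficiency of this decision, especially when many requests for 200 users are made, which I assume would run the query above for every id in the list. Would that be very inefficient?

I do have indexes on both columns (since the primary key index is not useful for searching on followid), but I am considering creating a view that contains an array_agg of the followid column: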

SELECT userid, array_agg(followid) as following
FROM followers
GROUP BY userid;

But to get both following and followers, I would need something like this:

SELECT f1.userid, array_agg(f1.followid) AS following, f2.followers
FROM   followers AS f1
INNER  JOIN (
    SELECT followid AS userid, array_agg(userid) AS followers
    FROM   ks.followers
    GROUP  BY followid
    ) AS f2 ON f1.userid = f2.userid
GROUP  BY f1.userid, f2.followers;

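That's not a good idea, is it?

Am I taking the wrong approach to modeling this relationship between users?

I made two attempts at this. For a short list of 18 ids, both take about 600 ms:

Attempt 1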

CREATE OR REPLACE VIEW follow_following AS
    select f1.userid, array_agg(f1.followid) as following, 
    f2.followers FROM followers AS f1 INNER JOIN
    (select followid AS userid, array_agg(userid) as followers
    from followers
    group by followid) AS f2 ON f1.userid = f2.userid group by f1.userid, f2.followers;

CREATE OR REPLACE FUNCTION get_users_by_ids(Puserids TEXT[])
    RETURNS JSON AS $$
    DECLARE rjson JSON;
    BEGIN
        CREATE TEMP TABLE getusers ON COMMIT DROP AS
        SELECT u.id, u.username, p.bio, p.avatar, f.followers, f.following
        FROM users u
        INNER JOIN profiles p
        ON u.id = p.userid
        LEFT OUTER JOIN follow_following f 
        ON u.id = f.userid
        WHERE u.id = ANY(Puserids);

        SELECT INTO rjson json_agg (
            json_build_object (
                'data',json_build_object (
                    'id',getusers.id,
                    'username',getusers.username,
                    'bio',getusers.bio,
                    'avatar',getusers.avatar,
                    'following', getusers.following,
                    'followers', getusers.followers
                )
            )
        ) FROM getusers;
        return rjson;
    END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

Attempt 2

When I loop over the array instead of using the view, the function performs about the same (650 ms) on an array of 17 ids:

CREATE OR REPLACE FUNCTION get_users_by_ids(Puserids TEXT[])
RETURNS JSON AS $$
DECLARE 
    rjson JSON;
    uid   TEXT;
BEGIN
    CREATE TEMP TABLE getusers (
        userid text,
        username text,
        following text[],
        followers text[]
    ) ON COMMIT DROP;

    FOREACH uid IN ARRAY Puserids
    LOOP
        INSERT INTO getusers (userid, username, followers, following)
        SELECT u.id, u.username,
            array (select userid FROM followers where followid = uid) AS followers,
            array (select followid FROM followers where userid = uid) AS following
        FROM ks.users u
        WHERE u.id = uid;
    END LOOP;

    SELECT INTO rjson json_agg (
        json_build_object (
            'id',getusers.userid,
            'username',getusers.username,
            'following', getusers.following,
            'followers', getusers.followers
        )
    ) FROM getusers;

    return json_build_object ('data', rjson);
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

Erw*_*ter 5

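Query

Creating a temporary table and looping are expensive overkill for this purpose. For starters, you don't even need plpgsql, though it might be slightly faster for repeated calls in the same session. Radically simplified: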

CREATE OR REPLACE FUNCTION get_users_by_ids(_uids text[])
  RETURNS JSON
  LANGUAGE sql SECURITY DEFINER AS
$func$
   SELECT json_agg(sub)
   FROM (
      SELECT u.id, u.username
           , ARRAY (SELECT followid FROM followers WHERE userid   = u.id) AS following
           , ARRAY (SELECT userid   FROM followers WHERE followid = u.id) AS followers
      FROM   users u
      WHERE  u.id = ANY (_uids)
      ) sub
$func$;

This uses json_agg() on a subquery instead of json_build_object(), which should be a bit faster.

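It also conveniently allows cheap sorting of array elements if you need that: add ORDER BY in the subqueries. You may want to preserve the original order of elements.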
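A minimal sketch of both ideas, assuming the tables from the question. The literal ids ('id1' and so on) are placeholders, and the names t and ord are chosen here for illustration only. The first query sorts the elements inside each array. The second returns users in the order of the input array, using unnest() WITH ORDINALITY:

```sql
-- Sorted array elements: ORDER BY inside each ARRAY subquery.
SELECT u.id, u.username
     , ARRAY (SELECT followid FROM followers WHERE userid   = u.id ORDER BY followid) AS following
     , ARRAY (SELECT userid   FROM followers WHERE followid = u.id ORDER BY userid)   AS followers
FROM   users u
WHERE  u.id = ANY ('{id1,id2,id3}'::text[]);  -- placeholder ids

-- Rows in the order of the input array: ord is the element's position.
SELECT u.id, u.username
FROM   unnest('{id3,id1,id2}'::text[]) WITH ORDINALITY AS t(id, ord)
JOIN   users u USING (id)
ORDER  BY t.ord;
```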

If you need SECURITY DEFINER (do you really?), make sure it cannot be misused. See this Postgres Wiki page:

A Guide to CVE-2018-1058: Protect Your Search Path
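One common precaution, sketched here under the assumption that the tables live in the public schema (adapt the schema name) and that a role named app_role exists (hypothetical): pin the function's search_path and restrict who may execute it.

```sql
-- Pin the search path so the function resolves tables only in trusted
-- schemas; pg_temp is listed last so temp objects cannot shadow them.
ALTER FUNCTION get_users_by_ids(text[]) SET search_path = public, pg_temp;

-- Functions are executable by PUBLIC by default; restrict that.
REVOKE EXECUTE ON FUNCTION get_users_by_ids(text[]) FROM PUBLIC;
GRANT  EXECUTE ON FUNCTION get_users_by_ids(text[]) TO app_role;  -- hypothetical role
```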

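Correlated subqueries should be fastest here. They yield an empty array (not NULL) for following / followers if none are found. Alternatively, LATERAL joins might do the job.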
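For comparison, the LATERAL variant might look roughly like this. It is a sketch, not a benchmarked alternative. The aliases f1 and f2 and the literal ids are placeholders:

```sql
SELECT u.id, u.username, f1.following, f2.followers
FROM   users u
CROSS  JOIN LATERAL (
   -- one row per user, holding the array of users they follow
   SELECT ARRAY (SELECT followid FROM followers WHERE userid = u.id) AS following
   ) f1
CROSS  JOIN LATERAL (
   -- one row per user, holding the array of users following them
   SELECT ARRAY (SELECT userid FROM followers WHERE followid = u.id) AS followers
   ) f2
WHERE  u.id = ANY ('{id1,id2}'::text[]);  -- placeholder ids
```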

If you need to nest everything in a "data" key, you can easily add that, but it seems like just noise.
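If the wrapper is required, one way to add it is to wrap the aggregate in json_build_object(). This sketch wraps the whole array once, as in the second attempt in the question. The literal ids are placeholders:

```sql
SELECT json_build_object('data', json_agg(sub))
FROM  (
   SELECT u.id, u.username
        , ARRAY (SELECT followid FROM followers WHERE userid   = u.id) AS following
        , ARRAY (SELECT userid   FROM followers WHERE followid = u.id) AS followers
   FROM   users u
   WHERE  u.id = ANY ('{id1,id2}'::text[])  -- placeholder ids
   ) sub;
```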

A VARIADIC parameter for _uids might be convenient.

(But list input only allows up to 100 arguments. You can still pass an array of arbitrary length.)
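A sketch of what that could look like. The function name get_users_by_ids_v is made up here to avoid clashing with the function above, the body is trimmed to the essentials, and the ids are placeholders:

```sql
CREATE OR REPLACE FUNCTION get_users_by_ids_v(VARIADIC _uids text[])
  RETURNS json
  LANGUAGE sql AS
$func$
   SELECT json_agg(sub)
   FROM  (
      SELECT u.id, u.username
      FROM   users u
      WHERE  u.id = ANY (_uids)
      ) sub
$func$;

-- Call with a plain list of arguments ...
SELECT get_users_by_ids_v('id1', 'id2', 'id3');

-- ... or pass an actual array with the VARIADIC keyword.
SELECT get_users_by_ids_v(VARIADIC '{id1,id2,id3}'::text[]);
```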

Index

To allow index-only scans, make the secondary index followers_followid_idx on (followid, userid) instead of just (followid).
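Replacing the index from the question could look like this. Whether the planner actually picks an index-only scan also depends on the visibility map, so the table has to be vacuumed sufficiently. The id in the EXPLAIN is a placeholder:

```sql
DROP INDEX followers_followid_idx;
CREATE INDEX followers_followid_idx ON followers (followid, userid);

-- Update the visibility map and statistics, then inspect the plan.
VACUUM ANALYZE followers;
EXPLAIN SELECT userid FROM followers WHERE followid = 'id1';  -- placeholder id
```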

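Database design

The normalized design is a good idea. It helps write speed and prevents massive table bloat and lock contention when handling follows. It is superior in many other respects, too.

I would strongly suggest integer IDs, though. Smaller, faster. Optimal size for the index.

You can always output a text ID additionally.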
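A rough sketch of the same schema with integer keys, meant as a fresh design rather than a migration script. The column ext_id, which keeps the existing text ID for output, is an assumption made for illustration. Identity columns require Postgres 10 or later:

```sql
CREATE TABLE users (
    id       integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    ext_id   text NOT NULL UNIQUE,  -- assumed column: the text ID, kept for output only
    username text NOT NULL
);

CREATE TABLE followers (
    userid   integer NOT NULL REFERENCES users (id),
    followid integer NOT NULL REFERENCES users (id),
    PRIMARY KEY (userid, followid)
);

-- Covers lookups by followid and allows index-only scans.
CREATE INDEX followers_followid_idx ON followers (followid, userid);
```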