Nec*_*oft 2 postgresql performance view many-to-many postgresql-10
I have a table that implements a many-to-many relationship on the users table, representing follow relationships between users:
CREATE TABLE users (
id text PRIMARY KEY,
username text NOT NULL
);
CREATE TABLE followers (
userid text,
followid text,
PRIMARY KEY (userid, followid),
CONSTRAINT followers_userid_fk FOREIGN KEY (userid) REFERENCES users (id),
CONSTRAINT followers_followid_fk FOREIGN KEY (followid) REFERENCES users (id)
);
CREATE INDEX followers_followid_idx ON followers (followid);
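When I want to build a JSON response with user-related data, I have two scenarios:
For a single user, the user data object should contain two arrays of user IDs: one for the users they follow, and one for the users who follow them. To create these two fields, I use the following SELECT statement.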
DECLARE follows RECORD;
SELECT array (select followid FROM followers where userid = Puserid) AS following,
array (select userid FROM followers where followid = Puserid) AS followers
INTO follows;
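When the request is for a list of users, I want to create these two fields for every user object returned in the JSON list.
I chose to implement the follow relationship as a many-to-many table so that I would not have to search for and remove ids in an array stored in the users (or user profile) table, and because I may later add metadata about the follow relationship (possibly notification settings, blocked users, etc.).
However, I am starting to doubt the efficiency of this decision, especially with many requests for 200 users each, where I assume the SELECT above would run once for every id in the list. Would that be very inefficient?
I do have indexes on both columns (since the primary key index is not useful for searching on followid), but I am considering creating a view that contains an array_agg of the followid column: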
SELECT userid, array_agg(followid) as following
FROM followers
GROUP BY userid;
But to get both following and followers, I would need something like this:
SELECT f1.userid, array_agg(f1.followid) as following,
f2.followers FROM followers AS f1 INNER JOIN
(select followid AS userid, array_agg(userid) as followers
from ks.followers
group by followid) AS f2 ON f1.userid = f2.userid group by f1.userid, f2.followers;
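That is not a good idea, is it?
Have I taken the wrong approach to modeling this relationship between users?
I have made two attempts at this. For a short list of 18 ids, both take around 600 ms: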
CREATE OR REPLACE VIEW follow_following AS
select f1.userid, array_agg(f1.followid) as following,
f2.followers FROM followers AS f1 INNER JOIN
(select followid AS userid, array_agg(userid) as followers
from followers
group by followid) AS f2 ON f1.userid = f2.userid group by f1.userid, f2.followers;
CREATE OR REPLACE FUNCTION get_users_by_ids(Puserids TEXT[])
RETURNS JSON AS $$
DECLARE rjson JSON;
BEGIN
CREATE TEMP TABLE getusers ON COMMIT DROP AS
SELECT u.id, u.username, p.bio, p.avatar, f.followers, f.following
FROM users u
INNER JOIN profiles p
ON u.id = p.userid
LEFT OUTER JOIN follow_following f
ON u.id = f.userid
WHERE u.id = ANY(Puserids);
SELECT INTO rjson json_agg (
json_build_object (
'data',json_build_object (
'id',getusers.id,
'username',getusers.username,
'bio',getusers.bio,
'avatar',getusers.avatar,
'following', getusers.following,
'followers', getusers.followers
)
)
) FROM getusers;
return rjson;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
When I loop over the array instead of using the view, the function takes about the same time (650 ms) for an array of 17 IDs:
CREATE OR REPLACE FUNCTION get_users_by_ids(Puserids TEXT[])
RETURNS JSON AS $$
DECLARE
rjson JSON;
uid TEXT;
BEGIN
CREATE TEMP TABLE getusers (
userid text,
username text,
following text[],
followers text[]
) ON COMMIT DROP;
FOREACH uid IN ARRAY Puserids
LOOP
INSERT INTO getusers (userid, username, followers, following)
SELECT u.id, u.username,
array (select userid FROM followers where followid = uid) AS followers,
array (select followid FROM followers where userid = uid) AS following
FROM ks.users u
WHERE u.id = uid;
END LOOP;
SELECT INTO rjson json_agg (
json_build_object (
'id',getusers.userid,
'username',getusers.username,
'following', getusers.following,
'followers', getusers.followers
)
) FROM getusers;
return json_build_object ('data', rjson);
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
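Creating a temp table and looping is expensive overkill for this purpose. For a start, you don't even need plpgsql - though it might be slightly faster for repeated calls in the same session. Radically simplified: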
CREATE OR REPLACE FUNCTION get_users_by_ids(_uids text[])
RETURNS JSON
LANGUAGE sql SECURITY DEFINER AS
$func$
SELECT json_agg(sub)
FROM (
SELECT u.id, u.username
, ARRAY (SELECT followid FROM followers WHERE userid = u.id) AS following
, ARRAY (SELECT userid FROM followers WHERE followid = u.id) AS followers
FROM users u
WHERE u.id = ANY (_uids)
) sub
$func$;
I use json_agg() over a subquery instead of json_build_object(). That should be a bit faster.
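It also conveniently allows cheap sorting of the array elements if you need that: add ORDER BY in the subquery. You might want to preserve the original order of elements.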
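As a minimal sketch of the sorting remark, assuming the schema from the question: an ORDER BY inside each ARRAY subquery sorts the elements of that array. The literal IDs are hypothetical placeholders. The second query shows one possible way to return rows in the order of the input array, in case that is the order you want to preserve.

```sql
-- Sort the elements of each array with ORDER BY inside the subquery.
SELECT u.id, u.username
     , ARRAY (SELECT f.followid FROM followers f
              WHERE  f.userid = u.id
              ORDER  BY f.followid) AS following
     , ARRAY (SELECT f.userid FROM followers f
              WHERE  f.followid = u.id
              ORDER  BY f.userid) AS followers
FROM   users u
WHERE  u.id = ANY ('{u1,u2}'::text[]);  -- hypothetical IDs

-- Keep result rows in the order of the input array (an assumption about
-- which "original order" is meant): unnest with an ordinality column.
SELECT u.id, u.username
FROM   unnest('{u2,u1}'::text[]) WITH ORDINALITY AS t(id, ord)
JOIN   users u ON u.id = t.id
ORDER  BY t.ord;
```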
If you need SECURITY DEFINER (do you really?), make sure it cannot be misused. The Postgres Wiki has a page on this.
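A rough sketch of common hardening steps for a SECURITY DEFINER function, not a complete checklist: pin the search_path so unqualified names cannot be hijacked, and restrict who may execute the function. The schema name ks is taken from the qualified names in the question, and the role app_role is purely hypothetical.

```sql
-- Pin the search path; pg_temp goes last so temporary objects
-- cannot shadow the real tables.
ALTER FUNCTION get_users_by_ids(text[]) SET search_path = ks, pg_temp;

-- Functions are executable by PUBLIC by default; revoke that and
-- grant access only to the roles that need it.
REVOKE ALL ON FUNCTION get_users_by_ids(text[]) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION get_users_by_ids(text[]) TO app_role;  -- hypothetical role
```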
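Correlated subqueries should be fastest here. With the ARRAY constructor, following and followers come out as empty arrays (not NULL) if no rows are found. Alternatively, LATERAL joins may work.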
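For comparison, a LATERAL variant might look like the sketch below, again assuming the schema from the question and hypothetical IDs. Note that array_agg() returns NULL rather than an empty array when there are no matching rows, so wrap it in COALESCE if you prefer empty arrays.

```sql
SELECT u.id, u.username, fg.following, fr.followers
FROM   users u
LEFT   JOIN LATERAL (
   -- An aggregate without GROUP BY always returns exactly one row.
   SELECT array_agg(f.followid) AS following
   FROM   followers f
   WHERE  f.userid = u.id
   ) fg ON true
LEFT   JOIN LATERAL (
   SELECT array_agg(f.userid) AS followers
   FROM   followers f
   WHERE  f.followid = u.id
   ) fr ON true
WHERE  u.id = ANY ('{u1,u2}'::text[]);  -- hypothetical IDs
```

Whether this beats the correlated subqueries depends on data and version; compare both with EXPLAIN (ANALYZE) on your own tables.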
If you need everything nested under a 'data' key, you can easily add that, but it looks like just noise.
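If the wrapper is wanted after all, one way to sketch it is to put json_build_object() around the aggregate. The COALESCE is an optional addition of mine so that an empty result yields an empty JSON array instead of null; the IDs are hypothetical.

```sql
SELECT json_build_object('data', COALESCE(json_agg(sub), '[]'::json))
FROM  (
   SELECT u.id, u.username
        , ARRAY (SELECT followid FROM followers WHERE userid = u.id) AS following
        , ARRAY (SELECT userid FROM followers WHERE followid = u.id) AS followers
   FROM   users u
   WHERE  u.id = ANY ('{u1,u2}'::text[])  -- hypothetical IDs
   ) sub;
```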
A VARIADIC parameter _uids might be convenient. (List input only allows a maximum of 100 arguments, though. You can still pass an array of arbitrary length.)
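A sketch of the VARIADIC form under the same schema assumptions. The function name get_users_variadic is hypothetical, chosen only to avoid clashing with the existing function, and SECURITY DEFINER is left out here for brevity.

```sql
CREATE OR REPLACE FUNCTION get_users_variadic(VARIADIC _uids text[])
  RETURNS json
  LANGUAGE sql AS
$func$
SELECT json_agg(sub)
FROM  (
   SELECT u.id, u.username
        , ARRAY (SELECT followid FROM followers WHERE userid = u.id) AS following
        , ARRAY (SELECT userid FROM followers WHERE followid = u.id) AS followers
   FROM   users u
   WHERE  u.id = ANY (_uids)
   ) sub
$func$;

-- Call with a plain list of arguments ...
SELECT get_users_variadic('u1', 'u2', 'u3');  -- hypothetical IDs

-- ... or pass an actual array using the VARIADIC keyword.
SELECT get_users_variadic(VARIADIC '{u1,u2,u3}'::text[]);
```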
Allow index-only scans by defining the secondary index followers_followid_idx on (followid, userid) instead of just (followid).
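In terms of the DDL from the question, that change could look like this. Keep in mind that index-only scans also depend on the visibility map being reasonably current, so the table needs to be vacuumed regularly.

```sql
-- Replace the single-column index with a two-column one, so that
-- "who follows user X" can be answered from the index alone.
DROP INDEX IF EXISTS followers_followid_idx;
CREATE INDEX followers_followid_idx ON followers (followid, userid);

-- Check the plan for a lookup by followid; the ID is a hypothetical placeholder.
EXPLAIN
SELECT userid FROM followers WHERE followid = 'u1';
```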
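The normalized design is a good idea. It helps write speed and prevents massive table bloat and lock contention when processing follows. It is superior in many other respects, too.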
I strongly suggest integer IDs, though. Smaller, faster, and the optimal size for indexes. You can always output text IDs additionally.
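A possible shape for the schema with integer keys, as a sketch only. The public_id column is a hypothetical name for a text ID that is kept purely for output; identity columns require Postgres 10 or later, which matches the question's tag.

```sql
CREATE TABLE users (
   id        integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
   public_id text NOT NULL UNIQUE,  -- hypothetical: text ID exposed to clients
   username  text NOT NULL
);

CREATE TABLE followers (
   userid   integer REFERENCES users (id),
   followid integer REFERENCES users (id),
   PRIMARY KEY (userid, followid)
);

-- Secondary index for lookups in the reverse direction.
CREATE INDEX followers_followid_idx ON followers (followid, userid);
```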