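For a specific full-text search, I need to modify the standard stop word file and exclude some words. What I have done so far:
I copied german.stop to german_modified.stop and then removed the words in question from german_modified.stop. Then: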
CREATE TEXT SEARCH DICTIONARY public.german_nostop (
TEMPLATE = pg_catalog.simple,
STOPWORDS = german_modified
);
CREATE TEXT SEARCH CONFIGURATION public.german_nostop (
COPY = pg_catalog.german
);
ALTER TEXT SEARCH CONFIGURATION public.german_nostop
ALTER MAPPING
FOR asciiword, asciihword, hword_asciipart, hword, hword_part, word
WITH german_nostop;
CREATE INDEX body_idx ON comments
USING gin (to_tsvector('german_nostop', body));
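Before querying, it can help to confirm that the modified stop word list is actually in effect. The statements below are only a sketch: they assume `wie` is one of the words removed from `german_modified.stop`. Note that the one-argument form of `to_tsquery` uses `default_text_search_config`, so the last statement passes the configuration name explicitly to match the index expression.

```sql
-- Ask the dictionary directly. An empty array {} means the token
-- is still being treated as a stop word by this dictionary.
SELECT ts_lexize('public.german_nostop', 'wie');

-- Show which dictionary handles the token under the custom configuration.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('public.german_nostop', 'wie');

-- Build the query with the same configuration that the index uses.
SELECT to_tsquery('public.german_nostop', 'wie');
```

Because the dictionary is based on the `simple` template, words are lowercased but not stemmed. If German stemming is still wanted, a dictionary built on the `snowball` template with `Language = german` and `StopWords = german_modified` may be a closer fit.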
But when I run
SELECT body, autor
FROM comments
WHERE to_tsvector('german_nostop', body) @@ to_tsquery('wie');
I get:
NOTICE: text-search query contains only stop words or doesn't contain lexemes, …

I have the following working query:
select children.autor as child, parents.autor as parent, count(*) from comments children
left join comments parents on (children.parentid = parents.commentid)
group by child, parent
order by count(*) desc
limit 4;
which produces the following output:
child | parent | count
peter | max | 154
alex | peter | 122
peter | kARL | 82
stephen | alex | 50
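The comments table also has a "body" column, which holds the actual comment text, and I would like to include the last comment for each child/parent pair in the select.
So in the first row, I want the last comment peter wrote in reply to max. So far I have no idea how to even approach this. A subquery? Some kind of window function?
If I use max(body), I get almost exactly what I want, except that it returns the longest comment, which is not really what I am after.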
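One possible approach, sketched here under an assumption: the query needs a column that says which comment is the latest, and the question does not name one. The example uses a hypothetical `created_at` timestamp column on `comments`; if no such column exists, an increasing `commentid` might serve as a rough stand-in. It keeps the original grouping and picks the first element of an ordered `array_agg`, whereas `max(body)` picks the greatest value in text sort order.

```sql
-- Assumption: comments.created_at is a hypothetical column holding
-- the time each comment was written. It does not appear in the question.
SELECT children.autor AS child,
       parents.autor  AS parent,
       count(*),
       -- Collect the bodies of each pair newest first and keep the first one.
       (array_agg(children.body ORDER BY children.created_at DESC))[1] AS last_comment
FROM comments children
LEFT JOIN comments parents ON children.parentid = parents.commentid
GROUP BY child, parent
ORDER BY count(*) DESC
LIMIT 4;
```

A window function such as `row_number()` partitioned by child and parent would be another way to do this. The aggregate form is shown because it changes the original query the least.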