d3v*_*kit 7 postgresql full-text-search
我有一个电影表,我想搜索标题并返回最接近的匹配项。
我认为全文搜索可能有用,但它似乎无法按单词的位置排序,尽管 postgres 知道该位置。这在 postgres 中可能吗?
这是我的查询:
SELECT collectibles.id, collectibles.title, ts_rank_cd(to_tsvector('english', collectibles.title), plainto_tsquery('old school')) AS score
FROM collectibles WHERE to_tsvector('english', collectibles.title) @@ plainto_tsquery('old school')
ORDER BY score DESC;
Run Code Online (Sandbox Code Playgroud)
以下是一些结果:(这是我能想到的最好的格式,抱歉!)
id | title | score
- 277568 | Wilson Meadows: Live At The 15th Old School & Blues Festival | 0.1
- 3545 | 5 Film Collection: Will Ferrell: Campaign / Old School (Unrtated Version) / Blades Of Glory / Roxbury / Semi-Pro | 0.1
- 10366 | Alice Cooper: Old School: 1964-1974 (DVD/CD Combo) | 0.1
- 13004 | American Classics: Old School (3-Disc Set) | 0.1
- 13005 | American Classics: Old School: Classic Chevrolets | 0.1
- 13006 | American Classics: Old School: Classic Travel Trailers | 0.1
- 13007 | American Classics: Old School: Kings Of Kustomizing | 0.1
- 14592 | Anchorman: The Legend Of Ron Burgundy (Widescreen/ Extended Edition) / Old School (R-Rated Version) (Back-To-Back) | 0.1
- 14593 | Anchorman: The Legend Of Ron Burgundy (Widescreen/ Extended Edition) / Old School (R-Rated Version) (Side-By-Side) | 0.1
- 20242 | Audiovisualize: Mixed By Addictive TV: Snake Worship Island / Corp. Inc. / Old School Futures / These Melodies / Robot War / ... | 0.1
- 192057 | Old School (DreamWorks/ Widescreen/ Unrated Version/ Special Edition) | 0.1
- 192058 | Old School (DreamWorks/ Widescreen/ Unrated Version/ Special Edition) / Road Trip (R-Rated) (Back-To-Back) | 0.1
- 192059 | Old School (DreamWorks/ Widescreen/ Unrated Version/ Special Edition) / Road Trip (R-Rated) (Side-By-Side) | 0.1
- 192060 | Old School (DreamWorks/ Widescreen/ Unrated Version/ Special Edition) / Road Trip (Unrated) (Back-To-Back) | 0.1
- 192061 | Old School (DreamWorks/ Widescreen/ Unrated Version/ Special Edition) / Road Trip (Unrated) (Side-By-Side) | 0.1
- 192062 | Old School (Warner Brothers/ R-Rated Version) | 0.1
- 192063 | Old School (Warner Brothers/ R-Rated Version/ Blu-ray) | 0.1
- 192064 | Old School (Warner Brothers/ Unrated Version) | 0.1
- 192065 | Old School (Warner Brothers/ Unrated Version/ Blu-ray) | 0.1
- 192066 | Old School Comedy (4-Pack): Atoll K / Jack And The Beanstalk / The Flying Deuces / Africa Screams | 0.1
- 192067 | Old School Hip Hop Dance #1: Beginner | 0.1
- 192068 | Old School Hip Hop Greatest | 0.1
- 192069 | Old School Hip Hop: Run DMC & Flava Flav (2-Disc) | 0.1
- 192070 | Old School Hits Movie Marathon Collection (3-Disc) | 0.1
- 192071 | Old School Returns | 0.1
Run Code Online (Sandbox Code Playgroud)
所有这些的得分都是 0.1,但许多标题中的单词位置更接近字符串的前面。有什么办法可以将它们排名更高吗?不幸的是,字符串或 id 的长度并不是真正好的排名限定符。
这里需要对ts_rank(tsvector,tsquery,normalization factor)函数使用归一化。在下面的代码片段中,我使用了normalization= 1(将排名除以 1 + 文档长度的对数),但您可以将其调整为您真正需要的。这是示例:
WITH s(id,tsv) AS ( VALUES
(1,to_tsvector('english','Alice Cooper: Old School: 1964-1974 (DVD/CD Combo)')),
(2,to_tsvector('english','American Classics: Old School: Kings Of Kustomizing')),
(3,to_tsvector('english','Old School Hip Hop Greatest')),
(4,to_tsvector('english','Old School Returns'))
)
SELECT id,ts_rank(tsv,tsq,1) AS rank
FROM s,to_tsquery('english','old & school') tsq
ORDER BY rank DESC;
Run Code Online (Sandbox Code Playgroud)
结果:
id | rank
----+-----------
4 | 0.0495516
3 | 0.0383384
2 | 0.0353013
1 | 0.0312636
(4 rows)
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
3980 次 |
| 最近记录: |