PostgreSQL full-text search with the Spanish character Ñ

Question

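I am having trouble with PostgreSQL full-text search on text that contains the Spanish character "Ñ".

When I try to tokenize the Spanish word "AÑO" (year), I get the following results depending on whether the input is upper or lower case: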

SELECT to_tsvector('spanish','AÑO'),to_tsquery('spanish','año')
"to_tsvector"   "to_tsquery"
"'aÑo':1"   "'año'"

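As you can see, the two results differ: the uppercase Ñ is not folded to ñ, so any full-text search query from my application that contains this character becomes case-sensitive.

Is there a way to overcome this? I have been searching the PostgreSQL documentation on full-text search, but I cannot work out how to change this behavior in the installed dictionaries.

Many thanks. Marti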

Answer 1

Dan*_*ité 5

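Whether to_tsvector can convert Ñ to ñ depends on the locale, specifically lc_ctype. Presumably your database is using an LC_CTYPE such as C, whose knowledge of characters is limited to US-ASCII.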
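
One way to confirm which LC_CTYPE a database was created with is to read it from the pg_database catalog. This is a minimal sketch that assumes only the standard catalog columns datcollate and datctype, and it does not rely on the lc_ctype setting being exposed through SHOW:

```sql
-- Show the collation and character-classification locale
-- that the current database was created with.
SELECT datname, datcollate, datctype
FROM pg_database
WHERE datname = current_database();
```

A datctype of C or POSIX would point to the ASCII-only case folding described here.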

Example with a Unicode-compatible LC_CTYPE:

test=> show lc_ctype;
  lc_ctype   
-------------
 fr_FR.UTF-8
(1 row)

test=> SELECT to_tsvector('spanish','AÑO'),to_tsquery('spanish','año');
 to_tsvector | to_tsquery
-------------+------------
 'año':1     | 'año'
(1 row)

Note that the downcasing is what you would expect.

Counter-example with C:

Creation:

CREATE DATABASE cc lc_ctype 'C' template template0;

Note the lack of downcasing, as in the question:

cc=> show lc_ctype;
 lc_ctype 
----------
 C
(1 row)

cc=> SELECT to_tsvector('spanish','AÑO'),to_tsquery('spanish','año');
 to_tsvector | to_tsquery
-------------+------------
 'aÑo':1     | 'año'
(1 row)
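
To get the behavior of the first example for Spanish text, the database can be created with a UTF-8 locale instead of C. The following is only a sketch along the lines of the CREATE DATABASE statement above: the database name es_fts is hypothetical, and the locale name es_ES.UTF-8 is an assumption that must exist on the server's operating system (locale names vary by platform).

```sql
-- Hypothetical database name. The locale 'es_ES.UTF-8' is an assumption:
-- it must be installed on the server's OS (names differ between platforms).
CREATE DATABASE es_fts
    ENCODING 'UTF8'
    LC_COLLATE 'es_ES.UTF-8'
    LC_CTYPE 'es_ES.UTF-8'
    TEMPLATE template0;

-- After connecting to es_fts (for example with \c es_fts in psql),
-- check whether a lowercase query matches an uppercase document.
SELECT to_tsvector('spanish', 'AÑO') @@ to_tsquery('spanish', 'año') AS matches;
```

If downcasing works as in the fr_FR.UTF-8 example, both sides normalize to 'año' and matches should be true. The LC_CTYPE of a database is fixed when it is created, so moving an existing database to a different locale generally means dumping it and restoring into a newly created one.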
