use*_*466 10 mysql unicode select match
I am running MySQL 5.1.50 and have a table that looks like this:
organizations | CREATE TABLE `organizations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` text CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`url` text CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`phone` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `id` (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=25837 DEFAULT CHARSET=utf8 |
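The problem I'm having is that MySQL matches Unicode characters against their ASCII counterparts. For example, when I search for a word that contains "é", it also matches the same word spelled with "e", and vice versa: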
mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT id, name FROM `organizations` WHERE `name` = 'Universite de Montreal';
+-------+-------------------------+
| id | name |
+-------+-------------------------+
| 16973 | Université de Montreal |
+-------+-------------------------+
1 row in set (0.01 sec)
I get these results both from PHP and from the command-line console. How can I get exact matches from my SELECT queries?
Thanks!
小智 12
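You specified the name column as text CHARACTER SET utf8 COLLATE utf8_unicode_ci, which tells MySQL to treat e and é as equivalent when matching and sorting. That collation and utf8_general_ci both treat many such characters as equivalent.
http://www.collation-charts.org/ is a great resource, and the charts are easy to read once you learn how.
If you want e and é (and similar pairs) to be considered different, you must choose a different collation. To find out which collations are available on your server (assuming you are limited to the UTF-8 encoding):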
mysql> show collation like 'utf8%';
and choose one, using the collation charts as a reference.
Another special collation is utf8_bin, which has no equivalences at all: it is a binary match.
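As a minimal sketch of how this could be applied to the table from the question (assuming the connection character set is utf8, as set by SET NAMES utf8 there): the collation can be overridden for a single comparison with a COLLATE clause, or changed permanently on the column. Note that utf8_bin also makes the match case sensitive, and the ALTER TABLE changes how name sorts, so treat both as trade-offs rather than a recommendation.

```sql
-- Assumes the connection character set is utf8 (SET NAMES utf8).

-- Option 1: override the collation for this one comparison only.
SELECT id, name
FROM `organizations`
WHERE `name` COLLATE utf8_bin = 'Universite de Montreal';

-- Option 2: change the column's collation so that every comparison
-- on it is binary (this also affects ORDER BY on the column).
ALTER TABLE `organizations`
  MODIFY `name` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;
```

With the row shown in the question, option 1 should no longer return id 16973, because "Université" and "Universite" differ under a binary comparison.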
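The only MySQL Unicode collations I know of that are not language specific are utf8_unicode_ci, utf8_general_ci and utf8_bin. They are the odd ones out. The real purpose of a collation is to make the computer match and sort the way a person from a particular place would expect. Hungarian and Turkish dictionaries order their entries according to different rules. Specifying a collation lets you sort and match according to such local rules.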
For example, it seems that Danes consider e and é equivalent, but Icelanders do not:
mysql> select _utf8'e' collate utf8_danish_ci
-> = _utf8'é' collate utf8_danish_ci as equal;
+-------+
| equal |
+-------+
| 1 |
+-------+
mysql> select _utf8'e' collate utf8_icelandic_ci
-> = _utf8'é' collate utf8_icelandic_ci as equal;
+-------+
| equal |
+-------+
| 0 |
+-------+
Another handy trick is to fill a one-column table with a bunch of characters you are interested in (this is easier from a script); MySQL can then tell you which ones are equivalent:
mysql> create table t (c char(1) character set utf8);
mysql> insert into t values ('a'), ('ä'), ('á');
mysql> select group_concat(c) from t group by c collate utf8_icelandic_ci;
+-----------------+
| group_concat(c) |
+-----------------+
| a |
| á |
| ä |
+-----------------+
mysql> select group_concat(c) from t group by c collate utf8_danish_ci;
+-----------------+
| group_concat(c) |
+-----------------+
| a,á |
| ä |
+-----------------+
mysql> select group_concat(c) from t group by c collate utf8_general_ci;
+-----------------+
| group_concat(c) |
+-----------------+
| a,ä,á |
+-----------------+
小智 5
Sure, this will work:
SELECT * FROM `table` WHERE name LIKE BINARY 'namé';
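For reference, a sketch of the same idea applied to the query from the question. BINARY makes the comparison byte by byte, so it is also case sensitive, and with LIKE the characters % and _ in the pattern still act as wildcards; the = form avoids that. This assumes the connection character set matches the column's (utf8 here).

```sql
-- Byte-wise pattern match: 'e' no longer matches 'é'.
SELECT id, name
FROM `organizations`
WHERE `name` LIKE BINARY 'Universite de Montreal';

-- Byte-wise equality, without LIKE's wildcard handling.
SELECT id, name
FROM `organizations`
WHERE BINARY `name` = 'Universite de Montreal';
```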