MySQL: What character encoding does SELECT INTO use?

Dav*_*ver 23 mysql utf-8 character-encoding

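I'm trying to export some data from a MySQL database, but weird and wonderful things are happening to the unicode in that table.

I'll focus on one character, the left smart quote: “

When I use SELECT from the console, it prints without trouble: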

mysql> SELECT text FROM posts;
+-------+
| text  |
+-------+
| “foo” |
+-------+

This means the data is being sent to my terminal as utf-8 [0] (which is correct).

However, when I use SELECT * FROM posts INTO OUTFILE '/tmp/x.csv' …;, the output file is not correctly encoded:

$ cat /tmp/x.csv
“fooâ€

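Specifically, the left quote is encoded with seven (7!) bytes: \xc3\xa2\xe2\x82\xac\xc5\x93.

What encoding is this? Or, how can I tell MySQL to use a less unreasonable encoding?

Also, some miscellaneous facts:

  • SELECT @@character_set_database returns latin1
  • The text column is a VARCHAR(42):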
    mysql> DESCRIBE posts;
    +-------+-------------+------+-----+---------+-------+
    | Field | Type        | Null | Key | Default | Extra |
    +-------+-------------+------+-----+---------+-------+
    | text  | varchar(42) | NO   | MUL |         |       |
    +-------+-------------+------+-----+---------+-------+
    
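  • Encoding “ as utf-8 yields \xe2\x80\x9c
  • Decoding \xe2\x80\x9c as latin1 and then re-encoding as utf-8 yields \xc3\xa2\xc2\x80\xc2\x9c (6 bytes).
  • Another data point: … (utf-8: \xe2\x80\xa6) is encoded as \xc3\xa2\xe2\x82\xac\xc2\xa6

[0]: Because smart quotes aren't included in any 8-bit encoding, and my terminal renders utf-8 characters correctly.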

小智 24

Newer versions of MySQL have an option to set the character set in the outfile clause:

SELECT col1,col2,col3 
FROM table1 
INTO OUTFILE '/tmp/out.txt' 
CHARACTER SET utf8
FIELDS TERMINATED BY ','


taa*_*avi 6

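Many programs/standards (including MySQL) assume that "latin1" means "cp1252", so the 0x80 byte is interpreted as a Euro sign, which is where the \xe2\x82\xac (U+20AC) in the middle comes from. By the same mapping, 0xe2 becomes â (\xc3\xa2) and 0x9c becomes œ (\xc5\x93), which accounts for all seven bytes.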
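The double encoding can be reproduced outside MySQL. The following is a minimal Python sketch, not a description of what the server does internally: it assumes the stored utf-8 bytes are misread as cp1252 and then re-encoded as utf-8. The helper name double_encode is made up for illustration.

```python
def double_encode(text: str) -> bytes:
    """Encode as UTF-8, misread those bytes as cp1252, re-encode as UTF-8."""
    return text.encode("utf-8").decode("cp1252").encode("utf-8")


# Left smart quote (U+201C): matches the seven bytes found in the outfile.
print(double_encode("\u201c"))  # b'\xc3\xa2\xe2\x82\xac\xc5\x93'

# Horizontal ellipsis (U+2026): matches the other data point in the question.
print(double_encode("\u2026"))  # b'\xc3\xa2\xe2\x82\xac\xc2\xa6'
```

Note that Python's cp1252 codec leaves a few bytes (such as 0x9d, the last byte of the right smart quote) undefined and raises an error on them, so this sketch only covers characters whose utf-8 bytes are all defined in cp1252.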

When I try the export myself, it works correctly (but note how I put the data in, and the variables set on the db server):

mysql> set names utf8; -- http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
mysql> create table sq (c varchar(10)) character set utf8;
mysql> show create table sq\G
*************************** 1. row ***************************
       Table: sq
Create Table: CREATE TABLE `sq` (
  `c` varchar(10) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.19 sec)

mysql> insert into sq values (unhex('E2809C'));
Query OK, 1 row affected (0.00 sec)

mysql> select hex(c), c from sq;
+--------+------+
| hex(c) | c    |
+--------+------+
| E2809C | “  |
+--------+------+
1 row in set (0.00 sec)

mysql> select * from sq into outfile '/tmp/x.csv';
Query OK, 1 row affected (0.02 sec)

mysql> show variables like "%char%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       | 
| character_set_connection | utf8                       | 
| character_set_database   | utf8                       | 
| character_set_filesystem | binary                     | 
| character_set_results    | utf8                       | 
| character_set_server     | latin1                     | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

From the shell:

/tmp$ hexdump -C x.csv
00000000  e2 80 9c 0a                                       |....|
00000004
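To tell a clean export like the one above from a double-encoded one without reading hex dumps by eye, a rough heuristic is to try undoing the cp1252 round trip and see whether the result is still valid utf-8. This is only a sketch under that assumption; looks_double_encoded is a hypothetical helper, and it can misjudge unusual inputs.

```python
def looks_double_encoded(data: bytes) -> bool:
    """Heuristic: True if undoing a cp1252 misread yields different, valid UTF-8."""
    try:
        once = data.decode("utf-8")
        # Reverse the suspected misread: back to raw bytes via cp1252,
        # then interpret those bytes as the UTF-8 they may originally have been.
        repaired = once.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Either step failing means the data does not fit the double-encoding pattern.
        return False
    return repaired != once


print(looks_double_encoded(b"\xe2\x80\x9c\n"))                 # False (the clean x.csv above)
print(looks_double_encoded(b"\xc3\xa2\xe2\x82\xac\xc5\x93"))  # True (the seven bytes from the question)
```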

Hope there's a useful tidbit in there...