xen*_*ndi 2 python mysql character-encoding python-3.x
I found other questions and answers about "Incorrect string value" on Stack Overflow, but none of the answers worked, so maybe my case is different.
import mysql.connector
from mysql.connector import errorcode

try:
    self.cnx = mysql.connector.connect(host='localhost', user='emails', password='***',
                                       database='extractor', raise_on_warnings=True)
except mysql.connector.Error as err:
    if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
        print("Something is wrong with your user name or password")
    elif err.errno == errorcode.ER_BAD_DB_ERROR:
        print("Database does not exist")
    else:
        print(err)
self.sql = self.cnx.cursor()
biography = str(row[8])
self.sql.execute("""insert into emails (biography)
                    values(%s)""",
                 (biography,))
Here biography is a TEXT column with collation utf8mb4_general_ci, and the value being inserted is:
< Living the > Azofra & Clifford Travel Food Fashion
I get:
mysql.connector.errors.DataError: 1366 (22007): Incorrect string value: '\xF0\x9F\x85\x97\xF0\x9F...' for column `extractor`.`emails`.`biography` at row 1
Output of show create table emails:
show create table emails;

| Table | Create Table |

| emails | CREATE TABLE `emails` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`ufeff_user_id` varchar(20) COLLATE utf8_bin DEFAULT NULL,
`username` varchar(45) COLLATE utf8_bin DEFAULT NULL,
`full_name` varchar(100) COLLATE utf8_bin DEFAULT NULL,
`is_private` tinyint(1) DEFAULT NULL,
`follower_count` int(10) DEFAULT NULL,
`following_count` int(10) DEFAULT NULL,
`media_count` int(10) DEFAULT NULL,
`biography` text CHARACTER SET utf8mb4 DEFAULT NULL,
`has_profile_pic` tinyint(1) DEFAULT NULL,
`external_url` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`public_email` varchar(320) CHARACTER SET utf8 NOT NULL,
`contact_phone_number` varchar(45) COLLATE utf8_bin DEFAULT NULL,
`address_street` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`is_business` tinyint(1) DEFAULT NULL,
`engagement` int(10) DEFAULT NULL,
`recent_post_date` varchar(45) COLLATE utf8_bin DEFAULT NULL,
`category` varchar(75) COLLATE utf8_bin DEFAULT NULL,
`avg_likes` int(10) DEFAULT NULL,
`avg_comments` int(10) DEFAULT NULL,
`business_join_date` varchar(45) COLLATE utf8_bin DEFAULT NULL,
`business_count` int(5) DEFAULT NULL,
`business_ads` tinyint(1) DEFAULT NULL,
`country_code` varchar(45) COLLATE utf8_bin DEFAULT NULL,
`emailscol` varchar(45) COLLATE utf8_bin DEFAULT NULL,
`city_name` varchar(75) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`public_email`)
) ENGINE=InnoDB AUTO_INCREMENT=139 DEFAULT CHARSET=utf8 COLLATE=utf8_bin |

I assume the emails.biography column is of type VARCHAR and that the CHARSET of the emails table is utf8mb4. If it is not, you need to run:
ALTER TABLE `emails` CONVERT TO CHARACTER SET utf8mb4;
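For context on why utf8mb4 is the relevant character set: the bytes quoted in the error message, \xF0\x9F\x85\x97, form a single 4-byte UTF-8 sequence, while MySQL's utf8 (utf8mb3) stores at most 3 bytes per character. A minimal sketch for checking a string before inserting it; the helper name needs_utf8mb4 is hypothetical and not part of any library:

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character lies above U+FFFF, i.e. takes 4 bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


# The leading bytes from the error message decode to one character outside the BMP.
sample = b"\xF0\x9F\x85\x97".decode("utf-8")
print(len(sample), hex(ord(sample)), needs_utf8mb4(sample))
```

If this returns True for a value, the column, the table default, and the connection all have to be utf8mb4 for the insert to succeed.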
Then, if that does not solve the problem, try executing the following immediately after creating the MySQL cursor in Python (assuming self.sql is your cursor):
self.sql.execute('SET NAMES utf8mb4;')
self.sql.execute('SET CHARACTER SET utf8mb4;')
self.sql.execute('SET character_set_connection=utf8mb4;')
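With mysql.connector, the connection character set can also be requested at connect time instead of through SET NAMES afterwards. A sketch, assuming the charset and collation connection arguments of Connector/Python; the helper names are hypothetical and the credentials are the placeholders from the question:

```python
def utf8mb4_connect_args(**overrides):
    """Build keyword arguments for mysql.connector.connect() that request utf8mb4."""
    args = {
        "host": "localhost",
        "user": "emails",
        "password": "***",
        "database": "extractor",
        "charset": "utf8mb4",                # connection character set
        "collation": "utf8mb4_general_ci",   # must belong to the charset above
        "raise_on_warnings": True,
    }
    args.update(overrides)
    return args


def connect_utf8mb4(**overrides):
    """Open a connection using the arguments above (requires mysql-connector-python)."""
    import mysql.connector  # imported lazily so the helper above works without the driver

    return mysql.connector.connect(**utf8mb4_connect_args(**overrides))
```

Usage would be self.cnx = connect_utf8mb4() in place of the connect call in the question.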
If that does not work, try setting the character set right after creating the MySQL connection in Python, for example:
self.connection.set_character_set('utf8mb4')
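Note that set_character_set is the method name on MySQLdb/mysqlclient connections; as far as I know, mysql.connector connections expose set_charset_collation instead. A hedged sketch that calls whichever of the two the connection object offers (the helper name force_utf8mb4 is hypothetical):

```python
def force_utf8mb4(connection, collation="utf8mb4_general_ci"):
    """Switch an open connection to utf8mb4 and return the name of the method used."""
    if hasattr(connection, "set_charset_collation"):
        # mysql.connector style: charset and collation are set together
        connection.set_charset_collation("utf8mb4", collation)
        return "set_charset_collation"
    if hasattr(connection, "set_character_set"):
        # MySQLdb / mysqlclient style: only the charset is passed
        connection.set_character_set("utf8mb4")
        return "set_character_set"
    raise TypeError("connection object has no known charset setter")
```

In the question's code the connection is self.cnx, so the call would be force_utf8mb4(self.cnx).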
If you are still out of luck at this point, we can debug further :)
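One possible starting point for that debugging is to look at what the server reports for the current session. A minimal sketch, assuming a DB-API cursor that returns (name, value) rows; the helper name session_charsets is hypothetical:

```python
def session_charsets(cursor):
    """Return the character_set_* session variables as a dict."""
    cursor.execute("SHOW VARIABLES LIKE 'character_set_%'")
    # Each row is expected to be a (variable_name, value) pair.
    return dict(cursor.fetchall())
```

If character_set_client or character_set_connection is still utf8 rather than utf8mb4 after the steps above, the connection settings have not taken effect.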
Update:
Try:
ALTER TABLE `emails` CONVERT TO CHARACTER SET utf8;
ALTER TABLE `emails` CHANGE COLUMN `biography` `biography` TEXT CHARACTER SET 'utf8';
Note that utf8mb4_general_ci is the table's collation, not its encoding. Ideally, you should use COLLATE utf8_unicode_ci.