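Overview:
UTF-8 encodes a character in 1 to 4 bytes, but MySQL's utf8 character set (an alias for utf8mb3) stores at most 3 bytes per character, while emoji from mobile devices and rare Chinese characters need 4 bytes. If such data is inserted directly into a database that uses MySQL's utf8 character set, the Python program raises an SQL exception:
Traceback (most recent call last):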
  File "/Users/chenxin/Dropbox/python/django/shici/clean_data/export_data.py", line 101, in <module>
    export_dynasty("魏晋")
  File "/Users/chenxin/Dropbox/python/django/shici/clean_data/export_data.py", line 60, in export_dynasty
    cursor.execute(sql, tuple(temp_dict.values()))
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/cursors.py", line 163, in execute
    result = self._query(query)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/cursors.py", line 321, in _query
    conn.query(q)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/connections.py", line 505, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/connections.py", line 724, in _read_query_result
    result.read()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/connections.py", line 1069, in read
    first_packet = self.connection._read_packet()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/connections.py", line 676, in _read_packet
    packet.raise_for_error()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/protocol.py", line 223, in raise_for_error
    err.raise_mysql_exception(self._data)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.DataError: (1366, "Incorrect string value: '\\xF0\\xA4\\x83\\x83' for column 'name' at row 1")
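The bytes quoted in the error message above, \xF0\xA4\x83\x83, are the UTF-8 encoding of a single character outside the Basic Multilingual Plane. The following is a minimal Python sketch that finds such characters in a string before it is sent to MySQL. The helper name chars_needing_utf8mb4 is made up for illustration and is not part of the original program.

```python
def chars_needing_utf8mb4(text):
    """Return the characters in text whose UTF-8 encoding takes 4 bytes."""
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]


# The bytes reported in the DataError decode to one 4-byte character.
rare = b"\xf0\xa4\x83\x83".decode("utf-8")
print(hex(ord(rare)))                        # code point of the rejected character
print(chars_needing_utf8mb4("魏晋" + rare))  # only the rare character is listed
print(chars_needing_utf8mb4("魏晋"))         # common CJK characters take 3 bytes
```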
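One workaround is to encode 4-byte characters before storing them and decode them again when reading them back. The drawback is that every place that uses such a value then has to encode and decode it.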
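As an illustration of that workaround, here is a minimal sketch that uses Base64 as the reversible encoding. Base64 is only an assumption for the example, since the text does not say which scheme to use, and both function names are hypothetical.

```python
import base64


def encode_for_utf8_column(text):
    """Turn any string into ASCII so a 3-byte utf8 column can hold it."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_from_utf8_column(stored):
    """Reverse encode_for_utf8_column after reading the value back."""
    return base64.b64decode(stored.encode("ascii")).decode("utf-8")


original = "name \U0001F600"
stored = encode_for_utf8_column(original)
restored = decode_from_utf8_column(stored)
print(stored.isascii(), restored == original)
```

Every read and write path has to call both helpers, and the stored value can no longer be searched or sorted as text inside the database.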
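utf8mb4 is a superset of MySQL's utf8: it is backward compatible with utf8 and can also store 4-byte characters such as emoji.
The advantage of utf8mb4 is that emoji need no special encoding or decoding when data is stored or read.
Changing the database encoding to utf8mb4:
MySQL version
utf8mb4 requires MySQL 5.5.3 or later; upgrade if your server is older.
MySQL driver
Version 5.1.34 works; do not use anything older than 5.1.13. (These version numbers presumably refer to the Java driver, MySQL Connector/J.)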
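Edit the MySQL configuration file
Edit the MySQL configuration file my.cnf (my.ini on Windows).
my.cnf is usually located at /etc/mysql/my.cnf. Once you have found it, add the following to these three sections:
[client]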
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci'
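Settings copied from a web page can pick up typographic quotes or land in the wrong section. The following is a rough sketch of a sanity check built on Python's configparser. The function name check_mycnf and the EXPECTED table are assumptions made for this example, and a real my.cnf that starts with !include directives may need preprocessing before configparser can read it.

```python
import configparser

# Settings this example looks for, taken from the fragment above.
EXPECTED = {
    "client": {"default-character-set": "utf8mb4"},
    "mysql": {"default-character-set": "utf8mb4"},
    "mysqld": {
        "character-set-server": "utf8mb4",
        "collation-server": "utf8mb4_unicode_ci",
    },
}


def check_mycnf(text):
    """Return a list of problems found in my.cnf-style text."""
    problems = []
    if "\u2018" in text or "\u2019" in text:
        problems.append("typographic quotes found; use plain ' instead")
    parser = configparser.ConfigParser(
        allow_no_value=True, interpolation=None, strict=False
    )
    parser.read_string(text)
    for section, options in EXPECTED.items():
        for key, wanted in options.items():
            actual = parser.get(section, key, fallback=None)
            if actual != wanted:
                problems.append(
                    f"[{section}] {key} is {actual!r}, expected {wanted!r}"
                )
    return problems


sample = """
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci'
"""
print(check_mycnf(sample))
```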
What SET NAMES 'utf8mb4' means
It is equivalent to the following three statements:
SET character_set_client = utf8mb4;
SET character_set_results = utf8mb4;
SET character_set_connection = utf8mb4;
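A client program can also request utf8mb4 for its own connection instead of relying only on the server-wide setting. Below is a minimal sketch for PyMySQL, the driver that appears in the traceback earlier. The host, credentials, and database name are placeholders, and both helper names are made up for this example.

```python
def utf8mb4_connect_kwargs(host, user, password, database):
    """Build keyword arguments for pymysql.connect() with utf8mb4 enabled."""
    return {
        "host": host,
        "user": user,
        "password": password,
        "database": database,
        "charset": "utf8mb4",  # ask the driver to use utf8mb4 on this connection
    }


def open_connection(**kwargs):
    """Open a connection; PyMySQL is imported lazily so the helper above works without it."""
    import pymysql  # third-party package: pip install pymysql

    return pymysql.connect(**kwargs)


# Placeholder values; replace them with your own settings.
kwargs = utf8mb4_connect_kwargs("localhost", "app_user", "secret", "shici")
print(kwargs["charset"])
# conn = open_connection(**kwargs)  # needs a running MySQL server
```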
Restart the database, then check the variables:
SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 'collation%';
The output looks like this:
Key                        Value
character_set_client       utf8mb4
character_set_connection   utf8mb4
character_set_database     utf8mb4
character_set_filesystem   binary
character_set_results      utf8mb4
character_set_server       utf8mb4
character_set_system       utf8
character_sets_dir         /www/server/mysql/share/charsets/
collation_connection       utf8mb4_general_ci
collation_database         utf8mb4_general_ci
collation_server           utf8mb4_general_ci
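Field explanations:
System variable            Description
character_set_client       character set of the data sent by the client
character_set_connection   character set of the connection layer
character_set_database     default character set of the currently selected database
character_set_results      character set of query results
character_set_server       default character set for internal server operations
All of the variables above must be utf8mb4.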
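The same check can be scripted. The sketch below assumes the rows arrive as (name, value) pairs, which is the shape a default DB-API cursor returns for SHOW VARIABLES; the function name is hypothetical.

```python
REQUIRED_UTF8MB4 = (
    "character_set_client",
    "character_set_connection",
    "character_set_database",
    "character_set_results",
    "character_set_server",
)


def variables_not_utf8mb4(rows):
    """Return the required variables whose value is not utf8mb4.

    rows is an iterable of (Variable_name, Value) pairs.
    """
    values = dict(rows)
    return [name for name in REQUIRED_UTF8MB4 if values.get(name) != "utf8mb4"]


# Rows copied from the output listed above.
rows = [
    ("character_set_client", "utf8mb4"),
    ("character_set_connection", "utf8mb4"),
    ("character_set_database", "utf8mb4"),
    ("character_set_filesystem", "binary"),
    ("character_set_results", "utf8mb4"),
    ("character_set_server", "utf8mb4"),
    ("character_set_system", "utf8"),
]
print(variables_not_utf8mb4(rows))
```

An empty list means all five variables are set as required; character_set_filesystem and character_set_system are deliberately not checked.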
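Below the server-wide default, MySQL character sets apply at three levels of scope: the database, the table, and the column (field).
The order of precedence is column > table > database. It follows that if the column storing rare characters does not use utf8mb4, setting the database to utf8mb4 will not help.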
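Because the column setting wins, it helps to list the columns that still use another character set. The sketch below queries information_schema.COLUMNS through any DB-API cursor that accepts %s placeholders, such as a PyMySQL cursor. The function and constant names are assumptions made for this example.

```python
FIND_NON_UTF8MB4_COLUMNS = """
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = %s
  AND CHARACTER_SET_NAME IS NOT NULL
  AND CHARACTER_SET_NAME <> 'utf8mb4'
ORDER BY TABLE_NAME, ORDINAL_POSITION
"""


def columns_not_utf8mb4(cursor, database_name):
    """Return (table, column, charset) rows for text columns not yet on utf8mb4."""
    # Non-text columns have a NULL CHARACTER_SET_NAME and are skipped by the query.
    cursor.execute(FIND_NON_UTF8MB4_COLUMNS, (database_name,))
    return list(cursor.fetchall())
```

Any row returned points at a column that would still reject 4-byte characters, whatever the database default says.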
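Change the character set of the database and its tables
1. Change the database character set:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
# After the change, check the result with the following statement (run USE database_name; first, because this variable reflects the currently selected database):
show variables like 'character_set_database';
database_name is the name of your database.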
2. Change the table character set:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
After the change, check the table with the following statement:
show create table table_name;
table_name is the name of the table.
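When a database has many tables, the statements can be generated rather than typed by hand. This is a small sketch only: the function names are made up, and the database and table names in the usage lines are placeholders.

```python
def quote_identifier(name):
    """Quote a MySQL identifier with backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


def build_utf8mb4_statements(database_name, table_names,
                             collation="utf8mb4_unicode_ci"):
    """Return the ALTER statements for one database and its tables, in order."""
    statements = [
        f"ALTER DATABASE {quote_identifier(database_name)} "
        f"CHARACTER SET = utf8mb4 COLLATE = {collation};"
    ]
    for table in table_names:
        statements.append(
            f"ALTER TABLE {quote_identifier(table)} "
            f"CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        )
    return statements


# Placeholder names; print the statements for review before running them.
for statement in build_utf8mb4_statements("shici", ["poem", "author"]):
    print(statement)
```

Printing the statements first keeps the conversion reviewable, which matters because CONVERT TO rewrites the text columns of each table.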
References:
https://www.itread01.com/content/1543644368.html
https://blog.csdn.net/wxq1075110242/article/details/89308815