And*_*tin 5 python sql-server unicode pyodbc pypyodbc
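I can read from an MSSQL database by sending queries from Python through pypyodbc.
Most Unicode characters are handled correctly, but I've hit one specific character that causes an error.
The field in question is of type nvarchar(50) and begins with this character, "", which renders for me a bit like this...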
-----
|100|
|111|
-----
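If that number is hexadecimal, 0x100111, then the character is U+100111 in Supplementary Private Use Area-B. Interestingly, if it is binary, 0b100111, then it is an apostrophe (U+0027). Could the wrong encoding have been used when the data was uploaded? The field stores part of a Chinese postal address.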
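As a quick sanity check of the two readings above, here is a minimal sketch. It is written for Python 3 (not the Python 2.7 shown in the traceback), and the variable names are only for illustration.

```python
import unicodedata

# The digits shown in the placeholder glyph, read two different ways.
as_hex = 0x100111
as_binary = 0b100111

# Supplementary Private Use Area-B spans U+100000..U+10FFFD.
in_private_use_area_b = 0x100000 <= as_hex <= 0x10FFFD
hex_category = unicodedata.category(chr(as_hex))  # "Co" means private use

# 0b100111 is 39 decimal, i.e. U+0027.
binary_char = chr(as_binary)
binary_name = unicodedata.name(binary_char)

print("U+%06X in private use area-B: %s (category %s)"
      % (as_hex, in_private_use_area_b, hex_category))
print("0b100111 -> U+%04X %s" % (as_binary, binary_name))
```

Neither reading is confirmed by this. It only shows that both interpretations of the digits are self-consistent.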
The error message includes
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 0-1: unexpected end of data
Here it is in full...
Traceback (most recent call last):
  File "question.py", line 19, in <module>
    results.fetchone()
  File "/VIRTUAL_ENVIRONMENT_DIR/local/lib/python2.7/site-packages/pypyodbc.py", line 1869, in fetchone
    value_list.append(buf_cvt_func(from_buffer_u(alloc_buffer)))
  File "/VIRTUAL_ENVIRONMENT_DIR/local/lib/python2.7/site-packages/pypyodbc.py", line 482, in UCS_dec
    uchar = buffer.raw[i:i + ucs_length].decode(odbc_decoding)
  File "/VIRTUAL_ENVIRONMENT_DIR/lib/python2.7/encodings/utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 0-1: unexpected end of data
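One possible reading of the traceback, offered only as a guess: the `UCS_dec` frame decodes a slice of `ucs_length` bytes at a time, and a character outside the Basic Multilingual Plane takes two 16-bit code units (a surrogate pair) in UTF-16. The sketch below assumes the value really does start with U+100111, that the buffer is UTF-16-LE, and that `ucs_length` is 2. None of that is confirmed by the source, and this is Python 3, not the 2.7 in the traceback. Under those assumptions, decoding the first code unit on its own gives the same kind of error.

```python
# Assumption: the column value starts with U+100111, stored as UTF-16-LE.
suspect = chr(0x100111)
raw = suspect.encode("utf-16-le")  # 4 bytes: a high and a low surrogate

# Assumed size of one code unit, mirroring the name seen in the traceback.
ucs_length = 2

# Decoding the whole buffer at once succeeds.
whole = raw.decode("utf-16-le")

# Decoding only the first code unit (a lone high surrogate) fails.
try:
    raw[0:ucs_length].decode("utf-16-le")
    error_text = None
except UnicodeDecodeError as exc:
    error_text = str(exc)

print(repr(raw))
print(error_text)
```

If this is what is happening, the stored data may be fine, and the failure may come from decoding in fixed two-byte steps.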
Here is some minimal code that reproduces it...
import pypyodbc

connection_string = (
    "DSN=sqlserverdatasource;"
    "UID=REDACTED;"
    "PWD=REDACTED;"
    "DATABASE=obi_load")
connection = pypyodbc.connect(connection_string)
cursor = connection.cursor()

# T-SQL uses a single "=" for equality comparison.
query_sql = (
    "SELECT address_line_1 "
    "FROM address "
    "WHERE address_id = 'REDACTED' ")
with cursor.execute(query_sql) as results:
    row = results.fetchone()  # This is the line that raises the error.
    print row
Here is a chunk of my /etc/freetds/freetds.conf
[global]
; tds version = 4.2
; dump file = /tmp/freetds.log
; debug flags = 0xffff
; timeout = 10
; connect timeout = 10
text size = 64512
[sqlserver]
host = REDACTED
port = 1433
tds version = 7.0
client charset = UTF-8
I have also tried client charset = UTF-16, and omitting that line altogether.
Here is the relevant part of my /etc/odbc.ini
[sqlserverdatasource]
Driver = FreeTDS
Description = ODBC connection via FreeTDS
Trace = No
Servername = sqlserver
Database = REDACTED
Here is the relevant part of my /etc/odbcinst.ini
[FreeTDS]
Description = TDS Driver (Sybase/MS SQL)
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so
CPTimeout =
CPReuse =
UsageCount = 1
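I can work around this by fetching the results inside a try/except block and throwing away any row that raises a UnicodeDecodeError, but is there a proper solution? Can I discard just the undecodable character, or is there a way to fetch this row without raising an error?
It is not inconceivable that some bad data has ended up in the database.
I have Googled this and looked through this site's related questions, but with no luck.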
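For reference, the try/except workaround described above could look roughly like the sketch below (Python 3 syntax). `fetch_decodable_rows`, `max_skips` and `decode_lenient` are hypothetical names, not part of pypyodbc. The loop assumes that a `fetchone()` call which raises has already moved past the bad row, which I have not verified, so `max_skips` is there as a safety limit. `decode_lenient` only illustrates what discarding the undecodable character would mean if the raw UTF-16-LE bytes were available; pypyodbc as used here does not obviously expose them.

```python
def fetch_decodable_rows(cursor, max_skips=1000):
    """Yield rows from a DB-API style cursor, skipping rows that fail to decode.

    Assumption: a fetchone() call that raises UnicodeDecodeError has already
    advanced past the bad row. max_skips is a hypothetical safety limit in
    case that assumption does not hold.
    """
    skipped = 0
    while True:
        try:
            row = cursor.fetchone()
        except UnicodeDecodeError:
            skipped += 1
            if skipped > max_skips:
                raise
            continue
        if row is None:
            # No more rows.
            return
        yield row


def decode_lenient(raw_bytes):
    """Decode UTF-16-LE bytes, substituting U+FFFD for undecodable pieces."""
    return raw_bytes.decode("utf-16-le", errors="replace")
```

Usage would be something like `for row in fetch_decodable_rows(cursor): ...` after `cursor.execute(query_sql)`. This still loses the whole row, which is what I would like to avoid.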