Jus*_*n S 7 python postgresql encoding utf-8
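I have Googled this error extensively and narrowed it down to the two databases I am working with using different encodings.
The AIX server I am working with is running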
psql 8.2.4
server_encoding | LATIN1 | | Client Connection Defaults / Locale and Formatting | Sets the server (database) character set encoding.
The Windows 2008 R2 server I am working with is running
psql(9.3.4)
CREATE DATABASE postgres
  WITH OWNER = postgres
       ENCODING = 'UTF8'
       TABLESPACE = pg_default
       LC_COLLATE = 'English_Australia.1252'
       LC_CTYPE = 'English_Australia.1252'
       CONNECTION LIMIT = -1;
COMMENT ON DATABASE postgres
  IS 'default administrative connection database';
Now, when I try to run the Python script below, I get this error:
Traceback (most recent call last):
  File "datamain.py", line 39, in <module>
    sys.exit(main())
  File "datamain.py", line 33, in main
    write_file_to_table("cms_jobdef.txt", "cms_jobdef", con_S104838)
  File "datamain.py", line 21, in write_file_to_table
    cur.copy_from(f, table, ",")
psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xa0
CONTEXT: COPY cms_jobdef, line 15209
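The byte named in the error can be examined without a database. The sketch below uses only Python's built-in codecs (Python 3 syntax) to show that 0xa0 is a complete character in LATIN1 (a non-breaking space) but is not valid on its own in UTF8, which is presumably why the UTF8 server rejects the raw file. The sample value is invented for illustration and is not taken from cms_jobdef.

```python
# Illustrative only: the sample bytes are invented, not real table data.
raw = b"job\xa0name"  # 0xa0 is a non-breaking space in LATIN1

# LATIN1 maps every byte to a character, so this decode always succeeds.
text = raw.decode("latin-1")

# In UTF-8, 0xa0 is a continuation byte and cannot start a character.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Re-encoding the decoded text gives the two-byte UTF-8 form of U+00A0.
utf8_bytes = text.encode("utf-8")
print(valid_utf8, utf8_bytes)
```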
Here is my script:
import psycopg2
import StringIO
import sys
import pdb

def connect_db(db, usr, pw, hst, prt):
    conn = psycopg2.connect(database=db, user=usr,
                            password=pw, host=hst, port=prt)
    return conn

def write_table_to_file(file, table, connection):
    f = open(file, "w")
    cur = connection.cursor()
    cur.copy_to(f, table, ",")
    f.close()
    cur.close()

def write_file_to_table(file, table, connection):
    f = open(file,"r")
    cur = connection.cursor()
    cur.copy_from(f, table, ",")
    f.close()
    cur.close()

def main():
    login = open('login.txt','r')
    con_tctmsv64 = connect_db("x", "y",
                              login.readline().strip(),
                              "d.domain", "c")
    con_S104838 = connect_db("x", "y", "z", "a", "b")
    try:
        write_table_to_file("cms_jobdef.txt", "cms_jobdef", con_tctmsv64)
        write_file_to_table("cms_jobdef.txt", "cms_jobdef", con_S104838)
    finally:
        con_tctmsv64.close()
        con_S104838.close()

if __name__ == "__main__":
    sys.exit(main())
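Some sensitive data has been removed.
So I am not sure how to proceed. As far as I can tell, the copy_expert method might help by exporting the data as UTF8. But because the server I am pulling data from runs 8.2.4, I don't think it supports the ENCODING option of COPY.
I think my best bet is to reinstall the PostgreSQL database on the Windows server with LATIN1 encoding. When I try to do that, I get the following error.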

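So I am quite stuck; any help would be greatly appreciated!
Update: I installed the PostgreSQL database on Windows with LATIN1 encoding by changing the default locale to "C". However, this gave me the following error, and it does not look like an approach that is likely to succeed or be correct.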

I also tried exporting and importing the file in BINARY format using the PostgreSQL COPY command:
def write_table_to_file(file, table, connection):
    f = open(file, "w")
    cur = connection.cursor()
    #cur.copy_to(f, table, ",")
    cur.copy_expert("COPY cms_jobdef TO STDOUT WITH BINARY", f)
    f.close()
    cur.close()

def write_file_to_table(file, table, connection):
    f = open(file,"r")
    cur = connection.cursor()
    #cur.copy_from(f, table)
    cur.copy_expert("COPY cms_jobdef FROM STDOUT WITH BINARY", f)
    f.close()
    cur.close()
Still no luck; I get the same error:
DataError: invalid byte sequence for encoding "UTF8": 0xa0
CONTEXT: COPY cms_jobdef, line 15209, column descript
Regarding Phil's answer, I tried that approach and still had no success.
import psycopg2
import StringIO
import sys
import pdb
import codecs

def connect_db(db, usr, pw, hst, prt):
    conn = psycopg2.connect(database=db, user=usr,
                            password=pw, host=hst, port=prt)
    return conn

def write_table_to_file(file, table, connection):
    f = open(file, "w")
    #fx = codecs.EncodedFile(f,"LATIN1", "UTF8")
    cur = connection.cursor()
    cur.execute("SHOW client_encoding;")
    print cur.fetchone()
    cur.copy_to(f, table)
    #cur.copy_expert("COPY cms_jobdef TO STDOUT WITH BINARY", f)
    f.close()
    cur.close()

def write_file_to_table(file, table, connection):
    f = open(file,"r")
    cur = connection.cursor()
    cur.execute("SET CLIENT_ENCODING TO 'LATIN1';")
    cur.execute("SHOW client_encoding;")
    print cur.fetchone()
    cur.copy_from(f, table)
    #cur.copy_expert("COPY cms_jobdef FROM STDOUT WITH BINARY", f)
    f.close()
    cur.close()

def main():
    login = open('login.txt','r')
    con_tctmsv64 = connect_db("x", "y",
                              login.readline().strip(),
                              "ctmtest1.int.corp.sun", "5436")
    con_S104838 = connect_db("x", "y", "z", "t", "5432")
    try:
        write_table_to_file("cms_jobdef.txt", "cms_jobdef", con_tctmsv64)
        write_file_to_table("cms_jobdef.txt", "cms_jobdef", con_S104838)
    finally:
        con_tctmsv64.close()
        con_S104838.close()

if __name__ == "__main__":
    sys.exit(main())
Output:
In [4]: %run datamain.py
('sql_ascii',)
('LATIN1',)
In [5]:
This completes successfully, but when I run
select * from cms_jobdef;
there is nothing in the new database.

I even tried converting the file from LATIN1 to UTF8. Still no luck.
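For reference, a LATIN1-to-UTF8 file conversion of this kind can be sketched in a few lines of Python 3. This is a minimal sketch and not the conversion actually used here: the function name `convert_latin1_to_utf8`, the file names, and the sample row are all hypothetical.

```python
import os
import tempfile

def convert_latin1_to_utf8(src_path, dst_path):
    """Re-encode a LATIN1 text file as UTF-8, line by line (hypothetical helper)."""
    # newline="" leaves the original line endings untouched on both sides.
    with open(src_path, "r", encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)

# Small demonstration with an invented row containing the 0xa0 byte.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "latin1.txt")
    dst = os.path.join(tmp, "utf8.txt")
    with open(src, "wb") as f:
        f.write(b"1\tjob\xa0name\n")
    convert_latin1_to_utf8(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted)
```

A file converted this way still has to be loaded inside a committed transaction before its rows become visible.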
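The strange thing is that when I do this process manually, using only the PostgreSQL COPY command, it works. I don't know why. Once again, any help would be greatly appreciated.
It turns out there are a few options for solving this.
The option of changing the client encoding, as Phil suggested, does work.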
cur.execute("SET CLIENT_ENCODING TO 'LATIN1';")
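Put together, the client-encoding approach might look like the sketch below. It assumes `connection` is an open psycopg2 connection to the UTF8 database; the function name `load_latin1_file` is hypothetical. The file is opened in binary mode so the LATIN1 bytes reach the server unchanged, and the commit is included because psycopg2 runs the COPY inside a transaction.

```python
def load_latin1_file(connection, path, table):
    """Load a LATIN1-encoded COPY file into `table` (hypothetical helper).

    `connection` is assumed to be an open psycopg2 connection.
    """
    cur = connection.cursor()
    try:
        # Declare the incoming bytes as LATIN1 so the server converts them
        # to the database encoding instead of rejecting bytes such as 0xa0.
        cur.execute("SET client_encoding TO 'LATIN1'")
        # Binary mode passes the file's bytes through without re-encoding.
        with open(path, "rb") as f:
            cur.copy_from(f, table)
        # Without a commit the copied rows are rolled back on close.
        connection.commit()
    finally:
        cur.close()
```

It would be called as `load_latin1_file(con_S104838, "cms_jobdef.txt", "cms_jobdef")` with the connection from the script above.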
Another option is to convert the data on the fly. I used a Python module called codecs to do this.
f = open(file, "w")
fx = codecs.EncodedFile(f,"LATIN1", "UTF8")
cur = connection.cursor()
cur.execute("SHOW client_encoding;")
print cur.fetchone()
cur.copy_to(fx, table)
The key line is
fx = codecs.EncodedFile(f,"LATIN1", "UTF8")
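To see what that wrapper does in isolation, here is a small Python 3 sketch that replaces the real output file with an in-memory `io.BytesIO` buffer (an assumption made so the example runs without a database). Bytes written to the wrapper are treated as LATIN1 and stored in the underlying file as UTF8. The sample row is invented.

```python
import codecs
import io

# Stand-in for the output file; a real script would open a file in binary mode.
raw_file = io.BytesIO()

# Same argument order as above: callers write LATIN1, the file receives UTF8.
fx = codecs.EncodedFile(raw_file, "latin-1", "utf-8")

# Invented sample row containing the LATIN1 non-breaking space (0xa0).
fx.write(b"1\tjob\xa0name\n")

stored = raw_file.getvalue()
print(stored)
```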
My main problem was that I was not committing the changes to the database! Silly me :)
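The effect of a missing commit can be reproduced without PostgreSQL. The sketch below deliberately swaps in Python's built-in sqlite3 module (Python 3), which, like psycopg2, opens a transaction implicitly on the first write. A second connection sees no rows until the first one commits, which matches the empty `select * from cms_jobdef` result above. The single `descript` column and the inserted value are invented for illustration.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "demo.db")
    writer = sqlite3.connect(db_path)
    reader = sqlite3.connect(db_path)
    try:
        writer.execute("CREATE TABLE cms_jobdef (descript TEXT)")
        writer.commit()

        # The INSERT opens a transaction implicitly; nothing is committed yet.
        writer.execute("INSERT INTO cms_jobdef VALUES ('job')")
        rows_before = reader.execute(
            "SELECT count(*) FROM cms_jobdef").fetchall()[0][0]

        # After the commit, other connections can see the new row.
        writer.commit()
        rows_after = reader.execute(
            "SELECT count(*) FROM cms_jobdef").fetchall()[0][0]
    finally:
        writer.close()
        reader.close()

print(rows_before, rows_after)
```

In the psycopg2 script the equivalent fix is a `connection.commit()` call after `copy_from` and before the connection is closed.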
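I was migrating from a SQL_ASCII database to a UTF8 database and ran into the same problem. Based on this answer, I simply added this statement to the beginning of my import script: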
set client_encoding to 'latin1'
Everything appears to have imported correctly.