Sha*_*pLu 2 python csv file-read file-writing python-2.7
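I have a question about reading and writing CSV files with the 'utf-8-sig' encoding in Python 2.7. The header is
['\xef\xbb\xbfID;timestamp;CustomerID;Email']
The header contains some code ("\xef\xbb\xbfID") that I read from file A.csv, and I want to write the same code and header to file B.csv.
My print log shows: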
['\xef\xbb\xbfID;timestamp;CustomerID;Email']
But the header in the actual output file looks like
ÔªøID;timestamp
Here is the code:
def remove_gdpr_info_from_csv(file_path, file_name, temp_folder, original_header):
    new_temp_folder = tempfile.mkdtemp()
    new_temp_file = new_temp_folder + "/" + file_name
    # Blanked new file
    with open(new_temp_file, 'wb') as outfile:
        writer = csv.writer(outfile, delimiter=";")
        print original_header
        writer.writerow(original_header)
        # File from SFTP
        with open(file_path, 'r') as infile:
            reader = csv.reader(infile, delimiter=";")
            first_row = next(reader)
            email = first_row.index('Email')
            contract_detractor1 = first_row.index('Contact Detractor (Q21)')
            contract_detractor2 = first_row.index('Contact Detractor (Q20)')
            contract_detractor3 = first_row.index('Contact Detractor (Q43)')
            contract_detractor4 = first_row.index('Contact Detractor(Q26)')
            contract_detractor5 = first_row.index('Contact Detractor(Q27)')
            contract_detractor6 = first_row.index('Contact Detractor(Q44)')
            indexes = []
            for column_name in header_list:
                ind = first_row.index(column_name)
                indexes.append(ind)
            for row in reader:
                output_row = []
                for ind in indexes:
                    data = row[ind]
                    if ind == email:
                        data = ''
                    elif ind == contract_detractor1:
                        data = ''
                    elif ind == contract_detractor2:
                        data = ''
                    elif ind == contract_detractor3:
                        data = ''
                    elif ind == contract_detractor4:
                        data = ''
                    elif ind == contract_detractor5:
                        data = ''
                    elif ind == contract_detractor6:
                        data = ''
                    output_row.append(data)
                writer.writerow(output_row)
    s3core.upload_files(SPARKY_S3, DESTINATION_PATH, new_temp_file)
    shutil.rmtree(temp_folder)
    shutil.rmtree(new_temp_folder)
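'\xef\xbb\xbf' is the UTF-8 encoded form of the Unicode character U+FEFF ZERO WIDTH NO-BREAK SPACE. It is commonly used as a byte order mark (BOM) at the beginning of Unicode text files:
- when the file starts with the 3 bytes '\xef\xbb\xbf', the file is UTF-8 encoded
- when the file starts with the 2 bytes '\xff\xfe', the file is in UTF-16 little endian
- when the file starts with the 2 bytes '\xfe\xff', the file is in UTF-16 big endian
The 'utf-8-sig' encoding explicitly asks for this BOM to be written at the beginning of the file.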
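As a rough illustration of the three cases above, the check can be sketched with the BOM constants from the standard codecs module. The helper name detect_bom and the BOMS table are made up for this example, and the sketch only covers the three marks listed here (it does not try to recognise UTF-32).

```python
import codecs

# Byte order marks from the list above, mapped to the encoding they indicate.
BOMS = [
    (codecs.BOM_UTF8, 'utf-8'),          # '\xef\xbb\xbf'
    (codecs.BOM_UTF16_LE, 'utf-16-le'),  # '\xff\xfe'
    (codecs.BOM_UTF16_BE, 'utf-16-be'),  # '\xfe\xff'
]


def detect_bom(raw):
    """Hypothetical helper: return (encoding, bom_length) for a byte string.

    Returns (None, 0) when the data does not start with one of the BOMs above.
    """
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            return encoding, len(bom)
    return None, 0


# The header from the question, as raw bytes.
encoding, bom_length = detect_bom(b'\xef\xbb\xbfID;timestamp;CustomerID;Email')
```

Slicing the raw bytes at bom_length would give the header without the mark, which is roughly what the 'utf-8-sig' codec does for you when decoding.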
To handle it automatically when reading a csv file in Python 2, you can use the codecs module:
with open(file_path, 'r') as infile:
    reader = csv.reader(codecs.EncodedFile(infile, 'utf8-sig', 'utf8'), delimiter=";")
EncodedFile wraps the original file object: it decodes the content as utf8-sig, which effectively skips the BOM, and re-encodes it as utf8 without a BOM.
小智 5
You want to use the EncodedFile function from the codecs library, as in Serge Ballesta's answer.
However, with Python 2.7, utf-8-sig is not a supported alias for the UTF8-sig encoding; you need to use utf_8_sig. In addition, the argument order must give the output data encoding first and the file encoding second: codecs.EncodedFile(file, datacodec, filecodec=None, errors='strict')
Here is the complete result:
import codecs
import csv
with open(file_path, 'r') as infile:
    reader = csv.reader(codecs.EncodedFile(infile, 'utf8', 'utf_8_sig'), delimiter=";")
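To see the effect of that argument order without touching a real file, here is a small sketch that runs the same EncodedFile call against an in-memory buffer. The helper name read_without_bom and the sample bytes are invented for the illustration; only codecs.EncodedFile and io.BytesIO come from the standard library.

```python
import codecs
import io


def read_without_bom(raw_bytes):
    """Hypothetical helper: return raw_bytes as UTF-8 with a leading BOM removed."""
    source = io.BytesIO(raw_bytes)
    # Second argument: encoding of the data handed back to the caller (plain utf8).
    # Third argument: encoding of the underlying file (utf_8_sig, which skips the BOM).
    recoded = codecs.EncodedFile(source, 'utf8', 'utf_8_sig')
    return recoded.read()


# Sample content shaped like the header from the question.
sample = b'\xef\xbb\xbfID;timestamp;CustomerID;Email\n'
cleaned = read_without_bom(sample)
```

The bytes returned no longer start with '\xef\xbb\xbf', so a header row parsed from them begins with 'ID' rather than '\xef\xbb\xbfID'. Input that has no BOM passes through unchanged.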