Ignoring Unicode errors

Question

When I loop over a set of URLs to find all the links (inside certain divs) on those pages, I get this error:

Traceback (most recent call last):
  File "file_location", line 38, in <module>
    out.writerow(tag['href'])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 0: ordinal not in range(128)

The code I wrote that relates to this error is:

out = csv.writer(open("file_location", "ab"), delimiter=";")
for tag in soup_3.findAll('a', href=True):
    out.writerow(tag['href'])

Is there a way to solve this, perhaps by using an if statement to ignore any URL that causes a Unicode error?

Thanks in advance for your help.

Answer 1

Woo*_*ble 6

You can wrap the writerow method call in a try block and catch the exception to ignore it:

for tag in soup_3.findAll('a', href=True):
    try:
        out.writerow(tag['href'])
    except UnicodeEncodeError:
        pass

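But you almost certainly want to choose an encoding other than ASCII for your CSV file (utf-8, unless you have a good reason to use something else), and open the file with codecs.open() instead of the built-in open.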
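As a rough sketch of the utf-8 suggestion, assuming Python 3 (the traceback above comes from Python 2): there the built-in open accepts an encoding argument, so codecs.open() is not needed. The write_links helper, the sample URLs, and the temporary file path are hypothetical and only there to make the example self-contained.

```python
import csv
import os
import tempfile


def write_links(path, hrefs):
    """Append each URL as a one-column row to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", encoding="utf-8", newline="") as handle:
        out = csv.writer(handle, delimiter=";")
        for href in hrefs:
            # writerow expects a sequence of fields, so wrap the URL in a list.
            out.writerow([href])


# Hypothetical stand-ins for the href values scraped from the pages;
# the second one starts with U+2026, the character from the traceback.
hrefs = ["http://example.com/a", "\u2026/truncated-link"]

path = os.path.join(tempfile.mkdtemp(), "links.csv")
write_links(path, hrefs)

# Read the file back to confirm the non-ASCII URL was written intact.
with open(path, encoding="utf-8", newline="") as handle:
    for row in csv.reader(handle, delimiter=";"):
        print(row)
```

Each URL is wrapped in a list because writerow treats a bare string as a sequence of single-character fields. With the file opened as utf-8, no URL has to be skipped, so the try/except is only needed if you really do want to drop some rows.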
