pai*_*247 8 python csv url beautifulsoup utf-8
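I'm not entirely sure what I need to do about this error. I assume it has to do with needing to add .encode('utf-8'), but I'm not sure that's the right fix, or where it should be applied.
The error is: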
line 40, in <module>
writer.writerows(list_of_rows)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 17: ordinal not in range(128)
Here is the basis of my Python script.
import csv
import requests
from BeautifulSoup import BeautifulSoup

url = 'https://dummysite'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html)
table = soup.find('table', {'class': 'table'})

# Collect the text of every cell, skipping the header row.
list_of_rows = []
for row in table.findAll('tr')[1:]:
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text.replace('[','').replace(']','')
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

outfile = open("./test.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Name", "Location"])
writer.writerows(list_of_rows)  # raises the UnicodeEncodeError shown above
Ala*_*ack 21
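The Python 2.x CSV library is broken when it comes to Unicode. You have three options, in order of complexity:
1. (Edit: see the note further down.) Use the fixed library https://github.com/jdunck/python-unicodecsv (pip install unicodecsv). It works as a drop-in replacement. Example: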
import unicodecsv

with open("myfile.csv", 'rb') as my_file:
    r = unicodecsv.DictReader(my_file, encoding='utf-8')
2. Read the section on Unicode in the CSV manual: https://docs.python.org/2/library/csv.html (see the examples at the bottom).
3. Manually encode each item as UTF-8:
for cell in row.findAll('td'):
    text = cell.text.replace('[','').replace(']','')
    list_of_cells.append(text.encode("utf-8"))
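To see why the manual encoding helps: the character in the traceback, U+2013 (en dash), has no ASCII representation, so the implicit ASCII encoding fails, while an explicit UTF-8 encoding succeeds. The following is only a minimal sketch; `encode_row` is a hypothetical helper that is not part of the original script, and the sample cells are made up.

```python
dash = u'\u2013'  # EN DASH, the character named in the traceback

# Encoding to ASCII fails; this is the step that raises in the question.
try:
    dash.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding to UTF-8 succeeds and yields three bytes.
utf8_bytes = dash.encode('utf-8')


def encode_row(cells, encoding='utf-8'):
    """Hypothetical helper: encode every text cell of one row to bytes."""
    return [cell.encode(encoding) for cell in cells]


# Made-up sample row containing the problematic character.
encoded = encode_row([u'Smith \u2013 Jones', u'Paris'])
```

On Python 2, passing rows encoded this way to writer.writerows should have the same effect as encoding inside the loop. On Python 3 the csv module expects text rather than bytes, so this step would not be needed there.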
Edit: I found that python-unicodecsv is also broken when reading UTF-16. It complains about any 0x00 bytes.
Instead, use https://github.com/ryanhiebert/backports.csv, which more closely resembles the Python 3 implementation and uses the io module.
Install:
pip install backports.csv
Usage:
from backports import csv
import io

filename = "myfile.csv"  # placeholder path
with io.open(filename, encoding='utf-8') as f:
    r = csv.reader(f)
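Since the question is about writing rather than reading, here is a sketch of the write side in the same style. It assumes Python 3, where the standard csv module already works on text. On Python 2, `from backports import csv` would presumably take the place of `import csv`, given that the library aims to mirror the Python 3 API. The temporary path and the sample rows are made up for illustration.

```python
import csv
import io
import os
import tempfile

rows = [[u'Smith \u2013 Jones', u'Paris']]  # made-up sample data

# Write into a throwaway directory so the sketch leaves no stray files behind.
path = os.path.join(tempfile.mkdtemp(), 'test.csv')

# newline='' lets the csv module control line endings itself.
with io.open(path, 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow([u'Name', u'Location'])
    writer.writerows(rows)

# Read the file back to check that the en dash survived the round trip.
with io.open(path, encoding='utf-8', newline='') as f:
    read_back = list(csv.reader(f))
```

Opening the file in text mode with an explicit encoding means the encoding happens once, at the file boundary, so no individual cell has to be encoded by hand.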