Python error: 'utf8' codec can't decode byte 0x92 in position 85: invalid start byte

Question

Python error: 'utf8' codec can't decode byte 0x92 in position 85: invalid start byte

I am using Python 2.7 and lxml. My code is as follows:

import urllib
from lxml import html

def get_value(el):
    return get_text(el, 'value') or el.text_content()

response = urllib.urlopen('http://www.edmunds.com/dealerships/Texas/Frisco/DavidMcDavidHondaofFrisco/fullsales-504210667.html').read()
dom = html.fromstring(response)

try:
    description = get_value(dom.xpath("//div[@class='description item vcard']")[0].xpath(".//p[@class='sales-review-paragraph loose-spacing']")[0])
except IndexError, e:
    description = ''


The code crashes inside the try block with this error:

UnicodeDecodeError at /
'utf8' codec can't decode byte 0x92 in position 85: invalid start byte


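The string that could not be encoded/decoded was: ouldn t

I have tried many techniques, including .encode('utf8'), but none of them solved the problem. I have two questions:

How do I solve this problem?
How can my application crash when the offending code is inside a try block?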

Answer 1

Mar*_*cin 8

The page is being served with charset=ISO-8859-1. Decode from that encoding to unicode.

[Snapshot of the details from the browser. Credit: @Old Panda]
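
As a minimal sketch of that advice, the snippet below decodes a byte string containing 0x92 using the declared charset instead of UTF-8. The sample bytes `RAW` and the helpers `decode_page` and `utf8_decode_fails` are hypothetical names made up for illustration; they are not part of lxml or of the question's code. The second helper also illustrates the question's second point: an `except IndexError` clause does not catch `UnicodeDecodeError`, which is a subclass of `ValueError`.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample bytes standing in for the downloaded page body.
RAW = b"I couldn\x92t be happier"


def decode_page(raw, charset="iso-8859-1"):
    """Decode raw response bytes using the charset the server declared."""
    return raw.decode(charset)


def utf8_decode_fails(raw):
    """Return True if decoding the bytes as UTF-8 raises UnicodeDecodeError."""
    try:
        raw.decode("utf-8")
    except IndexError:
        # Not reached for a decoding failure: UnicodeDecodeError is a
        # ValueError subclass, so this clause does not catch it.
        return False
    except UnicodeDecodeError:
        return True
    return False


# Every byte value is valid ISO-8859-1, so this decode cannot fail.
text = decode_page(RAW)

# Browsers treat the ISO-8859-1 label as Windows-1252, where 0x92 is the
# right single quotation mark (U+2019).
text_cp1252 = decode_page(RAW, "cp1252")
```

In the question's code, one option would be to decode `response` this way before passing it to `html.fromstring`, so that lxml receives unicode rather than raw bytes. Whether `iso-8859-1` or `cp1252` gives the better result depends on what the site actually sends, so treat the choice as an assumption to verify against the real response.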
