Text from a website appears as gibberish instead of Hebrew

Question

oha*_*987 5 python unicode encoding utf-8 decoding

I'm trying to get a string from a website. I use the requests module to send the GET request.

import requests

text = requests.get("http://example.com")  # send a GET request to the website
print text.text  # print the decoded response body

However, for some reason the text appears as gibberish instead of Hebrew:

<div>
<p>×©×¨×ª</p>
</div>

When I sniff the traffic with Fiddler or view the website in my browser, I see it in Hebrew:

<div>
<p>שרת</p>
</div>

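By the way, the HTML code contains a meta tag that defines the encoding, which is utf-8. I tried to encode the text to utf-8, but it is still gibberish. I tried to decode it using utf-8, but that throws a UnicodeEncodeError exception. I declared that I'm using utf-8 in the first line of the script. Moreover, the problem also happens when I send the request with the built-in urllib module.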

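I read the Unicode HOWTO, but I still can't fix it. I also read many threads here (both about the UnicodeEncodeError exception and about why Hebrew turns into gibberish in Python), but I still can't fix it.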

I'm using Python 2.7.9 on a Windows machine. I'm running my script in Python IDLE.

Thanks in advance.

Answer 1

Ign*_*ams 6

The server is not declaring the encoding correctly.

>>> print u'×©×¨×ª'.encode('latin-1').decode('utf-8')
שרת

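To see why that one-liner works, it may help to reproduce the damage in both directions. The sketch below is not part of the original answer; it assumes the page bytes really are UTF-8 and were decoded as Latin-1, which is what the gibberish in the question is consistent with. The variable names are illustrative only, and the Hebrew word is written with escapes so the snippet does not depend on the source file's encoding.

```python
# The Hebrew word from the page, written with escapes (U+05E9 U+05E8 U+05EA).
original = u"\u05e9\u05e8\u05ea"

# What the server sends on the wire: two bytes per Hebrew letter in UTF-8.
utf8_bytes = original.encode("utf-8")

# The wrong step: reading those UTF-8 bytes as Latin-1 turns every byte
# into its own character, which produces the gibberish from the question.
garbled = utf8_bytes.decode("latin-1")

# The repair: undo the wrong decode, then decode with the right codec.
repaired = garbled.encode("latin-1").decode("utf-8")

assert len(utf8_bytes) == 6
assert garbled == u"\xd7\xa9\xd7\xa8\xd7\xaa"
assert repaired == original
```

The round trip is lossless here because Latin-1 maps every byte value to exactly one code point, so encoding back to Latin-1 recovers the original bytes.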

Set text.encoding before accessing text.text.

import requests

text = requests.get("http://example.com")  # send a GET request to the website
text.encoding = 'utf-8'  # correct the page encoding before reading .text
print text.text  # print the decoded response body

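If you would rather not hard-code the override for every site, one option is to apply it only when the server sent no charset at all. As far as I know, requests falls back to ISO-8859-1 for text content types that carry no charset parameter, which would explain the result above. The helper below is a rough sketch under that assumption: `force_utf8_if_undeclared` is a hypothetical name, not part of requests, and it only needs an object with a `headers` mapping and a writable `encoding` attribute, as `requests.Response` has. It does not help when the server declares a charset that is simply wrong.

```python
def force_utf8_if_undeclared(response, fallback="utf-8"):
    """Override response.encoding when the Content-Type has no charset.

    Hypothetical helper: `response` is assumed to behave like a
    requests.Response (a `headers` mapping plus a writable `encoding`).
    """
    content_type = response.headers.get("Content-Type", "")
    if "charset=" not in content_type.lower():
        # No charset was declared, so the current encoding is only a
        # default; replace it before .text is read.
        response.encoding = fallback
    return response
```

With requests this could be used as `text = force_utf8_if_undeclared(requests.get("http://example.com"))`, followed by reading `text.text` as before.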
