UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode

fan*_*yna 1 python unicode

I know many people have run into this error before, but I couldn't find a solution to my problem.

I have a URL that I want to normalize:

import urllib
from urlparse import urlsplit

url = u"http://www.dgzfp.de/Dienste/Fachbeitr%C3%A4ge.aspx?EntryId=267&Page=5"
scheme, host_port, path, query, fragment = urlsplit(url)
path = urllib.unquote(path)
path = urllib.quote(path, safe="%/")

This produces the following error message:

/usr/lib64/python2.6/urllib.py:1236: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  res = map(safe_map.__getitem__, s)
Traceback (most recent call last):
  File "url_normalization.py", line 246, in <module>
    logging.info(get_canonical_url(url))
  File "url_normalization.py", line 102, in get_canonical_url
    path = urllib.quote(path,safe="%/")
  File "/usr/lib64/python2.6/urllib.py", line 1236, in quote
    res = map(safe_map.__getitem__, s)
KeyError: u'\xc3'

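I tried removing the unicode prefix "u" from the URL string literal, and then I get no error. But how can I get rid of the unicode automatically, given that I read the URL directly from a database?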

Jas*_*oks 5

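urllib.quote() does not handle Unicode strings properly in Python 2. Judging from the traceback, quote() looks up each character in its safe_map table, and the non-ASCII unicode character u'\xc3' is not found there, hence the KeyError. To work around this, call the .encode() method on the URL (or on the variable you read from the database) as soon as you read it, i.e. run url = url.encode('utf-8'). With that you get: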

import urllib
from urlparse import urlsplit

url = u"http://www.dgzfp.de/Dienste/Fachbeitr%C3%A4ge.aspx?EntryId=267&Page=5"
# Convert the unicode object to a UTF-8 byte string before splitting and quoting
url = url.encode('utf-8')
scheme, host_port, path, query, fragment = urlsplit(url)
path = urllib.unquote(path)
path = urllib.quote(path, safe="%/")

Your path variable will then be:

>>> path
'/Dienste/Fachbeitr%C3%A4ge.aspx'
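
If the URLs come from a database and may or may not arrive as unicode objects, the same idea could be wrapped in a small helper. This is only a sketch under a few assumptions: quoted_path is a made-up name (it is not part of urllib), UTF-8 is assumed as the encoding, and the version check is there only so the snippet also runs on Python 3, where quote() accepts text strings directly.

```python
import sys

if sys.version_info[0] == 2:
    # Python 2, the version used in the question
    from urllib import quote, unquote
    from urlparse import urlsplit
else:
    # Python 3 equivalents live in urllib.parse
    from urllib.parse import quote, unquote, urlsplit


def quoted_path(url):
    """Hypothetical helper: return the percent-encoded path of `url`."""
    # On Python 2, quote() needs a byte string, so encode unicode objects
    # first. UTF-8 is an assumption about how the URLs are stored.
    if sys.version_info[0] == 2 and isinstance(url, unicode):
        url = url.encode("utf-8")
    path = urlsplit(url).path
    # Same unquote/quote round trip as in the question
    return quote(unquote(path), safe="%/")


url = u"http://www.dgzfp.de/Dienste/Fachbeitr%C3%A4ge.aspx?EntryId=267&Page=5"
print(quoted_path(url))
```

Byte strings pass through unchanged on Python 2, so the helper can be called on every value read from the database without checking its type first.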

Does that help?