What is the correct way to URL-encode Unicode characters?

I know about the non-standard %uxxxx scheme, but that doesn't seem like a wise choice, since the scheme has been rejected by the W3C.

Some interesting examples:

The heart character. If I type this into my browser:

http://www.google.com/search?q=♥

and then copy and paste it, I see this URL:

http://www.google.com/search?q=%E2%99%A5

which makes it look like Firefox (or Safari) is doing this:

urllib.quote_plus(x.encode("latin-1"))
'%E2%99%A5'

That makes sense, except for things that can't be encoded in Latin-1, like the triple-dot (ellipsis) character:

…

If I type the URL

http://www.google.com/search?q=…

into my browser and then copy and paste it back, I get

http://www.google.com/search?q=%E2%80%A6

This appears to be the result of doing

urllib.quote_plus(x.encode("utf-8"))

That makes sense, since … can't be encoded in Latin-1.
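
As a quick check of the two observations above, here is a minimal Python 3 sketch (the snippets in this question are Python 2; `utf8_percent_encode` and `fits_latin1` are hypothetical helper names, not part of any library). It percent-encodes the UTF-8 bytes of each character and tests whether the character can be encoded as Latin-1 at all.

```python
from urllib.parse import quote_plus


def utf8_percent_encode(text: str) -> str:
    """Percent-encode the UTF-8 bytes of `text` (form style: spaces become '+')."""
    return quote_plus(text.encode("utf-8"))


def fits_latin1(text: str) -> bool:
    """Return True if every character of `text` can be encoded as Latin-1."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


print(utf8_percent_encode("♥"))  # %E2%99%A5
print(utf8_percent_encode("…"))  # %E2%80%A6
print(fits_latin1("♥"), fits_latin1("…"))  # False False
```

Both escaped URLs shown earlier match the UTF-8 output. Neither ♥ (U+2665) nor … (U+2026) lies in the Latin-1 range U+0000 to U+00FF, so in this sketch a Latin-1 encode fails for both characters rather than producing `%E2%99%A5`.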

But then it's not clear to me how the browser knows whether to decode with UTF-8 or Latin-1.

Since this appears to be ambiguous:

In [67]: u"…".encode('utf-8').decode('latin-1')
Out[67]: u'\xc3\xa2\xc2\x80\xc2\xa6'

works, so I don't know how the browser figures out whether to decode that with UTF-8 or Latin-1.

What's the right thing to do with the special characters I need to deal with?
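
The ambiguity described above can be reproduced with a short Python 3 sketch. Percent-escapes carry only bytes, with no charset label, and Latin-1 maps every byte value to a character, so a Latin-1 decode can never fail. The helper `decode_query_value` is a hypothetical name for one common heuristic (try strict UTF-8 first, then fall back). It is an assumption for illustration, not a description of what any particular browser does.

```python
from urllib.parse import unquote_to_bytes


def decode_query_value(escaped: str, fallback: str = "latin-1") -> str:
    """Undo percent-escapes, preferring UTF-8 and falling back to `fallback`.

    Heuristic only: the escapes themselves do not say which charset was used.
    """
    raw = unquote_to_bytes(escaped)
    try:
        return raw.decode("utf-8")  # strict: raises on invalid UTF-8
    except UnicodeDecodeError:
        return raw.decode(fallback)


raw = unquote_to_bytes("%E2%80%A6")
print(raw)                          # b'\xe2\x80\xa6'
print(repr(raw.decode("utf-8")))    # '…'
print(repr(raw.decode("latin-1")))  # 'â\x80¦' (succeeds, but is mojibake)

print(decode_query_value("%E2%99%A5"))  # ♥
print(decode_query_value("%E9"))        # é (invalid as UTF-8, so Latin-1 is used)
```

The fallback works in practice because text that was really encoded as Latin-1 is rarely also valid UTF-8. For new URLs, encoding as UTF-8 before percent-escaping is the convention RFC 3986 recommends for textual data.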

unicode urlencode web-standards utf-8 character-encoding

Jos*_*son

lucky-day

106
votes

4
answers

100k
views

Best way to convert a Unicode URL to ASCII (UTF-8 percent-escaped) in Python?

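I'm wondering what the best way is (or whether the standard library offers a simple way) to convert a URL with Unicode characters in the domain name and path to the equivalent ASCII URL, with the domain encoded as IDNA and the path %-encoded, as per RFC 3986.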

I get a UTF-8 URL from the user. So if they type in http://➡.ws/♥ I get 'http://\xe2\x9e\xa1.ws/\xe2\x99\xa5' in Python. What I want out is the ASCII version: 'http://xn--hgi.ws/%E2%99%A5'.

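What I'm doing at the moment is splitting the URL into parts with a regex, then manually IDNA-encoding the domain, and separately encoding the path and query string with different urllib.quote() calls.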

import re
import urllib

# url is UTF-8 here, eg: url = u'http://➡.ws/♥'.encode('utf-8')
match = re.match(r'([a-z]{3,5})://(.+\.[a-z0-9]{1,6})'
                 r'(:\d{1,5})?(/.*?)(\?.*)?$', url, flags=re.I)
if not match:
    raise BadURLException(url)
protocol, domain, port, path, query = match.groups()

try:
    domain = unicode(domain, 'utf-8')
except UnicodeDecodeError:
    return ''  # bad UTF-8 chars in domain
domain = domain.encode('idna')

if port is None:
    port = ''

path = urllib.quote(path)

if query is None:
    query = '' …
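
For comparison, the same split-then-encode idea can be sketched in Python 3 using only the standard library, with `urllib.parse.urlsplit` in place of the regex. This is a rough sketch under stated assumptions, not a full RFC 3986 implementation: `url_to_ascii` is a hypothetical name, the input is an already-decoded `str`, any userinfo (`user:pass@`) is dropped, and the `safe` character sets are a guess at what should pass through unescaped.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def url_to_ascii(url: str) -> str:
    """Return an ASCII form of `url`: IDNA host, UTF-8 percent-encoded rest."""
    parts = urlsplit(url)

    # Host: encode each label with the built-in IDNA codec (IDNA 2003).
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    # Path, query, fragment: quote() encodes str input as UTF-8 by default.
    # '%' is kept safe so input that is already escaped is not escaped twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    fragment = quote(parts.fragment, safe="%")

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(url_to_ascii("http://➡.ws/♥"))  # http://xn--hgi.ws/%E2%99%A5
```

Keeping `%` in the safe set is a trade-off: it avoids double-escaping, but a literal percent sign in the input is then passed through unchanged.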

python unicode url utf-8

Ben*_*oyt

lucky-day

28
votes

2
answers

20k
views