sta*_*alk 15 python django urlencode urlparse
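urlparse.parse_qs is very useful for parsing URL parameters, and it works fine with a plain ASCII URL represented as a str. So I can parse a query and then rebuild the same path from the parsed data using urllib.urlencode: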
>>> import urlparse
>>> import urllib
>>>
>>> path = '/?key=value' #path is str
>>> query = urlparse.urlparse(path).query
>>> query
'key=value'
>>> query_dict = urlparse.parse_qs(query)
>>> query_dict
{'key': ['value']}
>>> '/?' + urllib.urlencode(query_dict, doseq=True)
'/?key=value' # <-- path is the same here
It also works fine when the URL contains percent-encoded non-ASCII parameters:
>>> value = urllib.quote(u'значение'.encode('utf8'))
>>> value
'%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5'
>>> path = '/?key=%s' % value
>>> path
'/?key=%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5'
>>> query = urlparse.urlparse(path).query
>>> query
'key=%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5'
>>> query_dict = urlparse.parse_qs(query)
>>> query_dict
{'key': ['\xd0\xb7\xd0\xbd\xd0\xb0\xd1\x87\xd0\xb5\xd0\xbd\xd0\xb8\xd0\xb5']}
>>> '/?' + urllib.urlencode(query_dict, doseq=True)
'/?key=%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5' # <-- path is the same here
But when using Django, I get the URL from request.get_full_path(), which returns the path as a unicode string:
>>> path = request.get_full_path()
>>> path
u'/?key=%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5' # path is unicode
Look what happens now:
>>> query = urlparse.urlparse(path).query
>>> query
u'key=%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5'
>>> query_dict = urlparse.parse_qs(query)
>>> query_dict
{u'key': [u'\xd0\xb7\xd0\xbd\xd0\xb0\xd1\x87\xd0\xb5\xd0\xbd\xd0\xb8\xd0\xb5']}
>>>
query_dict contains a unicode string that holds bytes, not Unicode code points! And of course, when I try to urlencode that string, I get a UnicodeEncodeError:
>>> urllib.urlencode(query_dict, doseq=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\Lib\urllib.py", line 1337, in urlencode
l.append(k + '=' + quote_plus(str(elt)))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-15: ordinal not in range(128)
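To make the failure above concrete, here is a minimal sketch of what "a unicode string that holds bytes" looks like. It is written to run on both Python 2 and 3. The variable names are illustrative, and the Latin-1 decode is only an assumed stand-in that reproduces the one-code-point-per-byte value seen in query_dict; it is not taken from the urlparse source.

```python
# Illustrative sketch; all names here are hypothetical.

# The Russian word from the question, written with escapes.
text = u'\u0437\u043d\u0430\u0447\u0435\u043d\u0438\u0435'
utf8_bytes = text.encode('utf-8')  # 16 bytes, two per Cyrillic letter

# Map every byte to the code point with the same number (this is what a
# Latin-1 decode does). The result has 16 "characters" instead of 8.
byte_like = utf8_bytes.decode('latin-1')
print('%d %d %d' % (len(text), len(utf8_bytes), len(byte_like)))

# Encoding that value as ASCII fails, because every code point is >= 0x80.
error = None
try:
    byte_like.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

# Reversing the mapping recovers the real text.
repaired = byte_like.encode('latin-1').decode('utf-8')
print(repaired == text)
```

All 16 code points are outside the ASCII range, which matches the "position 0-15" in the traceback.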
For now I have a workaround:
# just convert path, returned by request.get_full_path(), to `str` explicitly:
path = str(request.get_full_path())
So the question is:
Mar*_*ers 16
Encode the query back to bytes, using ASCII, before passing it to .parse_qs():
query_dict = urlparse.parse_qs(query.encode('ASCII'))
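This does the same thing as using str(), but with an explicit encoding. Yes, this is safe: URL encoding uses ASCII code points only.
parse_qs was handed a Unicode value, so it returned unicode values as well; decoding bytes is not its job.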
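As a rough sketch of why the ASCII step is lossless, the snippet below encodes a percent-encoded query to bytes, unquotes it, and decodes the result as UTF-8. It deliberately splits the query by hand instead of calling parse_qs, so that it runs on both Python 2 and 3; the hand-rolled split, the import alias, and the variable names are assumptions made for illustration only.

```python
# Illustrative sketch; the manual split stands in for parse_qs.
try:
    # Python 2: unquote() on a str returns a str of raw bytes.
    from urllib import unquote as unquote_to_bytes
except ImportError:
    # Python 3
    from urllib.parse import unquote_to_bytes

path = u'/?key=%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5'
query = path.split(u'?', 1)[1]

# A percent-encoded query contains only ASCII characters, so this
# encode step cannot raise and loses no information.
query_bytes = query.encode('ascii')

key, _, quoted = query_bytes.partition(b'=')
value_bytes = unquote_to_bytes(quoted)  # the real UTF-8 bytes
value = value_bytes.decode('utf-8')     # the real text

print(value == u'\u0437\u043d\u0430\u0447\u0435\u043d\u0438\u0435')
```

Because the unquoting happens on bytes, the percent escapes become real UTF-8 bytes, which can then be decoded to text explicitly when needed.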