Dar*_*zer 56 python string unicode
I have the following code:
import string

def translate_non_alphanumerics(to_translate, translate_to='_'):
    not_letters_or_digits = u'!"#%\'()*+,-./:;<=>?@[\]^_`{|}~'
    translate_table = string.maketrans(not_letters_or_digits,
                                       translate_to
                                         *len(not_letters_or_digits))
    return to_translate.translate(translate_table)
This works fine for non-Unicode strings:
>>> translate_non_alphanumerics('<foo>!')
'_foo__'
But it fails for Unicode strings:
>>> translate_non_alphanumerics(u'<foo>!')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in translate_non_alphanumerics
TypeError: character mapping must return integer, None or unicode
I can't make sense of the paragraph on "Unicode objects" in the Python 2.6.2 docs for the str.translate() method.
How do I make this work for Unicode strings?
Mik*_*ers 56
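The Unicode version of translate requires a mapping from Unicode ordinals (which you can retrieve for a single character with ord) to Unicode ordinals. If you want to delete characters, map them to None.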
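As a minimal sketch of the deletion case, assuming Python 3 (where every str is Unicode, so no u prefix is needed): the helper name delete_chars is made up for illustration and is not part of the question's code.

```python
def delete_chars(text, chars_to_delete):
    # Key the table by ordinal; a value of None tells translate() to drop
    # that character from the result.
    table = dict.fromkeys((ord(ch) for ch in chars_to_delete), None)
    return text.translate(table)


print(delete_chars("<foo>!", "<>!"))  # foo
```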
I changed your function to build a dict that maps the ordinal of every character to the value you want to translate it to:
def translate_non_alphanumerics(to_translate, translate_to=u'_'):
    not_letters_or_digits = u'!"#%\'()*+,-./:;<=>?@[\]^_`{|}~'
    translate_table = dict((ord(char), translate_to) for char in not_letters_or_digits)
    return to_translate.translate(translate_table)

>>> translate_non_alphanumerics(u'<foo>!')
u'_foo__'
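Edit: It turns out that the translation mapping must map from a Unicode ordinal (via ord) to either another Unicode ordinal, a Unicode string, or None (to delete). I have therefore changed the default value of translate_to to a Unicode literal. For example: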
>>> translate_non_alphanumerics(u'<foo>!', u'bad')
u'badfoobadbad'
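To make the three allowed kinds of mapping value concrete, here is a small sketch, again assuming Python 3 rather than the Python 2.6 used in the question. The particular replacements are arbitrary and chosen only to show one value of each kind.

```python
table = {
    ord("<"): ord("["),  # ordinal -> another ordinal (one-to-one replacement)
    ord(">"): "]]",      # ordinal -> string (may be longer than one character)
    ord("!"): None,      # ordinal -> None (character is deleted)
}

result = "<foo>!".translate(table)
print(result)  # [foo]]
```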
In this version you can map each letter to a corresponding replacement letter:
def trans(to_translate):
    tabin = u'??????'
    tabout = u'??????'
    tabin = [ord(char) for char in tabin]
    translate_table = dict(zip(tabin, tabout))
    return to_translate.translate(translate_table)
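The original character tables in that answer did not survive, so the following is only a sketch of the same zip-based technique with placeholder tables. It assumes Python 3, and the Cyrillic-to-Latin pairs are picked purely for illustration; they are not the answer's actual data.

```python
def trans(to_translate):
    # Placeholder tables (an assumption): position i in tabin maps to
    # position i in tabout, so both strings must have the same length.
    tabin = "абвгде"
    tabout = "abvgde"
    translate_table = dict(zip((ord(char) for char in tabin), tabout))
    return to_translate.translate(translate_table)


print(trans("где"))  # gde
```

Characters that are not listed in tabin pass through unchanged, which is why only the listed letters need a counterpart.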
I came up with the following combination of my original function and Mike's version, which works with both Unicode and ASCII strings:
import string

def translate_non_alphanumerics(to_translate, translate_to=u'_'):
    not_letters_or_digits = u'!"#%\'()*+,-./:;<=>?@[\]^_`{|}~'
    if isinstance(to_translate, unicode):
        # Unicode strings need a dict keyed by ordinal.
        translate_table = dict((ord(char), unicode(translate_to))
                               for char in not_letters_or_digits)
    else:
        # Byte strings need a 256-character table from string.maketrans.
        assert isinstance(to_translate, str)
        translate_table = string.maketrans(not_letters_or_digits,
                                           translate_to
                                              *len(not_letters_or_digits))
    return to_translate.translate(translate_table)
Update: "coerced" translate_to to unicode for the Unicode translate_table. Thanks, Mike.
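For readers on Python 3, where unicode and string.maketrans no longer exist, a rough equivalent might look like the sketch below. It assumes the two cases to distinguish are str and bytes, and it adds a single-byte restriction on translate_to for the bytes branch because bytes.maketrans needs source and target of equal length; neither assumption comes from the original post.

```python
def translate_non_alphanumerics(to_translate, translate_to="_"):
    not_letters_or_digits = '!"#%\'()*+,-./:;<=>?@[\\]^_`{|}~'
    if isinstance(to_translate, str):
        # Text: dict keyed by ordinal; values may be strings of any length.
        if isinstance(translate_to, bytes):
            translate_to = translate_to.decode("ascii")
        table = {ord(char): translate_to for char in not_letters_or_digits}
    elif isinstance(to_translate, bytes):
        # Bytes: bytes.maketrans builds a 256-byte table from two
        # equal-length byte strings, so the replacement must be one byte.
        if isinstance(translate_to, str):
            translate_to = translate_to.encode("ascii")
        if len(translate_to) != 1:
            raise ValueError("bytes translation needs a single-byte replacement")
        source = not_letters_or_digits.encode("ascii")
        table = bytes.maketrans(source, translate_to * len(source))
    else:
        raise TypeError("expected str or bytes")
    return to_translate.translate(table)


print(translate_non_alphanumerics("<foo>!"))   # _foo__
print(translate_non_alphanumerics(b"<foo>!"))  # b'_foo__'
```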