Using string.translate() with Unicode data in Python

ado*_*tyd 24 python unicode dictionary

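I have 3 APIs that return JSON data into 3 dictionary variables. I take some values from the dictionaries to process them, and read the specific values I want into a list, valuelist. One of the steps is to remove the punctuation from them. I normally use string.translate(None, string.punctuation) for this, but because the dictionary data is unicode I get this error: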

    wordlist = [s.translate(None, string.punctuation)for s in valuelist]
TypeError: translate() takes exactly one argument (2 given)

Is there a way around this? Either by encoding the unicode, or with a replacement for string.translate?

Sim*_*pin 32

The translate method works differently on Unicode objects than on byte-string objects:

>>> help(unicode.translate)

S.translate(table) -> unicode

Return a copy of the string S, where all characters have been mapped
through the given translation table, which must be a mapping of
Unicode ordinals to Unicode ordinals, Unicode strings or None.
Unmapped characters are left untouched. Characters mapped to None
are deleted.

So your example would become:

import string
# Map each ASCII punctuation code point to None so that translate() deletes it.
remove_punctuation_map = dict((ord(char), None) for char in string.punctuation)
word_list = [s.translate(remove_punctuation_map) for s in value_list]
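For readers on Python 3, where every str is already Unicode, the same mapping can be built with str.maketrans. This is only a minimal sketch under that assumption; the sample value_list contents are made up for illustration.

```python
import string

# Assumption: Python 3, where str is a Unicode string.
# The third argument of str.maketrans lists characters to delete,
# i.e. they are mapped to None in the resulting table.
remove_punctuation_table = str.maketrans("", "", string.punctuation)

# Hypothetical sample data standing in for the values read from the APIs.
value_list = ["Hello, world!", "it's (almost) done.", "plain text"]
word_list = [s.translate(remove_punctuation_table) for s in value_list]
```

The table produced this way is an ordinary dict from code points to None, so it should be interchangeable with the hand-built mapping.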

Note, however, that string.punctuation contains only ASCII punctuation. Full Unicode has many more punctuation characters, but it all depends on your use case.
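If non-ASCII punctuation does matter for your use case, one possible approach is to build the table from the Unicode general categories instead of string.punctuation. The sketch below assumes Python 3; the names unicode_punctuation_map and strip_punctuation and the sample strings are hypothetical, not part of any library.

```python
import sys
import unicodedata

# Assumption: Python 3. Map every code point whose Unicode general
# category starts with "P" (punctuation) to None so translate() deletes it.
unicode_punctuation_map = {
    cp: None
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)).startswith("P")
}

def strip_punctuation(text):
    """Return text with all Unicode punctuation characters removed."""
    return text.translate(unicode_punctuation_map)

# Hypothetical sample data containing non-ASCII punctuation.
value_list = ["¿Qué tal?", "“quoted” text…", "plain"]
word_list = [strip_punctuation(s) for s in value_list]
```

Be aware that this is not a superset of string.punctuation: characters such as `$` and `+` fall under the symbol categories (S), not punctuation, so this table leaves them in place.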


ncu*_*tra 6

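I noticed that string.translate is deprecated. Since you are removing punctuation rather than actually translating characters, you can use the re.sub function instead.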

    >>> import re
    >>> s1 = "this.is a.string, with; (punctuation)."
    >>> s1
    'this.is a.string, with; (punctuation).'
    >>> re.sub(r"[.\t,:;()]", "", s1)
    'thisis astring with punctuation'
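The character class above covers only a handful of punctuation marks. As a rough sketch of how the same idea might be extended to everything in string.punctuation and applied to a whole list, assuming Python 3 (punctuation_pattern and the sample list are made-up names for illustration):

```python
import re
import string

# Build a character class from all ASCII punctuation; re.escape
# protects characters that are special inside a regex, such as ] and \.
punctuation_pattern = re.compile("[" + re.escape(string.punctuation) + "]")

# Hypothetical sample data standing in for the list from the question.
value_list = ["this.is a.string, with; (punctuation).", "a-b_c"]
word_list = [punctuation_pattern.sub("", s) for s in value_list]
```

Compiling the pattern once avoids rebuilding it for every string in the list.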

  • The module function `string.translate` is deprecated in favor of the `str.translate` method; the `translate` method (which the OP is using) is still available. (3 upvotes)