Related problems and solutions (0)

How do I get str.translate to work with Unicode strings?

I have the following code:

import string
def translate_non_alphanumerics(to_translate, translate_to='_'):
    not_letters_or_digits = u'!"#%\'()*+,-./:;<=>?@[\]^_`{|}~'
    translate_table = string.maketrans(not_letters_or_digits,
                                       translate_to * len(not_letters_or_digits))
    return to_translate.translate(translate_table)

which works for non-Unicode strings:

>>> translate_non_alphanumerics('<foo>!')
'_foo__'

but fails for Unicode strings:

>>> translate_non_alphanumerics(u'<foo>!')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in translate_non_alphanumerics
TypeError: character mapping must return integer, None or unicode

I can't make sense of the paragraph on "Unicode objects" in the Python 2.6.2 documentation for the str.translate() method.

How can I make this work for Unicode strings?
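A sketch of the usual Python 2 fix (not taken from the question itself): unicode.translate() does not accept the 256-character table built by string.maketrans(); it expects a dict mapping Unicode ordinals to ordinals, Unicode strings, or None, which can be built with ord():

def translate_non_alphanumerics(to_translate, translate_to=u'_'):
    # Assumed rewrite: map each unwanted character's ordinal to the
    # replacement string, which unicode.translate() accepts directly.
    not_letters_or_digits = u'!"#%\'()*+,-./:;<=>?@[\]^_`{|}~'
    translate_table = dict((ord(char), translate_to)
                           for char in not_letters_or_digits)
    return to_translate.translate(translate_table)

>>> translate_non_alphanumerics(u'<foo>!')
u'_foo__'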

python string unicode

Score: 56 · Answers: 3 · Views: 30k

str.translate gives TypeError - translate() takes exactly one argument (2 given); it worked in Python 2

I have the following code:

import nltk, os, json, csv, string, cPickle
from scipy.stats import scoreatpercentile

lmtzr = nltk.stem.wordnet.WordNetLemmatizer()

def sanitize(wordList):
    answer = [word.translate(None, string.punctuation) for word in wordList]
    answer = [lmtzr.lemmatize(word.lower()) for word in answer]
    return answer

words = []
for filename in json_list:
    words.extend([sanitize(nltk.word_tokenize(' '.join([tweet['text'] 
                   for tweet in json.load(open(filename,READ))])))])

When I wrote it, I tested lines 2-4 in a separate testing.py file:

import nltk, os, json, csv, string, cPickle
from scipy.stats import scoreatpercentile

wordList= ['\'the', 'the', '"the']
print wordList
wordList2 = [word.translate(None, string.punctuation) for word in wordList]
print wordList2
answer = [lmtzr.lemmatize(word.lower()) for word …
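The "translate() takes exactly one argument (2 given)" error is what Python 3 raises here: the two-argument str.translate(None, deletechars) form exists only on Python 2 byte strings. A minimal sketch of the Python 3 equivalent, assuming the goal is still to strip string.punctuation:

import string

# Python 3: build a translation table that deletes all punctuation,
# then pass that single table to str.translate().
PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def sanitize(word_list):
    return [word.translate(PUNCT_TABLE) for word in word_list]

print(sanitize(["'the", 'the', '"the']))  # ['the', 'the', 'the']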

python typeerror nltk

Score: 49 · Answers: 4 · Views: 60k

Tag statistics

python ×2

nltk ×1

string ×1

typeerror ×1

unicode ×1