Is there a way to correctly remove tense or plural from a word?

use*_*734 5 python nltk

Is it possible to use nltk to change words such as running, helps, cooks, finds and happy into run, help, cook, find and happy?

alv*_*vas 9

>>> from nltk.stem import WordNetLemmatizer
>>> wnl = WordNetLemmatizer()

>>> ls = ['running', 'helping', 'cooks', 'finds']
>>> [wnl.lemmatize(i) for i in ls]
['running', 'helping', u'cook', u'find']

>>> ls = [('running', 'v'), ('helping', 'v'), ('cooks', 'v'), ('finds','v')]
>>> [wnl.lemmatize(word, pos) for word, pos in ls]
[u'run', u'help', u'cook', u'find']

>>> ls = [('running', 'n'), ('helping', 'n'), ('cooks', 'n'), ('finds','n')]
>>> [wnl.lemmatize(word, pos) for word, pos in ls]
['running', 'helping', u'cook', u'find']

See also Porter stemming.
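The transcript above shows that the lemmatizer only reduces verbs such as 'running' when it is told the part of speech. As a rough sketch of how the tag could be supplied automatically, the helper below maps Penn Treebank tags (the kind `nltk.pos_tag` returns) to the one-letter codes that `lemmatize` accepts. The names `penn_to_wordnet` and `lemmatize_tagged` are hypothetical and not part of nltk, and the mapping is a common simplification, not an official one.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS letter.

    Hypothetical helper: anything that is not an adjective, verb or
    adverb falls back to 'n', which is also lemmatize()'s default.
    """
    if tag.startswith('J'):
        return 'a'  # adjective
    if tag.startswith('V'):
        return 'v'  # verb
    if tag.startswith('R'):
        return 'r'  # adverb
    return 'n'      # noun (default)


def lemmatize_tagged(tagged_words, lemmatizer):
    """Lemmatize (word, penn_tag) pairs with any object that exposes
    lemmatize(word, pos), such as nltk's WordNetLemmatizer."""
    return [lemmatizer.lemmatize(word, penn_to_wordnet(tag))
            for word, tag in tagged_words]


# Possible usage with nltk (assumes the WordNet corpus and a POS tagger
# model have already been fetched with nltk.download):
#
#   import nltk
#   from nltk.stem import WordNetLemmatizer
#   tagged = nltk.pos_tag(['he', 'is', 'running'])
#   lemmatize_tagged(tagged, WordNetLemmatizer())
```

The tagger works best on words in sentence context, so tagging isolated words may give less reliable results than passing 'v' or 'n' by hand as in the answer.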


Irs*_*hat 7

There are several stemming algorithms in nltk. It looks like the Lancaster stemming algorithm would work for you.

>>> from nltk.stem.lancaster import LancasterStemmer
>>> st = LancasterStemmer()
>>> st.stem('happily')
'happy'
>>> st.stem('cooks')
'cook'
>>> st.stem('helping')
'help'
>>> st.stem('running')
'run'
>>> st.stem('finds')
'find'
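Since nltk ships more than one stemmer, it can help to compare their output side by side before settling on one. The sketch below does that with a small helper, `stem_table`, which is a hypothetical name and not part of nltk. It only assumes each stemmer object has a `stem(word)` method, as `LancasterStemmer` does above.

```python
def stem_table(words, stemmers):
    """Return {word: {stemmer_name: stem}} for every word.

    `stemmers` is a dict mapping a label of your choice to any object
    with a stem(word) method. Hypothetical helper, not an nltk function.
    """
    return {
        word: {name: stemmer.stem(word) for name, stemmer in stemmers.items()}
        for word in words
    }


# Possible usage with nltk (assumes nltk is installed):
#
#   from nltk.stem import PorterStemmer, LancasterStemmer
#   table = stem_table(
#       ['running', 'helping', 'cooks', 'finds', 'happily'],
#       {'porter': PorterStemmer(), 'lancaster': LancasterStemmer()},
#   )
#   for word, stems in table.items():
#       print(word, stems)
```

Stemmers apply suffix-stripping rules rather than dictionary lookups, so the results are not guaranteed to be real words; checking them on a sample of your own vocabulary is worthwhile.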