Lemmatizing POS-tagged words with NLTK?

asc*_*Pig 16 python nlp nltk

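I have POS-tagged some words with nltk.pos_tag(), so they have been assigned Treebank tags. I would like to lemmatize these words using the known POS tags, but I'm not sure how. I was looking at the WordNet lemmatizer, but I don't know how to convert the Treebank POS tags into the tags the lemmatizer accepts. How can I perform this conversion simply, or is there a lemmatizer that works with Treebank tags?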

rma*_*ouf 29

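The WordNet lemmatizer only knows four parts of speech (ADJ, ADV, NOUN and VERB), and only the NOUN and VERB rules do anything especially interesting. In the Treebank tagset, the noun tags all start with NN, the verb tags all start with VB, the adjective tags start with JJ and the adverb tags start with RB. So converting from one tag set to the other is straightforward, for example: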

from nltk.corpus import wordnet

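penn_tag = 'VBD'  # a Treebank tag, e.g. one returned by nltk.pos_tag()
# Map on the two-letter prefix; fall back to NOUN (the lemmatizer's default) for other tags
morphy_tag = {'NN':wordnet.NOUN,'JJ':wordnet.ADJ,'VB':wordnet.VERB,'RB':wordnet.ADV}.get(penn_tag[:2], wordnet.NOUN)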
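As a minimal sketch, the prefix mapping above can be wrapped in a helper that converts a whole `nltk.pos_tag()` result at once. To keep the example runnable without NLTK's data files, it writes WordNet's POS constants out as the single letters they stand for (`wordnet.NOUN == 'n'`, `VERB == 'v'`, `ADJ == 'a'`, `ADV == 'r'`); the names `penn_to_wordnet` and `to_wordnet_pairs` are hypothetical, not part of NLTK.

```python
# WordNet's POS constants are single letters; these literals mirror
# nltk.corpus.wordnet.NOUN, VERB, ADJ and ADV without loading the corpus.
WN_NOUN, WN_VERB, WN_ADJ, WN_ADV = "n", "v", "a", "r"

# Treebank tag prefix -> WordNet POS letter
PREFIX_MAP = {"NN": WN_NOUN, "VB": WN_VERB, "JJ": WN_ADJ, "RB": WN_ADV}


def penn_to_wordnet(penn_tag, default=WN_NOUN):
    """Map a Treebank tag such as 'VBD' to a WordNet POS letter.

    Tags outside the four open classes fall back to `default`, which
    matches WordNetLemmatizer.lemmatize()'s own default of pos='n'.
    """
    return PREFIX_MAP.get(penn_tag[:2], default)


def to_wordnet_pairs(tagged):
    """Convert [(word, treebank_tag), ...] into [(word, wordnet_pos), ...]."""
    return [(word, penn_to_wordnet(tag)) for word, tag in tagged]


if __name__ == "__main__":
    tagged = [("The", "DT"), ("cats", "NNS"), ("were", "VBD"), ("running", "VBG")]
    print(to_wordnet_pairs(tagged))
    # With NLTK and its WordNet data installed, each pair can then be passed
    # to WordNetLemmatizer().lemmatize(word, pos=wordnet_pos).
```

Passing the POS matters because the lemmatizer treats every word as a noun by default; for example, "running" is only reduced to "run" when it is lemmatized as a verb.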


Art*_*lis 6

As @engineercoding pointed out in a comment on @rmalouf's answer, the Treebank tagset has many more tags than WordNet; see here for details.


The mapping below covers as many bases as possible, and it also explicitly marks the POS tags that have no match in WordNet:

# Create a map between Treebank and WordNet
from nltk.corpus import wordnet as wn

# WordNet POS tags are: NOUN = 'n', ADJ = 'a', VERB = 'v', ADV = 'r', ADJ_SAT = 's'
# Descriptions (c) https://web.stanford.edu/~jurafsky/slp3/10.pdf
# List values such as [wn.ADJ, wn.ADJ_SAT] must be reduced to a single tag
# before being passed as pos= to WordNetLemmatizer.lemmatize()
tag_map = {
    'CC':None, # coordin. conjunction (and, but, or)
    'CD':wn.NOUN, # cardinal number (one, two)
    'DT':None, # determiner (a, the)
    'EX':wn.ADV, # existential ‘there’ (there)
    'FW':None, # foreign word (mea culpa)
    'IN':wn.ADV, # preposition/sub-conj (of, in, by)
    'JJ':[wn.ADJ, wn.ADJ_SAT], # adjective (yellow)
    'JJR':[wn.ADJ, wn.ADJ_SAT], # adj., comparative (bigger)
    'JJS':[wn.ADJ, wn.ADJ_SAT], # adj., superlative (wildest)
    'LS':None, # list item marker (1, 2, One)
    'MD':None, # modal (can, should)
    'NN':wn.NOUN, # noun, sing. or mass (llama)
    'NNS':wn.NOUN, # noun, plural (llamas)
    'NNP':wn.NOUN, # proper noun, sing. (IBM)
    'NNPS':wn.NOUN, # proper noun, plural (Carolinas)
    'PDT':[wn.ADJ, wn.ADJ_SAT], # predeterminer (all, both)
    'POS':None, # possessive ending (’s)
    'PRP':None, # personal pronoun (I, you, he)
    'PRP$':None, # possessive pronoun (your, one’s)
    'RB':wn.ADV, # adverb (quickly, never)
    'RBR':wn.ADV, # adverb, comparative (faster)
    'RBS':wn.ADV, # adverb, superlative (fastest)
    'RP':[wn.ADJ, wn.ADJ_SAT], # particle (up, off)
    'SYM':None, # symbol (+, %, &)
    'TO':None, # “to” (to)
    'UH':None, # interjection (ah, oops)
    'VB':wn.VERB, # verb base form (eat)
    'VBD':wn.VERB, # verb past tense (ate)
    'VBG':wn.VERB, # verb gerund (eating)
    'VBN':wn.VERB, # verb past participle (eaten)
    'VBP':wn.VERB, # verb non-3sg pres (eat)
    'VBZ':wn.VERB, # verb 3sg pres (eats)
    'WDT':None, # wh-determiner (which, that)
    'WP':None, # wh-pronoun (what, who)
    'WP$':None, # possessive (wh- whose)
    'WRB':None, # wh-adverb (how, where)
    '$':None, # dollar sign ($)
    '#':None, # pound sign (#)
    '“':None, # left quote (‘ or “)
    '”':None, # right quote (’ or ”)
    '(':None, # left parenthesis ([, (, {, <)
    ')':None, # right parenthesis (], ), }, >)
    ',':None, # comma (,)
    '.':None, # sentence-final punc (. ! ?)
    ':':None # mid-sentence punc (: ; ... – -)
}
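Because some values in this map are `None` and some are lists, it cannot be passed straight to the lemmatizer. The sketch below shows one possible way to consume it: tags that map to `None` (or are missing) are skipped, and list values are reduced to their first element. It uses a small hypothetical excerpt of the map with WordNet's letters written out literally so it runs without NLTK; the helper names `resolve_pos` and `lemmatizable` are illustrative, and choosing the first list element is an assumption rather than part of the answer.

```python
# Hypothetical excerpt of tag_map, with WordNet's POS letters written out
# literally (ADJ = 'a', ADJ_SAT = 's', NOUN = 'n', VERB = 'v', ADV = 'r').
TAG_MAP_EXCERPT = {
    "DT": None,
    "JJ": ["a", "s"],
    "NNS": "n",
    "RB": "r",
    "VBD": "v",
}


def resolve_pos(penn_tag, tag_map):
    """Return a single WordNet POS letter for penn_tag, or None.

    Unknown tags and tags mapped to None yield None; list values such as
    [ADJ, ADJ_SAT] are reduced to their first element, because
    WordNetLemmatizer.lemmatize() takes one pos string, not a list.
    """
    value = tag_map.get(penn_tag)
    if isinstance(value, list):
        return value[0] if value else None
    return value


def lemmatizable(tagged, tag_map):
    """Keep only (word, pos) pairs whose Treebank tag has a WordNet equivalent."""
    result = []
    for word, tag in tagged:
        pos = resolve_pos(tag, tag_map)
        if pos is not None:
            result.append((word, pos))
    return result


if __name__ == "__main__":
    tagged = [("the", "DT"), ("big", "JJ"), ("dogs", "NNS"), ("barked", "VBD")]
    print(lemmatizable(tagged, TAG_MAP_EXCERPT))
```

Dropping the `None` tags keeps function words and punctuation out of the lemmatizer entirely; if you would rather keep every token, you can instead leave those words unchanged.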