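I have POS-tagged some words with nltk.pos_tag(), so they have been given Treebank tags. I would like to lemmatize these words using the known POS tags, but I am not sure how. I was looking at the WordNet lemmatizer, but I don't know how to convert the Treebank POS tags into tags the lemmatizer accepts. How can I perform this conversion simply, or is there a lemmatizer that uses Treebank tags?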
rma*_*ouf 29
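The WordNet lemmatizer only knows about four parts of speech (ADJ, ADV, NOUN, and VERB), and only the NOUN and VERB rules do anything especially interesting. In the Treebank tagset, the noun tags all start with NN, the verb tags all start with VB, the adjective tags start with JJ, and the adverb tags start with RB. So converting from one set of labels to the other is straightforward, something like: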
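from nltk.corpus import wordnet
penn_tag = 'VBD'  # example Treebank tag, as returned by nltk.pos_tag()
# Look up the two-letter tag prefix; any other prefix (e.g. 'DT') raises KeyError
morphy_tag = {'NN': wordnet.NOUN, 'JJ': wordnet.ADJ, 'VB': wordnet.VERB, 'RB': wordnet.ADV}[penn_tag[:2]]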
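The prefix lookup above raises a KeyError for any tag outside those four families (determiners, pronouns, punctuation and so on). A minimal sketch of a more forgiving variant follows. The helper name `penn_to_wordnet` and its `default` parameter are hypothetical, and the POS constants are written as plain strings, which are the values of `wordnet.NOUN`, `wordnet.VERB`, `wordnet.ADJ` and `wordnet.ADV`, so the sketch runs without NLTK installed.

```python
# WordNet POS constants as plain strings; these are the values of
# nltk.corpus.wordnet's NOUN, VERB, ADJ and ADV.
NOUN, VERB, ADJ, ADV = "n", "v", "a", "r"

PREFIX_TO_WORDNET = {"NN": NOUN, "VB": VERB, "JJ": ADJ, "RB": ADV}


def penn_to_wordnet(penn_tag, default=NOUN):
    """Map a Treebank tag to a WordNet POS via its two-letter prefix.

    Hypothetical helper: tags outside the four families fall back to
    `default` instead of raising KeyError.
    """
    return PREFIX_TO_WORDNET.get(penn_tag[:2], default)


# (word, tag) pairs in the shape returned by nltk.pos_tag(); written by hand here.
tagged = [("dogs", "NNS"), ("were", "VBD"), ("running", "VBG"),
          ("quickly", "RB"), ("the", "DT")]
converted = [(word, penn_to_wordnet(tag)) for word, tag in tagged]
print(converted)
```

Each converted pair can then be passed to `nltk.stem.WordNetLemmatizer().lemmatize(word, pos)`, which needs the WordNet corpus to be downloaded. Falling back to noun mirrors that method's own default of `pos='n'`; pass `default=None` to the helper if unmatched tags should be skipped instead.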
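As @engineercoding pointed out in a comment on @rmalouf's answer, Treebank has many more tags than WordNet; see here for details.

The mapping below covers as many bases as possible, and it also explicitly marks the POS tags that have no match in WordNet: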
# Create a map between Treebank and WordNet
from nltk.corpus import wordnet as wn

# WordNet POS tags are: NOUN = 'n', ADJ = 'a', VERB = 'v', ADV = 'r', ADJ_SAT = 's'
# Descriptions (c) https://web.stanford.edu/~jurafsky/slp3/10.pdf
tag_map = {
    'CC': None,                    # coordin. conjunction (and, but, or)
    'CD': wn.NOUN,                 # cardinal number (one, two)
    'DT': None,                    # determiner (a, the)
    'EX': wn.ADV,                  # existential ‘there’ (there)
    'FW': None,                    # foreign word (mea culpa)
    'IN': wn.ADV,                  # preposition/sub-conj (of, in, by)
    'JJ': [wn.ADJ, wn.ADJ_SAT],    # adjective (yellow)
    'JJR': [wn.ADJ, wn.ADJ_SAT],   # adj., comparative (bigger)
    'JJS': [wn.ADJ, wn.ADJ_SAT],   # adj., superlative (wildest)
    'LS': None,                    # list item marker (1, 2, One)
    'MD': None,                    # modal (can, should)
    'NN': wn.NOUN,                 # noun, sing. or mass (llama)
    'NNS': wn.NOUN,                # noun, plural (llamas)
    'NNP': wn.NOUN,                # proper noun, sing. (IBM)
    'NNPS': wn.NOUN,               # proper noun, plural (Carolinas)
    'PDT': [wn.ADJ, wn.ADJ_SAT],   # predeterminer (all, both)
    'POS': None,                   # possessive ending (’s)
    'PRP': None,                   # personal pronoun (I, you, he)
    'PRP$': None,                  # possessive pronoun (your, one’s)
    'RB': wn.ADV,                  # adverb (quickly, never)
    'RBR': wn.ADV,                 # adverb, comparative (faster)
    'RBS': wn.ADV,                 # adverb, superlative (fastest)
    'RP': [wn.ADJ, wn.ADJ_SAT],    # particle (up, off)
    'SYM': None,                   # symbol (+, %, &)
    'TO': None,                    # “to” (to)
    'UH': None,                    # interjection (ah, oops)
    'VB': wn.VERB,                 # verb base form (eat)
    'VBD': wn.VERB,                # verb past tense (ate)
    'VBG': wn.VERB,                # verb gerund (eating)
    'VBN': wn.VERB,                # verb past participle (eaten)
    'VBP': wn.VERB,                # verb non-3sg pres (eat)
    'VBZ': wn.VERB,                # verb 3sg pres (eats)
    'WDT': None,                   # wh-determiner (which, that)
    'WP': None,                    # wh-pronoun (what, who)
    'WP$': None,                   # possessive wh- (whose)
    'WRB': None,                   # wh-adverb (how, where)
    '$': None,                     # dollar sign ($)
    '#': None,                     # pound sign (#)
    '“': None,                     # left quote (‘ or “)
    '”': None,                     # right quote (’ or ”)
    '(': None,                     # left parenthesis ([, (, {, <)
    ')': None,                     # right parenthesis (], ), }, >)
    ',': None,                     # comma (,)
    '.': None,                     # sentence-final punc (. ! ?)
    ':': None                      # mid-sentence punc (: ; ... – -)
}
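Because the values in this map are a mix of None, a single POS, and a list of POS tags, code that consumes it has to normalize them first. One possible way to do that is sketched below. The helper `candidate_pos` is hypothetical, and the small `tag_map_excerpt` uses the string values of the WordNet constants ('n', 'v', 'a', 's') so that it runs without NLTK.

```python
def candidate_pos(tag, tag_map):
    """Return the WordNet POS candidates for a Treebank tag as a list.

    Hypothetical helper: unknown tags and tags mapped to None give an
    empty list, a single POS becomes a one-element list, and a list of
    POS tags is returned as a copy.
    """
    value = tag_map.get(tag)
    if value is None:
        return []
    if isinstance(value, list):
        return list(value)
    return [value]


# A few entries in the same shape as the full map, using string constants.
tag_map_excerpt = {
    "DT": None,         # no WordNet counterpart
    "JJ": ["a", "s"],   # ADJ, ADJ_SAT
    "NNS": "n",         # NOUN
    "VBD": "v",         # VERB
}

for tag in ("DT", "JJ", "NNS", "VBD", "XYZ"):
    print(tag, candidate_pos(tag, tag_map_excerpt))
```

A caller could then try each candidate in turn with `WordNetLemmatizer().lemmatize(word, pos)` and leave the word unchanged when the list is empty. Whether to try every candidate or only the first is a design choice the original answer does not specify.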