Vic*_*ang 5 nlp corpus nltk tagged-corpus python-3.x
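I have just been working through Chapter 5 of the NLTK book, and tagged_words() does not seem to accept the 'simplify_tags' parameter. I am using Python 3.4, PyCharm, and the standard NLTK package.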
In[4]: nltk.corpus.brown.tagged_words()
Out[4]: [('The', 'AT'), ('Fulton', 'NP-TL'), ...]
In[5]: nltk.corpus.brown.tagged_words(simplify_tags = True)
Traceback (most recent call last):
File "C:\Python34\lib\site-packages\IPython\core\interactiveshell.py", line 2883, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-c4f914e3e846>", line 1, in <module>
nltk.corpus.brown.tagged_words(simplify_tags = True)
TypeError: tagged_words() got an unexpected keyword argument 'simplify_tags'
Calling the function without simplify_tags works fine. I would appreciate any suggestions or comments. Thanks!
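Yes, as noted above, recent versions of NLTK no longer simplify tags; instead they map them to the universal tagset (https://code.google.com/p/universal-pos-tags/). Pass tagset='universal' in place of simplify_tags=True: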
>>> from nltk.corpus import brown
>>> brown.tagged_words(tagset='universal')
[(u'The', u'DET'), (u'Fulton', u'NOUN'), ...]
>>> brown.tagged_words(tagset='universal')[:10]
[(u'The', u'DET'), (u'Fulton', u'NOUN'), (u'County', u'NOUN'), (u'Grand', u'ADJ'), (u'Jury', u'NOUN'), (u'said', u'VERB'), (u'Friday', u'NOUN'), (u'an', u'DET'), (u'investigation', u'NOUN'), (u'of', u'ADP')]
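To make the mapping idea concrete, here is a minimal sketch of what replacing fine-grained tags with universal ones amounts to: a table lookup per token. The dictionary below is hand-written for illustration and covers only the tags in the first ten Brown tokens; it is not NLTK's actual mapping table, and the helper name to_universal is hypothetical.

```python
# Illustrative lookup table only: a handful of Brown tags and the universal
# categories they correspond to in the output shown above. NLTK ships its
# own, complete mapping; this dictionary is NOT that table.
BROWN_TO_UNIVERSAL = {
    "AT": "DET",
    "NP-TL": "NOUN",
    "NN-TL": "NOUN",
    "JJ-TL": "ADJ",
    "VBD": "VERB",
    "NR": "NOUN",
    "NN": "NOUN",
    "IN": "ADP",
}


def to_universal(tagged, mapping=BROWN_TO_UNIVERSAL, unknown="X"):
    """Hypothetical helper: swap each fine-grained tag for a coarse one.

    Tags missing from the mapping fall back to "X", the universal
    tagset's catch-all category.
    """
    return [(word, mapping.get(tag, unknown)) for word, tag in tagged]


# The first ten tokens of the Brown corpus with their original tags.
brown_sample = [
    ("The", "AT"), ("Fulton", "NP-TL"), ("County", "NN-TL"),
    ("Grand", "JJ-TL"), ("Jury", "NN-TL"), ("said", "VBD"),
    ("Friday", "NR"), ("an", "AT"), ("investigation", "NN"), ("of", "IN"),
]

universal_sample = to_universal(brown_sample)
print(universal_sample)
```

In practice you would let the corpus reader do this via tagset='universal'; the sketch is only meant to show why the coarse tags lose distinctions such as NP-TL versus NN.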
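Note, however, that one corpus reader still has the simplify_tags parameter; see https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/ipipan.py#L23
Presumably the ipipan corpus reader is still in the process of being moved to the universal tagset.
Also note that not all corpus readers can map to the universal tagset; some are still on the TODO list, e.g. https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/tagged.py#L260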
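If you need code that runs against both kinds of reader, one option is a small wrapper that tries the new keyword first and falls back to the old one. This is only a sketch: simplified_tagged_words is a hypothetical helper, the two reader classes are stand-ins so the example runs without NLTK or corpus data, and it assumes that a reader rejects an unsupported keyword with a TypeError, as Brown does in the traceback from the question.

```python
def simplified_tagged_words(reader, **kwargs):
    """Hypothetical helper: prefer tagset='universal', else simplify_tags=True.

    Assumes the reader raises TypeError for a keyword it does not accept.
    """
    try:
        return reader.tagged_words(tagset="universal", **kwargs)
    except TypeError:
        return reader.tagged_words(simplify_tags=True, **kwargs)


class NewStyleReader:
    """Stand-in for a reader that accepts the tagset keyword."""

    def tagged_words(self, fileids=None, tagset=None):
        tag = "DET" if tagset == "universal" else "AT"
        return [("The", tag)]


class OldStyleReader:
    """Stand-in for a reader that only accepts simplify_tags."""

    def tagged_words(self, fileids=None, simplify_tags=False):
        tag = "DET" if simplify_tags else "AT"
        return [("The", tag)]


new_result = simplified_tagged_words(NewStyleReader())
old_result = simplified_tagged_words(OldStyleReader())
print(new_result, old_result)
```

Be aware that catching TypeError this broadly can also hide an unrelated TypeError raised inside the reader, and that the old simplified tags and the universal tags are different tagsets, so results from the two branches are not guaranteed to be comparable.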