我需要将单词分类到他们的词性中.像动词,名词,副词等.我用过
nltk.word_tokenize() #to identify word in a sentence
nltk.pos_tag() #to identify the parts of speech
nltk.ne_chunk() #to identify Named entities.
Run Code Online (Sandbox Code Playgroud)
这是一棵树.例如
>>> sentence = "I am Jhon from America"
>>> sent1 = nltk.word_tokenize(sentence )
>>> sent2 = nltk.pos_tag(sent1)
>>> sent3 = nltk.ne_chunk(sent2, binary=True)
>>> sent3
Tree('S', [('I', 'PRP'), ('am', 'VBP'), Tree('NE', [('Jhon', 'NNP')]), ('from', 'IN'), Tree('NE', [('America', 'NNP')])])
Run Code Online (Sandbox Code Playgroud)
访问此树中的元素时,我按如下方式执行:
>>> sent3[0]
('I', 'PRP')
>>> sent3[0][0]
'I'
>>> sent3[0][1]
'PRP'
Run Code Online (Sandbox Code Playgroud)
但是在访问命名实体时:
>>> sent3[2]
Tree('NE', [('Jhon', 'NNP')])
>>> sent3[2][0]
('Jhon', 'NNP')
>>> …Run Code Online (Sandbox Code Playgroud)