Yar*_*rik 8 python parsing nlp nltk stanford-nlp
Using NLTK's StanfordParser, I can parse a sentence like this:
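import os
from nltk.parse import stanford

# Raw strings keep the backslashes in Windows paths literal
os.environ['STANFORD_PARSER'] = r'C:\jars'
os.environ['STANFORD_MODELS'] = r'C:\jars'
os.environ['JAVAHOME'] = r'C:\ProgramData\Oracle\Java\javapath'
parser = stanford.StanfordParser(model_path=r"C:\jars\englishPCFG.ser.gz")
sentences = parser.parse(("bring me a red ball",))
for sentence in sentences:
    # repr() shows the nested Tree(...) form
    print(repr(sentence))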
The result is:
Tree('ROOT', [Tree('S', [Tree('VP', [Tree('VB', ['Bring']),
Tree('NP', [Tree('DT', ['a']), Tree('NN', ['red'])]), Tree('NP',
[Tree('NN', ['ball'])])]), Tree('.', ['.'])])])
In addition to the tree above, how can I get typed dependencies using the Stanford parser?
NLTK's StanfordParser module doesn't (currently) wrap the tree-to-Stanford-Dependencies conversion code. You can use my library PyStanfordDependencies, which wraps the dependency converter.
If nltk_tree is sentence from the question's code snippet, then this works:
#!/usr/bin/python3
import StanfordDependencies

# Use str() to convert the NLTK tree to Penn Treebank format
penn_treebank_tree = str(nltk_tree)

sd = StanfordDependencies.get_instance(jar_filename='point to Stanford Parser JAR file')
converted_tree = sd.convert_tree(penn_treebank_tree)

# Print Typed Dependencies
for node in converted_tree:
    print('{}({}-{},{}-{})'.format(
        node.deprel,
        converted_tree[node.head - 1].form if node.head != 0 else 'ROOT',
        node.head,
        node.form,
        node.index))
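To see what the printing loop does without needing Java or the parser JAR, here is a minimal sketch that runs the same formatting logic on hand-built tokens. The `Token` namedtuple and the `format_dependencies` helper are hypothetical names made up for this illustration; they only model the four attributes the loop reads (`index`, `form`, `head`, `deprel`) and are not part of PyStanfordDependencies. The token values are a hand-written analysis of "bring me a red ball", not actual converter output.

```python
from collections import namedtuple

# Hypothetical stand-in for the converter's token objects; only the
# four attributes read by the printing loop are modeled here.
Token = namedtuple("Token", ["index", "form", "head", "deprel"])


def format_dependencies(tokens):
    """Render tokens as relation(head-headIndex, dependent-index) strings."""
    lines = []
    for tok in tokens:
        # head is a 1-based token index; 0 marks the artificial ROOT node
        head_form = tokens[tok.head - 1].form if tok.head != 0 else "ROOT"
        lines.append("{}({}-{}, {}-{})".format(
            tok.deprel, head_form, tok.head, tok.form, tok.index))
    return lines


# Hand-written illustrative analysis, NOT real parser output
tokens = [
    Token(1, "bring", 0, "root"),
    Token(2, "me", 1, "iobj"),
    Token(3, "a", 5, "det"),
    Token(4, "red", 5, "amod"),
    Token(5, "ball", 1, "dobj"),
]

for line in format_dependencies(tokens):
    print(line)
```

The `head - 1` lookup works because token indices start at 1 while Python lists start at 0, which is why a head of 0 has to be special-cased as ROOT.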