Post by And*_*rea

Using Stanford Tregex in Python

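I am new to NLP and Python. I am trying to extract a subset of noun phrases from parse trees produced by StanfordCoreNLP, using the Tregex tool and Python's subprocess library. Specifically, I am trying to find and extract noun phrases that match the following pattern in Tregex syntax: '(NP[$VP]>S)|(NP[$VP]>S\n)|(NP\n[$VP]>S)|(NP\n[$VP]>S\n)'.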
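For context, here is a minimal sketch of the kind of subprocess call I have in mind. It rests on assumptions that are not confirmed anywhere in this post: that the Tregex jar sits at a path like stanford-tregex.jar, that Java is on the PATH, and that the command-line entry point is the class edu.stanford.nlp.trees.tregex.TregexPattern, taking a pattern followed by a file of trees. The helper names build_tregex_command and run_tregex are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path

# Assumed entry point of the Tregex command-line tool.
TREGEX_MAIN = "edu.stanford.nlp.trees.tregex.TregexPattern"


def build_tregex_command(pattern, tree_file, jar_path="stanford-tregex.jar"):
    """Build the argument list for a Tregex run (hypothetical helper).

    Passing a list, not a shell string, means characters such as '$', '>'
    and '|' in the pattern are never interpreted by a shell.
    """
    return ["java", "-cp", str(jar_path), TREGEX_MAIN, pattern, str(tree_file)]


def run_tregex(pattern, parse_trees, jar_path="stanford-tregex.jar"):
    """Write the parse trees to a temporary file and run Tregex over it.

    Returns whatever the tool prints on stdout. Requires Java and the jar,
    so it is defined here but not called.
    """
    with tempfile.TemporaryDirectory() as tmp:
        tree_file = Path(tmp) / "trees.txt"
        tree_file.write_text("\n".join(parse_trees), encoding="utf-8")
        result = subprocess.run(
            build_tregex_command(pattern, tree_file, jar_path),
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode bytes to str
            check=True,           # raise CalledProcessError on failure
        )
    return result.stdout


command = build_tregex_command("NP [$ VP] > S", "trees.txt")
print(command)
```

With the real jar in place, run_tregex('NP [$ VP] > S', [tree]) would be the call to try, where tree is one of the parse strings returned by the parser.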

For example, here is the original text, stored in a string named "text":

text = ('Pusheen and Smitha walked along the beach. "I want to surf", said Smitha, the CEO of Tesla. However, she fell off the surfboard')

After running the StanfordCoreNLP parser through a Python wrapper, I get the following 3 trees for the 3 sentences:

output1['sentences'][0]['parse']

Out[58]: '(ROOT\n  (S\n    (NP (NNP Pusheen)\n      (CC and)\n      (NNP Smitha))\n    (VP (VBD walked)\n      (PP (IN along)\n        (NP (DT the) (NN beach))))\n    (. .)))'

output1['sentences'][1]['parse']

Out[59]: "(ROOT\n  (SINV (`` ``)\n …
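To make the target concrete, here is a small pure-Python sketch of what I understand "an NP that is a sister of a VP and a child of S" to select in the first tree. It is not Tregex and does not implement Tregex syntax. It only reads the bracketed string and checks that one relation by hand, and the helper names (parse_tree, leaves, np_sister_of_vp_under_s) are made up for this illustration.

```python
def parse_tree(text):
    """Parse a bracketed parse string into (label, children) tuples.

    Leaves (the words) are kept as plain strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


def np_sister_of_vp_under_s(node):
    """Collect the text of every NP whose parent is S and that has a VP sister."""
    if isinstance(node, str):
        return []
    label, children = node
    subtrees = [c for c in children if not isinstance(c, str)]
    found = []
    if label == "S" and any(c[0] == "VP" for c in subtrees):
        found += [" ".join(leaves(c)) for c in subtrees if c[0] == "NP"]
    for child in subtrees:
        found += np_sister_of_vp_under_s(child)
    return found


# The first parse tree, copied from the output above.
tree1 = ('(ROOT\n  (S\n    (NP (NNP Pusheen)\n      (CC and)\n      (NNP Smitha))\n'
         '    (VP (VBD walked)\n      (PP (IN along)\n        (NP (DT the) (NN beach))))\n'
         '    (. .)))')

matches = np_sister_of_vp_under_s(parse_tree(tree1))
print(matches)  # ['Pusheen and Smitha']
```

The inner NP "the beach" is not selected because its parent is a PP, not an S. That is the behaviour I would like to get out of Tregex itself.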

python parsing subprocess pattern-matching stanford-nlp
