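I'm new to NLP and Python. I'm trying to extract a subset of noun phrases from the parse trees produced by Stanford CoreNLP, using the Tregex tool and Python's subprocess library. In particular, I'm trying to find and extract the noun phrases that match the following pattern in Tregex syntax: '(NP[$VP]>S)|(NP[$VP]>S\n)|(NP\n[$VP]>S)|(NP\n[$VP]>S\n)'.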
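A minimal sketch of calling Tregex through `subprocess`, assuming the Stanford Tregex distribution is available locally. The jar path is hypothetical, and the `-s` flag (print each match on one line) is an assumption to check against the Tregex documentation. Because Tregex matches against tree structure rather than the printed string, the `\n` alternatives are probably unnecessary, and a single pattern such as `NP $ VP > S` ("an NP that is a sister of a VP and is immediately dominated by an S") should cover all four cases; that simplification is also an assumption.

```python
import subprocess

# Hypothetical path: point this at your local Tregex jar.
TREGEX_JAR = "stanford-tregex/stanford-tregex.jar"

# Assumed simplification of the four-way pattern: Tregex relations in a
# chain all apply to the first node, so this reads "NP, sister of VP,
# child of S". Newlines in the printed parse are just whitespace.
PATTERN = "NP $ VP > S"


def build_tregex_command(pattern, tree_file, jar=TREGEX_JAR):
    """Return the argv list for running Tregex on a file of trees."""
    return [
        "java", "-mx100m", "-cp", jar,
        "edu.stanford.nlp.trees.tregex.TregexPattern",
        "-s",  # assumed flag: print each match on a single line
        pattern,
        tree_file,
    ]


def run_tregex(pattern, tree_file):
    """Run Tregex and return its non-empty stdout lines."""
    cmd = build_tregex_command(pattern, tree_file)
    # Passing a list (no shell=True) keeps the shell from interpreting
    # the $ and > characters in the pattern.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line.strip()]
```

Usage (requires Java and the jar): `run_tregex(PATTERN, "trees.txt")`. Building the command as a list rather than a single shell string avoids having to quote the pattern.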
For example, here is the original text, stored in a string named "text":
text = ('Pusheen and Smitha walked along the beach. "I want to surf", said Smitha, the CEO of Tesla. However, she fell off the surfboard')
After running the Stanford CoreNLP parser through a Python wrapper, I get the following three trees for the three sentences:
output1['sentences'][0]['parse']
Out[58]: '(ROOT\n (S\n (NP (NNP Pusheen)\n (CC and)\n (NNP Smitha))\n (VP (VBD walked)\n (PP (IN along)\n (NP (DT the) (NN beach))))\n (. .)))'
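As a cross-check that does not need Java, here is a pure-Python sketch of the same condition (`NP $ VP > S`) applied directly to a parse string like the one above. This swaps in a small hand-written bracket reader instead of Tregex; the helper names are hypothetical and it handles only the plain Penn-style brackets CoreNLP prints.

```python
import re


def parse_tree(s):
    """Read a bracketed parse into nested (label, children) tuples; leaves are str."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            label = tokens[pos + 1]
            pos += 2
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1  # consume the closing ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in leaves(child)]


def np_sister_of_vp_under_s(node):
    """Yield the text of each NP that is a child of an S and has a VP sister."""
    if isinstance(node, str):
        return
    label, children = node
    if label == "S":
        child_labels = [c[0] for c in children if not isinstance(c, str)]
        if "VP" in child_labels:
            for c in children:
                if not isinstance(c, str) and c[0] == "NP":
                    yield " ".join(leaves(c))
    for c in children:
        yield from np_sister_of_vp_under_s(c)
```

For the first tree, `list(np_sister_of_vp_under_s(parse_tree(parse)))` should give `['Pusheen and Smitha']`; "the beach" is skipped because its parent is a PP, not an S.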
output1['sentences'][1]['parse']
Out[59]: "(ROOT\n (SINV (`` ``)\n …
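Tregex reads trees from a file, so one way to hand it the parses above is to write them out first. A minimal sketch, assuming the wrapper's output has the `output1['sentences'][i]['parse']` shape shown here; the helper names are hypothetical. Collapsing the newlines is probably not required by Tregex's reader (assumption), but one tree per line makes the file easy to inspect and shows the `\n` is only formatting.

```python
import re
import tempfile


def flatten_parse(parse):
    """Collapse the newlines and indentation in a CoreNLP parse string."""
    return re.sub(r"\s+", " ", parse).strip()


def write_tree_file(parses):
    """Write one flattened tree per line to a temporary file; return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as fh:
        for p in parses:
            fh.write(flatten_parse(p) + "\n")
        return fh.name
```

Usage: `path = write_tree_file([s['parse'] for s in output1['sentences']])`, then pass `path` to Tregex as the tree file. The file is created with `delete=False`, so remove it yourself when done.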