Using Stanford Tregex in Python

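I'm new to NLP and Python. I am trying to extract a subset of noun phrases from the parse trees produced by StanfordCoreNLP, using the Tregex tool and Python's subprocess library. Specifically, I am trying to find and extract noun phrases that match the following pattern in Tregex syntax: '(NP[$VP]>S)|(NP[$VP]>S\n)|(NP\n[$VP]>S)|(NP\n[$VP]>S\n)'.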
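For concreteness, a call of this kind might look like the minimal sketch below. It assumes the Tregex command-line entry point `edu.stanford.nlp.trees.tregex.TregexPattern` is reachable through a local `stanford-tregex.jar`; the jar path, the helper names `build_tregex_command` and `run_tregex`, and the temporary file name are all hypothetical. It also uses only the core pattern `NP $ VP > S`, on the assumption that Tregex matches node labels in the tree rather than the raw string.

```python
import subprocess
import tempfile
from pathlib import Path

# Assumption: path to the jar shipped with the Stanford Tregex download.
TREGEX_JAR = "stanford-tregex.jar"
# Main class of the Tregex command-line tool.
TREGEX_MAIN = "edu.stanford.nlp.trees.tregex.TregexPattern"


def build_tregex_command(pattern, tree_file, jar=TREGEX_JAR):
    """Return the argv list for running Tregex on a file of bracketed trees."""
    return ["java", "-cp", jar, TREGEX_MAIN, pattern, str(tree_file)]


def run_tregex(pattern, parse, jar=TREGEX_JAR):
    """Write one parse tree to a temporary file, run Tregex on it, return stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        tree_file = Path(tmp) / "tree.txt"  # hypothetical file name
        tree_file.write_text(parse, encoding="utf-8")
        result = subprocess.run(
            build_tregex_command(pattern, tree_file, jar),
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if java exits non-zero
        )
    return result.stdout
```

Passing the arguments as a list rather than as one shell string keeps `$` and `>` in the pattern from being interpreted by the shell. A call such as `run_tregex("NP $ VP > S", parse_string)` needs Java and the Tregex jar installed locally.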

For example, here is the raw text, saved in a string named "text":

text = ('Pusheen and Smitha walked along the beach. "I want to surf", said Smitha, the CEO of Tesla. However, she fell off the surfboard')


After running the StanfordCoreNLP parser through a Python wrapper, I got the following 3 trees for the 3 sentences:

output1['sentences'][0]['parse']

Out[58]: '(ROOT\n  (S\n    (NP (NNP Pusheen)\n      (CC and)\n      (NNP Smitha))\n    (VP (VBD walked)\n      (PP (IN along)\n        (NP (DT the) (NN beach))))\n    (. .)))'

output1['sentences'][1]['parse']

Out[59]: "(ROOT\n  (SINV (`` ``)\n …
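To make the intended result concrete, here is a small pure-Python stand-in (not Tregex itself) that checks the same structural condition as the core pattern `NP $ VP > S`: an NP that has a VP sister and whose parent is S. The helper names `parse_tree`, `leaves`, and `np_sister_of_vp_under_s` are made up for this sketch, and only the first tree above is used.

```python
import re

# First parse tree from the output above.
parse = ('(ROOT\n  (S\n    (NP (NNP Pusheen)\n      (CC and)\n      (NNP Smitha))\n'
         '    (VP (VBD walked)\n      (PP (IN along)\n        (NP (DT the) (NN beach))))\n'
         '    (. .)))')


def parse_tree(s):
    """Parse a bracketed tree into (label, children) tuples; leaves are plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1                      # consume ")"
            return (label, children)
        word = tokens[pos]                # a leaf token (a word)
        pos += 1
        return word

    return read()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in leaves(child)]


def np_sister_of_vp_under_s(node):
    """Yield NP nodes whose parent is S and that have a VP sibling."""
    if isinstance(node, str):
        return
    label, children = node
    child_labels = [c[0] for c in children if not isinstance(c, str)]
    for child in children:
        if isinstance(child, str):
            continue
        if label == "S" and child[0] == "NP" and "VP" in child_labels:
            yield child
        yield from np_sister_of_vp_under_s(child)


tree = parse_tree(parse)
matches = [" ".join(leaves(n)) for n in np_sister_of_vp_under_s(tree)]
print(matches)  # prints ['Pusheen and Smitha']
```

The inner NP "the beach" is not selected because its parent is PP, not S. Whitespace and newlines in the bracketed string are discarded during tokenizing, so the sketch needs no `\n` variants of the condition.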


python parsing subprocess pattern-matching stanford-nlp

And*_*rea

2017-03-21

4
Score

1
Answers

1518
Views