gio*_*o79 8 grammar nlp clause nltk stanford-nlp
Given an NLP parse tree
(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))
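The original sentence is "You could say that they regularly catch a shower, which adds to their exhilaration and joie de vivre."
How can the clauses be extracted and turned back into text? We would split at S and SBAR (to preserve the type of clause, e.g. subordinate):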
- (S (NP (PRP You)) (VP (MD could) (VP (VB say)
- (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower))
- (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))
to arrive at:
- You could say
- that they regularly catch a shower
- , which adds to their exhilaration and joie de vivre.
Splitting at S and SBAR seems easy. The problem seems to be stripping all the POS tags and chunk labels from the fragments.
You can use Tree.subtrees(). For more information, see the NLTK Tree class.
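Here is a minimal illustration of the two calls involved, assuming NLTK 3, where Tree.fromstring, subtrees(filter=...) and leaves() are available. With a filter, subtrees() yields only the S and SBAR nodes in preorder. leaves() then returns just the words, so every POS tag and phrase label disappears. Each span still contains the clauses nested inside it, and the trimming step in the code below removes that overlap.

```python
from nltk import Tree

parse_str = "(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))"
t = Tree.fromstring(parse_str)

# subtrees() walks the tree in preorder; the filter keeps only clause nodes.
clauses = list(t.subtrees(filter=lambda st: st.label() in {"S", "SBAR"}))

for clause in clauses:
    # leaves() returns only the words, so POS tags and phrase labels are gone.
    print(clause.label(), " ".join(clause.leaves()))
```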
Code:
from nltk import Tree

parse_str = "(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))"
# Alternative test sentence:
# parse_str = "(ROOT (S (SBAR (IN Though) (S (NP (PRP he)) (VP (VBD was) (ADJP (RB very) (JJ rich))))) (, ,) (NP (PRP he)) (VP (VBD was) (ADVP (RB still)) (ADJP (RB very) (JJ unhappy))) (. .)))"
t = Tree.fromstring(parse_str)
# print(t)

# Collect the full word span of every S and SBAR node, in preorder
# (an outer clause comes before the clauses nested inside it).
subtexts = []
for subtree in t.subtrees():
    if subtree.label() == "S" or subtree.label() == "SBAR":
        # print(subtree.leaves())
        subtexts.append(' '.join(subtree.leaves()))
# print(subtexts)

presubtexts = subtexts[:]  # ADDED IN EDIT for leftover check

# Working backwards from the last clause, cut each span just before the
# (already trimmed) text of the next clause in preorder; in this sentence
# that next clause is always nested inside the current one.
for i in reversed(range(len(subtexts) - 1)):
    subtexts[i] = subtexts[i][0:subtexts[i].index(subtexts[i + 1])]

for text in subtexts:
    print(text)

# ADDED IN EDIT - Not sure for generalized cases
# Text of the outermost clause that follows its first nested clause (here the final ".").
leftover = presubtexts[0][presubtexts[0].index(presubtexts[1]) + len(presubtexts[1]):]
print(leftover)
Output:
You could say
that
they regularly catch a shower ,
which
adds to their exhilaration and joie de vivre
.
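The string trimming above makes two assumptions. Each clause's text must appear only once inside its parent's text, and the next S/SBAR in preorder must be nested in the current one. The answer itself flags generalized cases as uncertain. One alternative avoids string search entirely; it is sketched here and is not part of the original answer. It assigns every leaf to its deepest S or SBAR ancestor using NLTK tree positions, then groups consecutive leaves that share the same owner. The helper names clause_owner and clause_segments are hypothetical.

```python
from itertools import groupby

from nltk import Tree

CLAUSE_LABELS = {"S", "SBAR"}


def clause_owner(tree, leaf_pos):
    """Return the position of the deepest S/SBAR node that dominates the leaf."""
    # leaf_pos[:k] walks from the leaf's preterminal (k = len - 1) up to the root (k = 0).
    for k in range(len(leaf_pos) - 1, -1, -1):
        if tree[leaf_pos[:k]].label() in CLAUSE_LABELS:
            return leaf_pos[:k]
    return ()  # no clause ancestor: fall back to the root position


def clause_segments(tree):
    """Split the sentence into (clause label, text) runs, in word order."""
    owned = [(clause_owner(tree, pos), tree[pos]) for pos in tree.treepositions("leaves")]
    segments = []
    # Consecutive words owned by the same clause node form one fragment.
    for owner, group in groupby(owned, key=lambda pair: pair[0]):
        words = [word for _, word in group]
        segments.append((tree[owner].label(), " ".join(words)))
    return segments


parse_str = "(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))"
t = Tree.fromstring(parse_str)

for label, text in clause_segments(t):
    print(label, text)
```

For this sentence, the sketch yields the same six fragments as the output above, each tagged with its clause label. That includes the final "." that the original code recovers separately as a leftover. As in the answer, the comma stays with "they regularly catch a shower". Attaching it to the "which" clause, as the question wanted, would need an extra punctuation rule.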
Views: 3147