Simple grammar gives ValueError in Python

Cha*_*iya 2 nlp nltk python-3.x

I am new to Python, NLTK and NLP. I have written a simple grammar, but when I run the program it gives the error below. Please help me resolve it.

Grammar:

S -> NP
NP -> PN|PRO|D[NUM=?n] N[NUM=?n]|D[NUM=?n] A N[NUM=?n]|D[NUM=?n] N[NUM=?n] PP|QP N[NUM=?n]|A N[NUM=?n]|D[NUM=?n] NOM PP|D[NUM=?n] NOM
PP -> P NP
D[NUM=sg] -> 'a'
D -> 'the'
N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair'
N[NUM=pl] -> 'dogs'|'cats'
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
NOM -> A NOM|N[NUM=?n]

Code:

import nltk

grammar = nltk.data.load('file:english_grammer.cfg')
rdparser = nltk.RecursiveDescentParser(grammar)
sent = "a dogs".split()
trees = rdparser.parse(sent)

for tree in trees: print (tree)

Error:

ValueError: Expected a nonterminal, found: [NUM=?n] N[NUM=?n]|D[NUM=?n] A N[NUM=?n]|D[NUM=?n] N[NUM=?n] PP|QP N[NUM=?n]|A N[NUM=?n]|D[NUM=?n] NOM PP|D[NUM=?n] NOM

alv*_*vas 5

I don't think the NLTK CFG grammar reader can read your CFG format with the square brackets.

First, let's try a CFG grammar without the square brackets:

from nltk.grammar import CFG

grammar_string = '''
S -> NP
PP -> P NP
D -> 'the'
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
'''

grammar = CFG.fromstring(grammar_string)
print(grammar)

[OUT]:

Grammar with 18 productions (start state = S)
    S -> NP
    PP -> P NP
    D -> 'the'
    PN -> 'saumya'
    PN -> 'dinesh'
    PRO -> 'she'
    PRO -> 'he'
    PRO -> 'we'
    A -> 'tall'
    A -> 'naughty'
    A -> 'long'
    A -> 'three'
    A -> 'black'
    P -> 'with'
    P -> 'in'
    P -> 'from'
    P -> 'at'
    QP -> 'some'

Now let's put the square brackets in:

from nltk.grammar import CFG

grammar_string = '''
S -> NP
PP -> P NP
D -> 'the'
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair'
N[NUM=pl] -> 'dogs'|'cats'
'''

grammar = CFG.fromstring(grammar_string)
print(grammar)

[OUT]:

Traceback (most recent call last):
  File "test.py", line 33, in <module>
    grammar = CFG.fromstring(grammar_string)
  File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 519, in fromstring
    encoding=encoding)
  File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 1273, in read_grammar
    (linenum+1, line, e))
ValueError: Unable to parse line 10: N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair'
Expected an arrow
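As an aside, the bracketed notation looks like NLTK's feature-grammar syntax, which goes through a different reader than `CFG.fromstring`. A minimal sketch, assuming `FeatureGrammar.fromstring` and `FeatureChartParser` from NLTK; the cut-down rule set and the `count_parses` helper are only for illustration and are not part of your grammar:

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Reduced version of the question's rules, kept in bracket notation.
feature_grammar = FeatureGrammar.fromstring("""
% start S
S -> NP
NP -> D[NUM=?n] N[NUM=?n]
D[NUM=sg] -> 'a'
D -> 'the'
N[NUM=sg] -> 'boy' | 'girl'
N[NUM=pl] -> 'dogs' | 'cats'
""")

feature_parser = FeatureChartParser(feature_grammar)


def count_parses(sentence):
    """Hypothetical helper: number of trees found for a space-separated sentence."""
    return len(list(feature_parser.parse(sentence.split())))


# The shared variable ?n forces D and N to agree in number.
for sentence in ["a boy", "the dogs", "a dogs"]:
    print(sentence, "->", count_parses(sentence), "parse(s)")
```

If the grammar lives in a file, `nltk.data.load` appears to choose the reader from the file extension, so a feature grammar would probably need to be saved as `.fcfg` rather than `.cfg`; that is worth checking against your NLTK version.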

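Going back to your grammar, it looks like you are using the square brackets to mark non-terminals as constrained or unconstrained, so the solution would be to:

  • use underscores to name the constrained non-terminals, and
  • add a rule for the unconstrained non-terminals

So your CFG rules would look like this: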

from nltk.parse import RecursiveDescentParser
from nltk.grammar import CFG

grammar_string = '''
S -> NP
NP -> PN | PRO | D N | D A N | D N PP | QP N | A N | D NOM PP | D NOM

PP -> P NP
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'

D -> D_def | D_sg
D_def -> 'the'
D_sg -> 'a'

N -> N_sg | N_pl
N_sg -> 'boy'|'girl'|'room'|'garden'|'hair'
N_pl -> 'dogs'|'cats'
'''

grammar = CFG.fromstring(grammar_string)

rdparser = RecursiveDescentParser(grammar)
sent = "a dogs".split()
trees = rdparser.parse(sent)

for tree in trees:
    print (tree)

[OUT]:

(S (NP (D (D_sg a)) (N (N_pl dogs))))
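Note that this grammar accepts "a dogs", because `D` and `N` no longer carry number. If agreement matters to you, one option that stays within a plain CFG is to spell out the allowed determiner and noun combinations in the NP rule. A sketch under that assumption, with a reduced rule set and a hypothetical `count_parses` helper:

```python
from nltk.grammar import CFG
from nltk.parse import RecursiveDescentParser

# Number agreement is encoded in the non-terminal names themselves:
# 'a' (D_sg) may only combine with singular nouns, 'the' (D_def) with either.
agreement_grammar = CFG.fromstring("""
S -> NP
NP -> D_sg N_sg | D_def N_sg | D_def N_pl
D_def -> 'the'
D_sg -> 'a'
N_sg -> 'boy' | 'girl' | 'room' | 'garden' | 'hair'
N_pl -> 'dogs' | 'cats'
""")

agreement_parser = RecursiveDescentParser(agreement_grammar)


def count_parses(sentence):
    """Hypothetical helper: number of trees found for a space-separated sentence."""
    return len(list(agreement_parser.parse(sentence.split())))


for sentence in ["a boy", "the dogs", "a dogs"]:
    print(sentence, "->", count_parses(sentence), "parse(s)")
```

The trade-off is that every rule mentioning `D` or `N` has to be duplicated per number value, which grows quickly for a grammar as large as the one in the question.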