pyparsing.ParseException when using parseString (searchString works)

Ala*_*ira 5 python grammar nlp nltk pyparsing

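I'm trying to parse some traffic-violation sentences with pyparsing. When I use grammar.searchString(sentence) it works fine, but when I use parseString a ParseException is thrown. Can anyone help me and tell me what's wrong with my code?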

from pyparsing import Or, Literal, oneOf, OneOrMore, nums, alphas, Regex, Word, \
    SkipTo, LineEnd, originalTextFor, Optional, ZeroOrMore, Keyword, Group
import pyparsing as pp

from nltk.tag import pos_tag

sentences = ['Failure to control vehicle speed on highway to avoid collision','Failure to stop at stop sign', 'Introducing additives into special fuel by unauthorized person and contrary to regulations', 'driver fail to stop at yield sign at nearest pointf approaching traffic view when req. for safety', 'Operating unregistered motor vehicle on highway', 'Exceeding maximum speed: 39 MPH in a posted 30 MPH zone']


for sentence in sentences:
    words = pos_tag(sentence.split())
    #print words
    verbs = [word for word, pos in words if pos in ['VB','VBD','VBG']]
    nouns = [word for word, pos in words if pos == 'NN']
    adjectives = [word for word, pos in words if pos == 'JJ']

    adjectives.append('great')  # initializing  
    verbs.append('get') # initializing 


    object_generator = oneOf('for to')
    location_generator = oneOf('at in into on onto over within')
    speed_generator = oneOf('MPH KM/H')

    noun = oneOf(nouns)
    adjective = oneOf(adjectives)

    location = location_generator + pp.Group(Optional(adjective) + noun)

    action = oneOf(verbs)
    speed = Word(nums) + speed_generator

    grammar =  action | location | speed

    parsed = grammar.parseString(sentence)

    print(parsed)

Error traceback:

Traceback (most recent call last):
  File "script3.py", line 35, in <module>
    parsed = grammar.parseString(sentence)
  File "/Users/alana/anaconda/lib/python2.7/site-packages/pyparsing.py", line 1032, in parseString
    raise exc
pyparsing.ParseException: Expected Re:('control|avoid|get') (at char 0), (line:1, col:1)

Pau*_*McG 3

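searchString works because it skips over text that doesn't match the grammar. parseString is stricter: the grammar has to match beginning at the very first character of the input string (and if you pass parseAll=True, it must also consume the entire string). In your case the grammar is a little hard to pin down, because it is generated automatically from NLTK's analysis of each input sentence (an interesting approach, by the way). If you just print the grammar itself, that may give you an idea of what strings it is looking for. For example, I would guess that NLTK tags "Failure" in your first example as a noun, but none of the 3 alternatives in your grammar starts with a noun, so parseString fails.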
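A minimal, self-contained sketch of the difference, assuming pyparsing is installed. To avoid depending on NLTK, the word lists are hard-coded, hypothetical stand-ins (not NLTK's actual tags) for what pos_tag() might return for "Failure to stop at stop sign"; the grammar has the same three alternatives as in the question.

```python
import pyparsing as pp

# Hypothetical word lists standing in for pos_tag() output; hard-coded so
# this sketch runs without NLTK. They are NOT what NLTK necessarily returns.
verbs = ["stop"]
nouns = ["sign"]
adjectives = ["stop", "great"]

# Same shape of grammar as in the question.
action = pp.oneOf(verbs)
location = pp.oneOf("at in into on onto over within") + pp.Group(
    pp.Optional(pp.oneOf(adjectives)) + pp.oneOf(nouns)
)
speed = pp.Word(pp.nums) + pp.oneOf("MPH KM/H")
grammar = action | location | speed

sentence = "Failure to stop at stop sign"

# searchString scans forward through the input, skipping text that matches
# none of the alternatives, and collects every match it finds.
found = grammar.searchString(sentence).asList()
print(found)  # [['stop'], ['at', ['stop', 'sign']]]

# parseString must match starting at character 0; "Failure" matches none of
# the three alternatives, so it raises ParseException.
try:
    grammar.parseString(sentence)
    failed_at_start = False
except pp.ParseException as err:
    failed_at_start = True
    print("parseString failed:", err)

# If the input starts with something the grammar accepts, parseString
# succeeds and by default ignores whatever text follows the match...
head = grammar.parseString("stop at stop sign").asList()
print(head)  # ['stop']

# ...unless parseAll=True, which also requires the whole string to match.
try:
    grammar.parseString("stop at stop sign", parseAll=True)
    strict_ok = True
except pp.ParseException:
    strict_ok = False
```

Note that the same grammar that finds two matches with searchString only ever returns one match from parseString, because `action | location | speed` describes a single phrase, not a sequence of phrases.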

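You may want to add some more diagnostic prints of the noun, adjective, and verb lists that NLTK finds, and then see how they map onto the generated grammar.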
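One way to do that kind of diagnosis, sketched under the same assumptions (pyparsing installed, hypothetical hard-coded word lists instead of real NLTK output): print the lists and the generated grammar, then try each alternative on its own at the start of a sentence. The helper `first_match()` is illustrative only and is not part of pyparsing.

```python
import pyparsing as pp

# Hypothetical word lists; in the question these come from nltk's pos_tag().
verbs = ["stop", "get"]
nouns = ["sign"]
adjectives = ["great"]
print("verbs:", verbs)
print("nouns:", nouns)
print("adjectives:", adjectives)

action = pp.oneOf(verbs)
location = pp.oneOf("at in into on onto over within") + pp.Group(
    pp.Optional(pp.oneOf(adjectives)) + pp.oneOf(nouns)
)
speed = pp.Word(pp.nums) + pp.oneOf("MPH KM/H")
grammar = action | location | speed

# Printing the grammar shows which alternatives parseString will try.
print("grammar:", grammar)

# Hypothetical helper: report which alternative (if any) can match at
# character 0, which is exactly what parseString requires.
ALTERNATIVES = [("action", action), ("location", location), ("speed", speed)]


def first_match(sentence):
    """Return the name of the first alternative matching at the start, or None."""
    for name, expr in ALTERNATIVES:
        try:
            expr.parseString(sentence)
            return name
        except pp.ParseException:
            continue
    return None


for s in ["Failure to stop at stop sign", "stop at stop sign", "39 MPH zone"]:
    print(repr(s), "->", first_match(s))
```

A result of `None` for a sentence is a direct sign that parseString will raise on it, since no alternative can start at its first word.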

You could also try combining the results of multiple matches within a sentence using Python's built-in sum():

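# give each alternative a results name so dump() can label what matched
grammar = action("action") | Group(location)("location") | Group(speed)("speed")

#parsed = grammar.parseString(sentence)
# searchString returns one ParseResults per match; sum() merges them into one
parsed = sum(grammar.searchString(sentence))
print(parsed.dump())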
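A self-contained sketch of the sum() approach, again assuming pyparsing is installed and using hypothetical, hard-coded word lists in place of NLTK output for the speed sentence from the question. Because sum() of an empty list is the integer 0, which has no dump() method, this version guards against the case where nothing matched.

```python
import pyparsing as pp

# Hypothetical word lists standing in for NLTK output on the speed sentence.
verbs = ["posted", "get"]
nouns = ["zone"]
adjectives = ["great"]

action = pp.oneOf(verbs)
location = pp.oneOf("at in into on onto over within") + pp.Group(
    pp.Optional(pp.oneOf(adjectives)) + pp.oneOf(nouns)
)
speed = pp.Word(pp.nums) + pp.oneOf("MPH KM/H")

# Results names label each kind of match in the combined output.
grammar = action("action") | pp.Group(location)("location") | pp.Group(speed)("speed")

sentence = "Exceeding maximum speed: 39 MPH in a posted 30 MPH zone"
matches = grammar.searchString(sentence)

# sum() starts from the integer 0; ParseResults supports 0 + results, so the
# per-match results are concatenated into a single ParseResults. sum([]) is
# just 0, so fall back to an empty ParseResults when nothing matched.
parsed = sum(matches) if len(matches) else pp.ParseResults([])
print(parsed.dump())
```

Each searchString match is its own ParseResults; sum() concatenates their tokens and keeps their results names, so `parsed["action"]` gives the matched verb and `"speed" in parsed` tells you whether any speed was found.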