Using pyparsing to parse a word escape-split over multiple lines

got*_*nes 5 python parsing pyparsing

I am trying to use pyparsing to parse words that can be broken up over multiple lines with a backslash-newline combination ("\\n"). Here is what I have done:

from pyparsing import *

continued_ending = Literal('\\') + lineEnd
word = Word(alphas)
split_word = word + Suppress(continued_ending)
multi_line_word = Forward()
multi_line_word << (word | (split_word + multi_line_word))

print(multi_line_word.parseString(
'''super\\
cali\\
fragi\\
listic'''))

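The output I get is ['super'], while the expected output is ['super', 'cali', 'fragi', 'listic']. Better still would be to have them all joined into one word (which I think I can do with multi_line_word.setParseAction(lambda t: ''.join(t))).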

I tried looking at this code in a pyparsing helper, but it gives me an error: maximum recursion depth exceeded.

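EDIT 2009-11-15: I later realized that pyparsing is somewhat generous with regard to whitespace, and that led me to some poor assumptions: what I thought I was parsing for was actually matched much more loosely. That is to say, we want to see no whitespace between any of the parts of the word, the escape, and the EOL character.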

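I realized that the little example string above is insufficient as a test case, so I wrote the following unit tests. Code that passes these tests should match what I intuitively think of as an escape-split word, and only an escape-split word. It should not match a basic word that is not escape-split. We can, and I believe should, use a different grammatical construct for that. Keeping the two separate keeps everything tidy.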

import unittest
import pyparsing

# Assumes you named your module 'multiline.py'
import multiline

class MultiLineTests(unittest.TestCase):

    def test_continued_ending(self):

        case = '\\\n'
        expected = ['\\', '\n']
        result = multiline.continued_ending.parseString(case).asList()
        self.assertEqual(result, expected)


    def test_continued_ending_space_between_parse_error(self):

        case = '\\ \n'
        self.assertRaises(
            pyparsing.ParseException,
            multiline.continued_ending.parseString,
            case
        )


    def test_split_word(self):

        cases = ('shiny\\', 'shiny\\\n', ' shiny\\')
        expected = ['shiny']
        for case in cases:
            result = multiline.split_word.parseString(case).asList()
            self.assertEqual(result, expected)


    def test_split_word_no_escape_parse_error(self):

        case = 'shiny'
        self.assertRaises(
            pyparsing.ParseException,
            multiline.split_word.parseString,
            case
        )


    def test_split_word_space_parse_error(self):

        cases = ('shiny \\', 'shiny\r\\', 'shiny\t\\', 'shiny\\ ')
        for case in cases:
            self.assertRaises(
                pyparsing.ParseException,
                multiline.split_word.parseString,
                case
            )


    def test_multi_line_word(self):

        cases = (
                'shiny\\',
                'shi\\\nny',
                'sh\\\ni\\\nny\\\n',
                ' shi\\\nny\\',
                'shi\\\nny ',
                'shi\\\nny captain'
        )
        expected = ['shiny']
        for case in cases:
            result = multiline.multi_line_word.parseString(case).asList()
            self.assertEqual(result, expected)


    def test_multi_line_word_spaces_parse_error(self):

        cases = (
                'shi \\\nny',
                'shi\\ \nny',
                'sh\\\n iny',
                'shi\\\n\tny',
        )
        for case in cases:
            self.assertRaises(
                pyparsing.ParseException,
                multiline.multi_line_word.parseString,
                case
            )


if __name__ == '__main__':
    unittest.main()

got*_*nes 5

After poking around a bit more, I came upon this help thread, which contained this notable bit:

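I often see inefficient grammars when someone implements a pyparsing grammar directly from a BNF definition. BNF does not have a concept of "one or more" or "zero or more" or "optional"...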

With that in mind, I got the idea to change these two lines

multi_line_word = Forward()
multi_line_word << (word | (split_word + multi_line_word))

to

multi_line_word = ZeroOrMore(split_word) + word

This got it to output what I was looking for: ['super', 'cali', 'fragi', 'listic'].

Next, I added a parse action that joins these tokens together:

multi_line_word.setParseAction(lambda t: ''.join(t))

This gives a final output of ['supercalifragilistic'].
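As a self-contained check of the steps above, here is a minimal sketch that puts the pieces together. It assumes the classic camelCase pyparsing names used in this post (parseString, setParseAction) are available in the installed version; the names text and result are only for illustration.

```python
from pyparsing import Literal, Suppress, Word, ZeroOrMore, alphas, lineEnd

# A backslash followed by an end of line marks a continuation.
continued_ending = Literal('\\') + lineEnd
word = Word(alphas)
# A fragment of a word followed by a (suppressed) continuation marker.
split_word = word + Suppress(continued_ending)

# Iteration instead of BNF-style recursion: any number of continued
# fragments, then one final fragment.
multi_line_word = ZeroOrMore(split_word) + word
# Join the fragments into a single token.
multi_line_word.setParseAction(lambda t: ''.join(t))

text = 'super\\\ncali\\\nfragi\\\nlistic'
result = multi_line_word.parseString(text).asList()
print(result)  # ['supercalifragilistic']
```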

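The take-home message I learned is that one does not simply walk into Mordor.

Just kidding.

The take-home message is that one cannot simply implement a one-to-one translation of BNF with pyparsing. Some tricks using the iterative types (such as ZeroOrMore) should be called into use.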

EDIT 2009-11-25: To cope with the more strenuous test cases, I modified the code to the following:

no_space = NotAny(White(' \t\r'))
# make sure that the EOL immediately follows the escape backslash
continued_ending = Literal('\\') + no_space + lineEnd
word = Word(alphas)
# make sure that the escape backslash immediately follows the word
split_word = word + NotAny(White()) + Suppress(continued_ending)
multi_line_word = OneOrMore(split_word + NotAny(White())) + Optional(word)
multi_line_word.setParseAction(lambda t: ''.join(t))

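This has the benefit of making sure that no whitespace comes between any of the elements (with the exception of the newlines after the escaping backslashes).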
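A quick way to see the effect of the stricter grammar is to feed it a few of the inputs from the unit tests in the question. The sketch below repeats the definitions above so that it runs on its own; the helper parses is a hypothetical name introduced only for this illustration.

```python
from pyparsing import (Literal, NotAny, OneOrMore, Optional, ParseException,
                       Suppress, White, Word, alphas, lineEnd)

no_space = NotAny(White(' \t\r'))
continued_ending = Literal('\\') + no_space + lineEnd
word = Word(alphas)
split_word = word + NotAny(White()) + Suppress(continued_ending)
multi_line_word = OneOrMore(split_word + NotAny(White())) + Optional(word)
multi_line_word.setParseAction(lambda t: ''.join(t))


def parses(text):
    """Hypothetical helper: True if multi_line_word matches at the start of text."""
    try:
        multi_line_word.parseString(text)
        return True
    except ParseException:
        return False


joined = multi_line_word.parseString('shi\\\nny').asList()
print(joined)                # ['shiny']
print(parses('shi \\\nny'))  # False: space between the word and the backslash
print(parses('shi\\ \nny'))  # False: space between the backslash and the EOL
print(parses('shiny'))       # False: a plain word is not escape-split
```

Note that this grammar deliberately rejects a plain word with no continuation, which matches the intent stated in the question's tests.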


Pau*_*McG 5

Your code is very close. Any of these mods would work:

# '|' means MatchFirst, so you had a left-recursive expression
# reversing the order of the alternatives makes this work
multi_line_word << ((split_word + multi_line_word) | word)

# '^' means Or/MatchLongest, but beware using this inside a Forward
multi_line_word << (word ^ (split_word + multi_line_word))

# an unusual use of delimitedList, but it works
multi_line_word = delimitedList(word, continued_ending)

# in place of your parse action, you can wrap in a Combine
multi_line_word = Combine(delimitedList(word, continued_ending))

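As you have found in your pyparsing googling, BNF->pyparsing translations should be done with a special view to using pyparsing features in place of BNF's, um, shortcomings. I was actually in the middle of composing a longer answer, going into more of the BNF translation issues, but you have already found this material (on the wiki, I assume).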
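To make the difference between the alternatives concrete, the sketch below runs the original ordering, the reversed ordering, and the Combine variant on the input from the question. It assumes the camelCase pyparsing API used throughout this thread (including the delimitedList helper); the names original, reordered, and combined are introduced only for this illustration.

```python
from pyparsing import (Combine, Forward, Literal, Suppress, Word, alphas,
                       delimitedList, lineEnd)

continued_ending = Literal('\\') + lineEnd
word = Word(alphas)
split_word = word + Suppress(continued_ending)
text = 'super\\\ncali\\\nfragi\\\nlistic'

# Original ordering: '|' (MatchFirst) takes the first alternative that
# matches, so the bare word wins and parsing stops after 'super'.
original = Forward()
original << (word | (split_word + original))
first = original.parseString(text).asList()
print(first)    # ['super']

# Reversed ordering: the longer, continued form is tried first.
reordered = Forward()
reordered << ((split_word + reordered) | word)
second = reordered.parseString(text).asList()
print(second)   # ['super', 'cali', 'fragi', 'listic']

# Combine joins the matched fragments into one token, so no parse
# action is needed.
combined = Combine(delimitedList(word, continued_ending))
third = combined.parseString(text).asList()
print(third)    # ['supercalifragilistic']
```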