rei*_*eer 1 python compiler-construction grammar parsing ply
我对PLY 中的setlx 语言有以下语法:
Rule 0 S' -> file_input
Rule 1 file_input -> statement_list
Rule 2 epsilon -> <empty>
Rule 3 statement_list -> statement
Rule 4 statement_list -> statement_list statement
Rule 5 statement -> simple_statement SEMICOLON
Rule 6 statement -> compound_statement
Rule 7 simple_statement -> expression_statement
Rule 8 simple_statement -> assert_statement
Rule 9 simple_statement -> assignment_statement
Rule 10 simple_statement -> augmented_assign_statement
Rule 11 simple_statement -> backtrack_statement
Rule 12 simple_statement -> break_statement
Rule 13 simple_statement -> continue_statement
Rule 14 simple_statement -> exit_statement
Rule 15 simple_statement -> return_statement
Rule 16 simple_statement -> quantor
Rule 17 simple_statement -> term
Rule 18 expression_statement -> expression
Rule 19 backtrack_statement -> BACKTRACK
Rule 20 break_statement -> BREAK
Rule 21 continue_statement -> CONTINUE
Rule 22 exit_statement -> EXIT
Rule 23 return_statement -> RETURN
Rule 24 return_statement -> RETURN expression
Rule 25 expression_list -> expression
Rule 26 expression_list -> expression_list COMMA expression
Rule 27 expression -> implication
Rule 28 expression -> lambda_definition
Rule 29 expression -> implication EQUIVALENT implication
Rule 30 expression -> implication ANTIVALENT implication
Rule 31 implication -> disjunction
Rule 32 implication -> disjunction IMPLICATES disjunction
Rule 33 disjunction -> conjunction
Rule 34 disjunction -> disjunction OR conjunction
Rule 35 conjunction -> comparison
Rule 36 conjunction -> conjunction AND comparison
Rule 37 comparison -> sum
Rule 38 comparison -> sum EQ sum
Rule 39 comparison -> sum NEQ sum
Rule 40 comparison -> sum LT sum
Rule 41 comparison -> sum LE sum
Rule 42 comparison -> sum GT sum
Rule 43 comparison -> sum GE sum
Rule 44 comparison -> sum IN sum
Rule 45 comparison -> sum NOTIN sum
Rule 46 sum -> product
Rule 47 sum -> sum PLUS product
Rule 48 sum -> sum MINUS product
Rule 49 product -> reduce
Rule 50 product -> product TIMES reduce
Rule 51 product -> product DIVIDE reduce
Rule 52 product -> product IDIVIDE reduce
Rule 53 product -> product MOD reduce
Rule 54 product -> product CARTESIAN reduce
Rule 55 reduce -> unary_expression
Rule 56 reduce -> reduce SUM unary_expression
Rule 57 reduce -> reduce PRODUCT unary_expression
Rule 58 unary_expression -> power
Rule 59 unary_expression -> SUM unary_expression
Rule 60 unary_expression -> PRODUCT unary_expression
Rule 61 unary_expression -> HASH unary_expression
Rule 62 unary_expression -> MINUS unary_expression
Rule 63 unary_expression -> AT unary_expression
Rule 64 unary_expression -> BANG unary_expression
Rule 65 power -> primary
Rule 66 power -> primary POW unary_expression
Rule 67 primary -> atom
Rule 68 primary -> attributeref
Rule 69 primary -> subscription
Rule 70 primary -> slicing
Rule 71 primary -> procedure
Rule 72 primary -> call
Rule 73 primary -> primary BANG
Rule 74 atom -> identifier
Rule 75 atom -> literal
Rule 76 atom -> enclosure
Rule 77 identifier -> IDENTIFIER
Rule 78 identifier -> UNUSED
Rule 79 attributeref -> primary DOT identifier
Rule 80 subscription -> primary LBRACKET expression RBRACKET
Rule 81 slicing -> primary LBRACKET lower_bound RANGE upper_bound RBRACKET
Rule 82 lower_bound -> expression
Rule 83 lower_bound -> epsilon
Rule 84 upper_bound -> expression
Rule 85 upper_bound -> epsilon
Rule 86 literal -> stringliteral
Rule 87 literal -> integer
Rule 88 literal -> floatnumber
Rule 89 literal -> boolean
Rule 90 stringliteral -> STRING
Rule 91 stringliteral -> LITERAL
Rule 92 integer -> INTEGER
Rule 93 floatnumber -> DOUBLE
Rule 94 boolean -> TRUE
Rule 95 boolean -> FALSE
Rule 96 enclosure -> parenth_form
Rule 97 enclosure -> set_display
Rule 98 enclosure -> list_display
Rule 99 parenth_form -> LPAREN expression RPAREN
Rule 100 set_display -> LBRACE expression RANGE expression RBRACE
Rule 101 set_display -> LBRACE expression COMMA expression RANGE expression RBRACE
Rule 102 set_display -> LPAREN argument_list RPAREN
Rule 103 list_display -> LBRACKET expression RANGE expression RBRACKET
Rule 104 list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
Rule 105 list_display -> LBRACKET argument_list RBRACKET
Rule 106 lambda_definition -> lambda_parameters LAMBDADEF expression
Rule 107 lambda_parameters -> identifier
Rule 108 lambda_parameters -> LT parameter_list GT
Rule 109 assignment_statement -> target ASSIGN expression
Rule 110 target -> expression
Rule 111 augmented_assign_statement -> augtarget augop expression
Rule 112 augtarget -> identifier
Rule 113 augtarget -> attributeref
Rule 114 augtarget -> subscription
Rule 115 augop -> PLUS_EQUAL
Rule 116 augop -> MINUS_EQUAL
Rule 117 augop -> TIMES_EQUAL
Rule 118 augop -> DIVIDE_EQUAL
Rule 119 augop -> IDIVIDE_EQUAL
Rule 120 augop -> MOD_EQUAL
Rule 121 assert_statement -> ASSERT LPAREN expression COMMA expression RPAREN
Rule 122 term -> TERM LPAREN term_arguments RPAREN
Rule 123 term_arguments -> expression_list
Rule 124 term_arguments -> epsilon
Rule 125 procedure -> PROCEDURE LPAREN parameter_list RPAREN LBRACE block RBRACE
Rule 126 procedure -> CPROCEDURE LPAREN parameter_list RPAREN LBRACE block RBRACE
Rule 127 parameter_list -> procedure_param
Rule 128 parameter_list -> parameter_list COMMA procedure_param
Rule 129 parameter_list -> epsilon
Rule 130 procedure_param -> identifier
Rule 131 call -> primary LPAREN argument_list RPAREN
Rule 132 call -> primary LPAREN RPAREN
Rule 133 argument_list -> expression
Rule 134 argument_list -> argument_list COMMA expression
Rule 135 quantor -> FORALL LPAREN iterator_chain PIPE expression RPAREN
Rule 136 quantor -> EXISTS LPAREN iterator_chain PIPE expression RPAREN
Rule 137 iterator -> target IN expression
Rule 138 iterator_chain -> iterator
Rule 139 iterator_chain -> iterator_chain COMMA iterator
Rule 140 compound_statement -> if_statement
Rule 141 compound_statement -> switch_statement
Rule 142 compound_statement -> match_statement
Rule 143 compound_statement -> while_loop
Rule 144 compound_statement -> do_while_loop
Rule 145 compound_statement -> for_loop
Rule 146 block -> statement_list
Rule 147 block -> epsilon
Rule 148 if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE
Rule 149 if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE ELSE LBRACE block RBRACE
Rule 150 if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE ELSE if_statement
Rule 151 switch_statement -> SWITCH LBRACE case_statements default_case RBRACE
Rule 152 case_statements -> case_list
Rule 153 case_statements -> epsilon
Rule 154 case_list -> case_statement
Rule 155 case_list -> case_list case_statement
Rule 156 case_statement -> CASE expression COLON block
Rule 157 default_case -> DEFAULT COLON block
Rule 158 default_case -> epsilon
Rule 159 match_statement -> MATCH
Rule 160 while_loop -> WHILE LPAREN expression RPAREN LBRACE block RBRACE
Rule 161 do_while_loop -> DO LBRACE block RBRACE WHILE LPAREN expression RPAREN SEMICOLON
Rule 162 for_loop -> FOR LPAREN iterator_chain RPAREN LBRACE block RBRACE
Run Code Online (Sandbox Code Playgroud)
在最后几米,我现在遇到了一些冲突:
WARNING:
WARNING: Conflicts:
WARNING:
WARNING: shift/reduce conflict for IN in state 34 resolved as shift
WARNING: shift/reduce conflict for COMMA in state 94 resolved as shift
WARNING: shift/reduce conflict for RPAREN in state 154 resolved as shift
Run Code Online (Sandbox Code Playgroud)
如何在不产生新冲突的情况下解决它们?我知道它们来自哪里,但我不知道如何修复它。任何帮助或一般建议都适用。
我会倒着做这些,因为那样我们会从最简单到最难。事实上,对于第一次冲突,我真的没有解决方案。
第三个冲突是语法中实际含糊不清的结果。你需要摆脱歧义:
Rule 96 enclosure -> parenth_form
Rule 97 enclosure -> set_display
Rule 99 parenth_form -> LPAREN expression RPAREN
Rule 102 set_display -> LPAREN argument_list RPAREN
Rule 133 argument_list -> expression
Run Code Online (Sandbox Code Playgroud)
因此,如果我们正在寻找 anenclosure
并且我们找到了一个简单的括号表达式,它可以是 aparenth_form
或者它可以是set_display
包含argument_list
恰好一个表达式的an 。我怀疑这里的意图是一个简单的括号表达式将是 a parenth_form
,但无法从语法中看出。
最简单的解决方案是parenth_form
完全摆脱,并argument_list
在为set_display
对应于规则 102的 AST 节点构建 AST 节点时检查单元素的情况。另一种可能性是明确说明它;更改规则 102 以要求set_display
至少有两个表达式:
set_display -> LPAREN expression COMMA argument_list RPAREN
Run Code Online (Sandbox Code Playgroud)
这仍然需要你来玩弄的AST,不过,因为你必须预先考虑expression
到argument_list
,当你建造的set_display
节点。
第二个 S/R 冲突实际上非常相似。它的产生是因为:
Rule 104 list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
Rule 105 list_display -> LBRACKET argument_list RBRACKET
Run Code Online (Sandbox Code Playgroud)
所以:
LBRACKET expression COMMA expression ...
Run Code Online (Sandbox Code Playgroud)
如果以下符号是RANGE
,则需要根据规则 104 减少;根据规则 105 如果以下符号是RBRACKET
; 如果以下符号是 ,则根据规则 134 COMMA
。(这是一个粗略的近似值,因为它假设我们已经知道第二个 的结尾expression
。)但是,正如所写的那样,语法需要在看到第一个路径后立即提交到这些路径之一COMMA
,因为它需要在那个时候做出决定时刻是否创建一个argument_list
。
解决方案是延迟解析器的决定,这很容易但很难看:
list_display -> LBRACKET expression RANGE expression RBRACKET
list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
list_display -> LBRACKET expression RBRACKET
list_display -> LBRACKET expression COMMA argument_list RBRACKET
Run Code Online (Sandbox Code Playgroud)
现在,第一个COMMA
总是被转移,关于list_display
减少什么类型的决定被推迟到第二个结束expression
(如果有两个expression
s),但是有必要为最后两个产品处理 AST 以纠正argument_list
.
第一个 S/R 冲突的出现是因为IN
它既用作运算符又用作 的语法部分iterator
:
Rule 44 comparison -> sum IN sum
Rule 137 iterator -> target IN expression
Run Code Online (Sandbox Code Playgroud)
但是因为target
它只是一个expression
,并且expression
可以派生sum
,所以解析器不可能(大部分时间)知道IN
它在解析的很晚之后才知道它正在查看哪个。
先前的推迟决定的技术在这里不起作用,因为您需要知道IN
您正在查看哪种类型才能正确应用运算符优先级。假设我们在需要 an 的上下文中iterator
,输入是:
atom1 AND atom2 IN atom3
Run Code Online (Sandbox Code Playgroud)
如果那是迭代器(即下一个符号是COMMA
或RPAREN
),那么实际上是:
( atom1 AND atom2 ) IN atom3
Run Code Online (Sandbox Code Playgroud)
但是,如果这是迭代器的左侧,则需要以完全不同的方式对其进行解析:
( atom1 AND ( atom2 IN atom3 ) ) IN expression
Run Code Online (Sandbox Code Playgroud)
此外,atom3
可能是一个任意表达式,可能atom3 AND atom4
导致两个解析:
( atom1 AND atom2 ) IN ( atom3 AND atom4 )
( atom1 AND ( atom2 IN atom3 ) AND atom4 ) IN expression
Run Code Online (Sandbox Code Playgroud)
这就是为什么双关语在语言设计中很糟糕的原因。
我强烈怀疑没有LR(k)
语法可以解析您语言的特定角落,尽管这只是基于直觉;我没有证据。然而,GLR 解析器不会有问题,因为它实际上并不模糊。不知道Python中有没有GLR解析器生成器;如果你不依赖 Python,你当然可以使用bison
.
GLR 解析器也可以解决第二个冲突,这也不是歧义的结果。
归档时间: |
|
查看次数: |
3246 次 |
最近记录: |