mik*_*bal 4 parsing infix-notation postfix-notation shunting-yard
给定这样的输入:3+4+
算法将其转换为3 4 + +
我可以在执行后缀表达式时找到错误.但是,有可能在转换期间发现这一点吗?
(我读过的维基百科文章和网络文章不处理这种情况)
谢谢
ric*_*ici 17
除括号不匹配外,可以使用正则表达式验证有效表达式.(不匹配的括号将被维基百科页面中指示的分流码算法捕获,所以我忽略了这些.)
正则表达式如下:
PRE* OP POST* (INF PRE* OP POST*)*
Run Code Online (Sandbox Code Playgroud)
哪里:
PRE is a prefix operator or (
POST is a postfix operator or )
INF is an infix operator
OP is an operand (a literal or a variable name)
Run Code Online (Sandbox Code Playgroud)
您链接的维基百科页面包括函数调用,但不包括数组下标; 如果你想处理这些情况,那么:
PRE is a prefix operator or (
POST is a postfix operator or ) or ]
INF is an infix operator or ( or ,
OP is an operand, including function names
Run Code Online (Sandbox Code Playgroud)
在上面要注意的一点是,PRE并且INF处于不相关的背景中; 也就是说,如果PRE有效,则INF不是,反之亦然.这意味着对前缀运算符和中缀运算符使用相同的符号是明确的.此外,PRE并且POST处于不相交的上下文中,因此您可以对前缀和后缀运算符使用相同的符号.但是,您不能对postfix和中缀运算符使用相同的符号.例如,考虑C/C++运算符:
- prefix or infix
+ prefix or infix
-- prefix or postfix
++ prefix or postfix
Run Code Online (Sandbox Code Playgroud)
我隐含地使用上面的这个特性来允许(用作表达式分组器(有效前缀)和函数调用(中缀,因为它介于函数名和参数列表之间;运算符是"call".)
将常规表达式实现为状态机是最常见的,因为只有两种状态:
+-----+ +-----+
|State|-----OP---->|State|
| 1 |<----INF----| 2 |
| |---+ | |---+
+-----+ | +-----+ |
^ PRE ^ POST
| | | |
+------+ +------+
Run Code Online (Sandbox Code Playgroud)
我们可以调用状态1"想要操作数"和状态2"有操作数".一个简单的实现只是将维基百科中提供的分流码算法分成两个循环,就像这样(如果你不喜欢goto,它可以被消除,但它确实是呈现状态机的最清晰的方式).
want_operand:
read a token. If there are no more tokens, announce an error.
if the token is an prefix operator or an '(':
mark it as prefix and push it onto the operator stack
goto want_operand
if the token is an operand (identifier or variable):
add it to the output queue
goto have_operand
if the token is anything else, announce an error and stop. (**)
have_operand:
read a token
if there are no more tokens:
pop all operators off the stack, adding each one to the output queue.
if a `(` is found on the stack, announce an error and stop.
if the token is a postfix operator:
mark it as postfix and add it to the output queue
goto have_operand.
if the token is a ')':
while the top of the stack is not '(':
pop an operator off the stack and add it to the output queue
if the stack becomes empty, announce an error and stop.
if the '(' is marked infix, add a "call" operator to the output queue (*)
pop the '(' off the top of the stack
goto have_operand
if the token is a ',':
while the top of the stack is not '(':
pop an operator off the stack and add it to the output queue
if the stack becomes empty, announce an error
goto want_operand
if the token is an infix operator:
(see the wikipeda entry for precedence handling)
(normally, all prefix operators are considered to have higher precedence
than infix operators.)
got to want_operand
otherwise, token is an operand. Announce an error
(*) The procedure as described above does not deal gracefully with parameter lists;
when the postfix expression is being evaluated and a "call" operator is found, it's
not clear how many arguments need to be evaluated. It might be that function names
are clearly identifiable, so that the evaluator can just attach arguments to the
"call" until a function name is found. But a cleaner solution is for the "," handler
to increment the argument count of the "call" operator (that is, the open
parenthesis marked as "infix"). The ")" handler also needs to increment the
argument count.
(**) The state machine as presented does not correctly handle function calls with 0
arguments. This call will show up as a ")" read in the want_operand state and with
a "call" operator on top of the stack. If the "call" operator is marked with an
argument count, as above, then the argument count must be 0 when the ")" is read,
and in this case, unlike the have_operand case, the argument count must not be
incremented.
Run Code Online (Sandbox Code Playgroud)