Ben*_*oit 132 windows parsing cmd batch-file variable-expansion
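I came across ss64.com, which provides good help on how to write batch scripts that the Windows Command Interpreter will run.
However, I have been unable to find a good explanation of the grammar of batch scripts, how things expand or do not expand, and how to escape things.
Here are sample questions that I have not been able to solve:
- How is the quote system managed? I wrote a script (foreach $i (@ARGV) { print '*' . $i ; }), compiled it, and called it this way:
  - my_script.exe "a ""b"" c" → output is *a "b*c
  - my_script.exe """a b c""" → output is *"a*b*c"
- How does the echo command work? What is expanded inside that command?
- Why do I have to use for [...] %%I in file scripts, but for [...] %I in interactive sessions?
- How can I echo %PROCESSOR_ARCHITECTURE% literally? I found that echo.exe %""PROCESSOR_ARCHITECTURE% works; is there a better solution?
- How do pairs of % match? Example:
  - set b=a, echo %a %b% c% → %a a c%
  - set a =b, echo %a %b% c% → bb c%
- How are variables stored when using the set command? For example, if I do set a=a" b and then echo.%a%, I obtain a" b. If I use echo.exe from the UnxUtils, however, I get a b. How come %a% expands in a different way?
Thank you for your lights.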
jeb*_*jeb 179
I performed many experiments to investigate the grammar of batch scripts. I also investigated the differences between batch and command line mode.
Processing a line of code in a batch file involves multiple phases.
Here is a brief overview of the various phases:
Phase 0) Read Line:
Phase 1) Percent Expansion:
Phase 1.5) Remove <CR>: Remove all Carriage Return (0x0D) characters
Phase 2) Process special characters, tokenize, and build a cached command block: This is a complex process that is affected by things such as quotes, special characters, token delimiters, and caret escapes.
Phase 3) Echo the parsed command(s) Only if the command block did not begin with @, and ECHO was ON at the start of the preceding step.
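Phase 4) FOR %X variable expansion: Only if a FOR command is active and the commands after DO are being processed.
Phase 5) Delayed Expansion: Only if delayed expansion is enabled
Phase 5.3) Pipe processing: Only if commands are on either side of a pipe
Phase 5.5) Execute Redirection:
Phase 6) CALL processing/Caret doubling: Only if the command token is CALL
Phase 7) Execute: The command is executed
Here are details for each phase:
Note that the phases described below are only a model of how the batch parser works. The actual cmd.exe internals may not reflect these phases. But this model is effective at predicting the behavior of batch scripts.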
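Phase 0) Read Line: Read the input line through the first <LF>.
- When reading a line that is to be parsed as a command, <Ctrl-Z> (0x1A) is read as <LF> (LineFeed 0x0A)
- When GOTO or CALL reads lines while scanning for a :label, <Ctrl-Z> is treated as itself - it is not converted to <LF>
Phase 1) Percent Expansion:
- A double %% is replaced by a single %
- Expansion of arguments (%*, %1, %2, etc.)
- Expansion of %var%, if var does not exist replace it with nothing
- The line is truncated at the first <LF> not within a %var% expansion
Phase 1.5) Remove <CR>: Remove all Carriage Returns (0x0D) from the line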
Phase 2) Process special characters, tokenize, and build a cached command block: This is a complex process that is affected by things such as quotes, special characters, token delimiters, and caret escapes. What follows is an approximation of this process.
There are some concepts that are important throughout this phase.
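- A token is simply a string of characters that is treated as a unit.
- Tokens are separated by token delimiters. The standard token delimiters are <space> <tab> ; , = <0x0B> <0x0C> and <0xFF>. Consecutive token delimiters are treated as one - there are no empty tokens between token delimiters.
- There are no token delimiters within a quoted string. The entire quoted string is always treated as part of a single token.
The following characters may have special meaning in this phase, depending on context: ^ ( @ & | < > <LF> <space> <tab> ; , = <0x0B> <0x0C> <0xFF>
Look at each character from left to right:
- If a caret (^), the next character is escaped, and the escaping caret is removed. Escaped characters lose all special meaning (except for <LF>).
- If a quote ("), toggle the quote flag. If the quote flag is active, then only " and <LF> are special. All other characters lose their special meaning until the next quote toggles the quote flag off. It is not possible to escape the closing quote. All quoted characters are always within the same token.
- <LF> always turns off the quote flag. Other behaviors vary depending on context, but quotes never alter the behavior of <LF>.
  - Escaped <LF>
    - <LF> is stripped
    - The next character is escaped. If at the end of the line buffer, then the next line is read, processed by phases 1 and 1.5, and appended to the current one before escaping the next character. If the next character is <LF>, then it is treated as a literal, meaning this process is not recursive.
  - Unescaped <LF> not within parentheses
    - <LF> is stripped and parsing of the current line is terminated.
    - Any remaining characters in the line buffer are simply ignored.
  - Unescaped <LF> within a FOR IN parenthesized block
    - <LF> is converted into a <space>
    - If at the end of the line buffer, then the next line is read and appended to the current one.
  - Unescaped <LF> within a parenthesized command block
    - <LF> is converted into <LF><space>, and the <space> is treated as part of the next line of the command block.
    - If at the end of the line buffer, then the next line is read and appended to the space.
- If one of the special characters & | < or >, split the line at this point in order to handle pipes, command concatenation, and redirection.
  - In the case of a pipe (|), each side is a separate command (or command block) that gets special handling in phase 5.3
  - In the case of &, &&, or || command concatenation, each side of the concatenation is treated as a separate command.
  - In the case of <, <<, >, or >> redirection, the redirection clause is parsed, temporarily removed, and then appended to the end of the current command. A redirection clause consists of an optional file handle digit, the redirection operator, and the redirection destination token.
- If the very first token for this command (prior to moving redirection to the end) begins with @, then the @ has special meaning. (@ is not special in any other context)
  - The special @ is removed.
  - If ECHO is ON, then this command, along with any following concatenated commands on this line, is excluded from the phase 3 echo. If the @ is before an opening (, then the entire parenthesized block is excluded from the phase 3 echo.
- Process parentheses (this provides for compound statements across multiple lines):
  - If the parser is not looking for a command token, then ( is not special.
  - If the parser is looking for a command token and finds (, then start a new compound statement and increment the parenthesis counter
  - If the parenthesis counter is > 0, then ) terminates the compound statement and decrements the parenthesis counter.
  - If the line end is reached and the parenthesis counter is > 0, then the next line is appended to the compound statement (starting again with phase 0)
  - If the parenthesis counter is 0 and the parser is looking for a command, then ) functions similar to a REM statement as long as it is immediately followed by a token delimiter, special character, newline, or end-of-file
    - All special characters lose their meaning except ^ (line concatenation is possible)
    - Once the end of the logical line is reached, the entire "command" is discarded.
- Each command is parsed into a series of tokens. The first token is always treated as a command token (after special @ have been stripped and redirection moved to the end).
  - Leading token delimiters prior to the command token are stripped
  - When parsing the command token, ( functions as a command token delimiter, in addition to the standard token delimiters
  - The handling of subsequent tokens depends on the command.
- Three commands get special handling - IF, FOR, and REM. Among other things:
  - FOR: the IN parenthesized clause treats <LF> as <space>. After the IN clause is parsed, all tokens are concatenated together to form a single token.
  - REM: if there is only one argument token that ends with an unescaped ^ that ends the line, then the argument token is thrown away, and the subsequent line is parsed and appended to the REM. This repeats until there is more than one token, or the last character is not ^.
- If the command token begins with :, and this is the first round of phase 2 (not a restart due to CALL in phase 6) then
  - The token is normally treated as an Unexecuted Label.
    - The remainder of the line is parsed, however ), <, >, & and | no longer have special meaning. The entire remainder of the line is considered to be part of the label "command".
    - The ^ continues to be special, meaning that line continuation can be used to append the subsequent line to the label.
    - ( no longer has special meaning for the first command that follows the Unexecuted Label in this context.
  - A label is instead treated as an Executed Label if there is redirection that precedes the label token, and there is a | pipe or &, &&, or || command concatenation on the line.
Phase 3) Echo the parsed command(s) Only if the command block did not begin with @, and ECHO was ON at the start of the preceding step.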
Phase 4) FOR %X variable expansion: Only if a FOR command is active and the commands after DO are being processed.
- At this point, phase 1 of batch processing will have already converted %%X into %X. The command line has different percent expansion rules for phase 1. This is the reason that command lines use %X but batch files use %%X for FOR variables.
- FOR variable names are case sensitive, but ~modifiers are not case sensitive.
- ~modifiers take precedence over variable names. If a character following ~ is both a modifier and a valid FOR variable name, and there exists a subsequent character that is an active FOR variable name, then the character is interpreted as a modifier.
---- From this point onward, each command identified in phase 2 is processed separately.
---- Phases 5 through 7 are completed for one command before moving on to the next.
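Phase 5) Delayed Expansion: Only if delayed expansion is on
- Look at each token for a command. Check if the token contains any !. If not, then the token is not parsed - important for ^ characters.
If the token does contain !, then scan each character from left to right:
- If it is a caret (^), the next character has no special meaning, and the caret itself is removed
- If it is an exclamation mark, search for the next exclamation mark (carets are not observed anymore) and expand to the value of the variable
  - Consecutive opening ! are collapsed into a single !
  - Any remaining ! that cannot be paired is removed
- Expanding variables at this stage is "safe", because special characters are no longer detected (even <CR> or <LF>)
Phase 5.3) Pipe processing: Only if commands are on either side of a pipe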
Each side of the pipe is processed independently.
- If one side of the pipe is a parenthesized command block, every <LF> with a command before and after is converted to <space>&. Other <LF> are stripped.
- The command (or command block) is executed via %comspec% /S /D /c" commandBlock". This means the command block gets a phase restart, but this time in command line mode.
Phase 5.5) Execute Redirection: Any redirection that was discovered in phase 2 is now executed.
- Failed redirection does not set ERRORLEVEL to 1 unless || is used.
Phase 6) CALL processing/Caret doubling: Only if the command token is CALL, or if the text before the first occurring standard token delimiter is CALL. If CALL is parsed from a larger command token, then the unused portion is prepended to the arguments token bef
Mik*_*ark 60
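When invoking a command from a command window, tokenization of the command line arguments is not done by cmd.exe (a.k.a. "the shell"). Most often the tokenization is done by the newly formed process's C/C++ runtime, but this is not necessarily so: for example, the new process may not have been written in C/C++, or it may choose to ignore argv and process the raw command line itself (e.g. with GetCommandLine()). At the OS level, Windows passes command lines untokenized as a single string to new processes. This is in contrast to most *nix shells, where the shell tokenizes arguments in a consistent, predictable way before passing them to the newly formed process. All this means that you may experience wildly divergent argument tokenization behavior across different programs on Windows, as individual programs often take argument tokenization into their own hands.
If it sounds like anarchy, it kind of is. However, since a large number of Windows programs do utilize the Microsoft C/C++ runtime's argv, it may be generally useful to understand how the MSVCRT tokenizes arguments. Here is an excerpt:
- Arguments are delimited by white space, which is either a space or a tab.
- A string surrounded by double quotation marks is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument. Note that the caret (^) is not recognized as an escape character or delimiter.
- A double quotation mark preceded by a backslash, \", is interpreted as a literal double quotation mark (").
- Backslashes are interpreted literally, unless they immediately precede a double quotation mark.
- If an even number of backslashes is followed by a double quotation mark, then one backslash (\) is placed in the argv array for every pair of backslashes (\\), and the double quotation mark (") is interpreted as a string delimiter.
- If an odd number of backslashes is followed by a double quotation mark, then one backslash (\) is placed in the argv array for every pair of backslashes (\\), and the double quotation mark is escaped by the remaining backslash, causing a literal double quotation mark (") to be placed in argv.
The Microsoft "batch language" (.bat) is no exception to this anarchic environment, and it has developed its own unique rules for tokenization and escaping. It also looks like cmd.exe's command prompt does do some preprocessing of the command line arguments (mostly for variable substitution and escaping) before passing them to the newly executing process. You can read more about the low-level details of the batch language and cmd escaping in the excellent answers by jeb and dbenham on this page.
Let's build a simple command line utility in C and see what it says about your test cases: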
#include <stdio.h>

/* Print every argument exactly as the C runtime tokenized it. */
int main(int argc, char* argv[]) {
    int i;
    for (i = 0; i < argc; i++) {
        printf("argv[%d][%s]\n", i, argv[i]);
    }
    return 0;
}
(Notes: argv[0] is always the name of the executable, and is omitted below for brevity. Tested on Windows XP SP3. Compiled with Visual Studio 2005.)
> test.exe "a ""b"" c"
argv[1][a "b" c]
> test.exe """a b c"""
argv[1]["a b c"]
> test.exe "a"" b c
argv[1][a" b c]
And a few of my own tests:
> test.exe a "b" c
argv[1][a]
argv[2][b]
argv[3][c]
> test.exe a "b c" "d e
argv[1][a]
argv[2][b c]
argv[3][d e]
> test.exe a \"b\" c
argv[1][a]
argv[2]["b"]
argv[3][c]
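The results above can be reproduced with a small tokenizer. What follows is only a sketch, not the actual MSVCRT source: next_arg is a hypothetical helper that applies the documented rules (whitespace splits arguments outside quotes, backslashes are special only before a double quote) plus one assumption inferred from the tests above, namely that "" inside a quoted string yields one literal quote and stays in quote mode. It handles only the arguments after the program name; argv[0] is parsed differently by the real runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parse one argument starting at *cursor into out (capacity out_size).
 * Returns 1 if an argument was produced, 0 at end of input.
 * Hypothetical helper for illustration; output is truncated if out is too small. */
int next_arg(const char **cursor, char *out, size_t out_size)
{
    const char *p = *cursor;
    size_t n = 0;
    int in_quote = 0;

    assert(out != NULL && out_size > 0);

    while (*p == ' ' || *p == '\t')      /* skip leading whitespace */
        p++;
    if (*p == '\0') {
        *cursor = p;
        out[0] = '\0';
        return 0;
    }

    for (;;) {
        int copy_char = 1;
        size_t slashes = 0;

        while (*p == '\\') {             /* count a run of backslashes */
            p++;
            slashes++;
        }
        if (*p == '"') {
            if (slashes % 2 == 0) {      /* even count: quote is not escaped */
                if (in_quote && p[1] == '"') {
                    p++;                 /* assumed rule: "" in quotes -> literal " */
                } else {
                    copy_char = 0;       /* quote acts as a delimiter */
                    in_quote = !in_quote;
                }
            }
            slashes /= 2;                /* 2n or 2n+1 backslashes emit n */
        }
        for (; slashes > 0; slashes--)   /* emit the surviving backslashes */
            if (n + 1 < out_size)
                out[n++] = '\\';

        if (*p == '\0' || (!in_quote && (*p == ' ' || *p == '\t')))
            break;                       /* end of this argument */
        if (copy_char && n + 1 < out_size)
            out[n++] = *p;
        p++;
    }
    out[n] = '\0';
    *cursor = p;
    return 1;
}
```

To use it, point a const char * at the raw command tail and call next_arg in a loop until it returns 0, printing each result the same way test.exe does. Programs built with a different runtime, or that parse GetCommandLine() themselves, may not follow these rules.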
dbe*_*ham 44
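Here is an expanded explanation of Phase 1 in jeb's answer (valid for both batch mode and command line mode).
Phase 1) Percent Expansion
Starting from the left, scan each character for % or <LF>. If found, then
- 1.05 (truncate line at <LF>)
  - If the character is <LF>, then drop (ignore) the remainder of the line from the <LF> onward and go to Phase 1.5 (strip <CR>)
  - Else the character must be %, so proceed to 1.1
- 1.1 (escape %), skipped if command line mode
  - If batch mode and followed by another %, then replace %% with a single % and continue the scan
- 1.2 (expand argument), skipped if command line mode
  - Else if batch mode, then
    - If followed by * and command extensions are enabled, then replace %* with the text of all command line arguments (replace with nothing if there are no arguments) and continue the scan.
    - Else if followed by <digit>, then replace %<digit> with the argument value (replace with nothing if undefined) and continue the scan.
    - Else if followed by ~ and command extensions are enabled, then
      - If followed by an optional valid list of argument modifiers and then a required <digit>, then replace %~[modifiers]<digit> with the modified argument value (replace with nothing if not defined, or if the specified $PATH: modifier is not defined) and continue the scan.
      - Else the invalid modified argument syntax raises a fatal error: all parsed commands are aborted, and batch processing aborts if in batch mode!
- 1.3 (expand variable)
  - Else if command extensions are disabled, then look at the next string of characters, breaking before % or the end of the buffer, and call them VAR (may be an empty list)
    - If the next character is %, then
      - If VAR is defined, then replace %VAR% with the value of VAR and continue the scan
      - Else if batch mode, then remove %VAR% and continue the scan
      - Else go to 1.4
    - Else go to 1.4
  - Else if command extensions are enabled, then look at the next string of characters, breaking before %, :, or the end of the buffer, and call them VAR (may be an empty list). If VAR breaks before : and the subsequent character is %, then include : as the last character in VAR and break before %.
    - If the next character is %, then
      - If VAR is defined, then replace %VAR% with the value of VAR and continue the scan
      - Else if batch mode, then remove %VAR% and continue the scan
      - Else go to 1.4
    - Else if the next character is :, then
      - If VAR is undefined, then
        - If batch mode, then remove %VAR: and continue the scan
        - Else go to 1.4
      - Else if the next character is ~, then
        - If the next string of characters matches the pattern [integer][,[integer]]%, then replace %VAR:~[integer][,[integer]]% with the substring of the value of VAR (possibly resulting in an empty string) and continue the scan.
        - Else go to 1.4
      - Else if followed by = or *=, then the invalid variable search and replace syntax raises a fatal error: all parsed commands are aborted, and batch processing aborts if in batch mode!
      - Else if the next string of characters matches the pattern [*]search=[replace]%, where search may include any set of characters except =, and replace may include any set of characters except %, then replace %VAR:[*]search=[replace]% with the value of VAR after performing the search and replace (possibly resulting in an empty string) and continue the scan
      - Else go to 1.4
- 1.4 (strip %)
  - If batch mode, then remove the % and continue the scan with the next character after it
  - Else preserve the % and continue the scan with the next character after it
The above helps to explain why this batch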
@echo off
setlocal enableDelayedExpansion
set "1var=varA"
set "~f1var=varB"
call :test "arg1"
exit /b
::
:test "arg1"
echo %%1var%% = %1var%
echo ^^^!1var^^^! = !1var!
echo --------
echo %%~f1var%% = %~f1var%
echo ^^^!~f1var^^^! = !~f1var!
exit /b
Gives these results:
%1var% = "arg1"var
!1var! = varA
--------
%~f1var% = P:\arg1var
!~f1var! = varB
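The %1var% result above falls out of the scan order: argument expansion is tried before variable expansion. The sketch below models only a small part of phase 1 (%% escaping, %<digit> arguments, plain %VAR% lookup, and stripping or keeping a lone %). Everything named here is hypothetical: percent_expand, struct var, and lookup stand in for cmd.exe internals, the variable table replaces the real environment, and the lookup is case sensitive although real variable names are not. Modifiers (~), %*, substring, and search and replace syntax are deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variable table standing in for the real environment. */
struct var { const char *name; const char *value; };

static const char *lookup(const struct var *vars, size_t nvars,
                          const char *name, size_t len)
{
    for (size_t i = 0; i < nvars; i++)
        if (strlen(vars[i].name) == len && strncmp(vars[i].name, name, len) == 0)
            return vars[i].value;
    return NULL;
}

/* Append len bytes of s to out, truncating if the buffer is full. */
static size_t put(char *out, size_t cap, size_t n, const char *s, size_t len)
{
    for (size_t i = 0; i < len && n + 1 < cap; i++)
        out[n++] = s[i];
    return n;
}

/* Simplified phase 1. batch_mode != 0 selects batch-file rules;
 * args[0..9] are the batch arguments (NULL means undefined). */
void percent_expand(const char *in, int batch_mode,
                    const char *const args[10],
                    const struct var *vars, size_t nvars,
                    char *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL && cap > 0);
    while (*in != '\0') {
        if (*in != '%') {                           /* ordinary character */
            n = put(out, cap, n, in, 1);
            in++;
            continue;
        }
        if (batch_mode && in[1] == '%') {           /* 1.1: %% becomes % */
            n = put(out, cap, n, "%", 1);
            in += 2;
            continue;
        }
        if (batch_mode && isdigit((unsigned char)in[1])) {  /* 1.2: %<digit> */
            const char *arg = args[in[1] - '0'];
            if (arg != NULL)
                n = put(out, cap, n, arg, strlen(arg));
            in += 2;
            continue;
        }
        /* 1.3: look for a closing % and try the text between as a name */
        const char *end = strchr(in + 1, '%');
        const char *val = end ? lookup(vars, nvars, in + 1, (size_t)(end - in - 1))
                              : NULL;
        if (val != NULL) {
            n = put(out, cap, n, val, strlen(val));
            in = end + 1;
        } else if (end != NULL && batch_mode) {
            in = end + 1;                           /* undefined: drop %VAR% */
        } else {
            if (!batch_mode)                        /* 1.4: keep the lone % */
                n = put(out, cap, n, "%", 1);
            in++;                                   /* batch mode strips it */
        }
    }
    out[n] = '\0';
}
```

Called in batch mode with "arg1" (quotes included) as argument 1 and the input %%1var%% = %1var%, the scan consumes %1 before it ever looks for a variable named 1var, which is why the real variable is unreachable through percent expansion.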
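Note 1 - Phase 1 occurs prior to the recognition of REM statements. This is very important because it means even a remark can generate a fatal error if it has invalid argument expansion syntax or invalid variable search and replace syntax!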
@echo off
rem %~x This generates a fatal argument expansion error
echo this line is never reached
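Note 2 - Another interesting consequence of the % parsing rules: variables containing : in the name can be defined, but they cannot be expanded unless command extensions are disabled. There is one exception - a variable name containing a single colon at the end can be expanded while command extensions are enabled. However, you cannot perform substring or search and replace operations on variable names ending with a colon. The batch file below (courtesy of jeb) demonstrates this behavior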
@echo off
setlocal
set var=content
set var:=Special
set var::=double colon
set var:~0,2=tricky
set var::~0,2=unfortunate
echo %var%
echo %var:%
echo %var::%
echo %var:~0,2%
echo %var::~0,2%
echo Now with DisableExtensions
setlocal DisableExtensions
echo %var%
echo %var:%
echo %var::%
echo %var:~0,2%
echo %var::~0,2%
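Note 3 - An interesting outcome of the order of the parsing rules that jeb lays out in his post: when performing search and replace with normal expansion, special characters should NOT be escaped (though they may be quoted). But when performing search and replace with delayed expansion, special characters MUST be escaped (unless they are quoted).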
@echo off
setlocal enableDelayedExpansion
set "var=this & that"
echo %var:&=and%
echo "%var:&=and%"
echo !var:^&=and!
echo "!var:&=and!"
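As a rough companion to the delayed-expansion lines in the examples above (such as echo ^^^!1var^^^! = !1var!, which reaches phase 5 as ^!1var^! = !1var! once phase 2 has consumed one level of carets), here is a sketch of the basic phase 5 scan. It is an approximation under stated assumptions: delayed_expand, struct var, and lookup are hypothetical names, the variable table replaces the real environment, undefined variables expand to nothing as in batch mode, and the :~ substring and search and replace forms are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variable table standing in for the real environment. */
struct var { const char *name; const char *value; };

static const char *lookup(const struct var *vars, size_t nvars,
                          const char *name, size_t len)
{
    for (size_t i = 0; i < nvars; i++)
        if (strlen(vars[i].name) == len && strncmp(vars[i].name, name, len) == 0)
            return vars[i].value;
    return NULL;
}

/* Simplified phase 5 for one token; output is truncated if out is too small. */
void delayed_expand(const char *in, const struct var *vars, size_t nvars,
                    char *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL && cap > 0);
    if (strchr(in, '!') == NULL) {        /* no '!': token is not parsed at all, */
        for (; *in != '\0'; in++)         /* so carets survive untouched         */
            if (n + 1 < cap)
                out[n++] = *in;
        out[n] = '\0';
        return;
    }
    while (*in != '\0') {
        if (*in == '^') {                 /* caret: drop it, keep next char literally */
            in++;
            if (*in != '\0') {
                if (n + 1 < cap)
                    out[n++] = *in;
                in++;
            }
        } else if (*in == '!') {
            while (in[1] == '!')          /* collapse consecutive opening '!' */
                in++;
            const char *end = strchr(in + 1, '!');
            if (end == NULL) {            /* unpaired '!' is removed */
                in++;
                continue;
            }
            const char *val = lookup(vars, nvars, in + 1, (size_t)(end - in - 1));
            if (val != NULL)              /* undefined expands to nothing here */
                for (; *val != '\0'; val++)
                    if (n + 1 < cap)
                        out[n++] = *val;
            in = end + 1;
        } else {
            if (n + 1 < cap)
                out[n++] = *in;
            in++;
        }
    }
    out[n] = '\0';
}
```

The early return is the part that tends to surprise people: a token without any ! keeps its carets, while the same carets are consumed as soon as a single ! appears in the token.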
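Here is an expanded and more accurate explanation of phase 5 in jeb's answer (valid for both batch mode and command line mode).
Note that there are some edge cases where these rules fail:
see the linked discussion on checking linefeeds with CALL.
Phase 5) Delayed Expansion
Only if delayed expansion is on, and the line contains at least one !, then starting from the left, scan each character for ^ or !, and if found, then
- 5.1 (caret escape), needed for ! or ^ literals
  - If the character is a caret ^, then
    - Remove the ^
    - Scan the next character and preserve it as a literal
    - Continue the scan
- 5.2 (expand variable)
  - If the character is !, then
    - If command extensions are disabled, then look at the next string of characters, breaking before ! or <LF>, and call them VAR (may be an empty list)
      - If the next character is !, then
        - If VAR is defined, then replace !VAR! with the value of VAR and continue the scan
        - Else if batch mode, then remove !VAR! and continue the scan
        - Else go to 5.2.1
      - Else go to 5.2.1
    - Else if command extensions are enabled, then look at the next string of characters, breaking before !, :, or <LF>, and call them VAR (may be an empty list). If VAR breaks before : and the subsequent character is !, then include : as the last character in VAR and break before !
      - If the next character is !, then
        - If VAR exists, then replace !VAR! with the value of VAR and continue the scan
        - Else if batch mode, then remove !VAR! and continue the scan
        - Else go to 5.2.1
      - Else if the next character is :, then
        - If VAR is undefined, then
          - If batch mode, then remove !VAR: and continue the scan
          - Else go to 5.2.1
        - Else if the next string of characters matches the pattern ~[integer][,[integer]]!, then replace !VAR:~[integer][,[integer]]! with the substring of the value of VAR (possibly resulting in an empty string) and continue the scan
        - Else if the next string of characters matches the pattern [*]search=[replace]!, where search may include any set of characters except =, and replace may include any set of characters except !, then replace !VAR:[*]search=[replace]! with the value of VAR after performing the search and replace (possibly resulting in an empty string) and continue the scan
        - Else go to 5.2.1
    - 5.2.1
      - If batch mode, then remove the !, else preserve the !
      - Continue the scan starting with the next character after the !

As pointed out, commands are passed the entire argument string in μSoftland, and it is up to them to parse this into separate arguments for their own use. There is no consistency in this between different programs, and therefore there is no one set of rules to describe this process. You really need to check each corner case for whatever C library your program uses.
As far as the system .bat files go, here is that test: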
c> type args.cmd
@echo off
echo cmdcmdline:[%cmdcmdline%]
echo 0:[%0]
echo *:[%*]
set allargs=%*
if not defined allargs goto :eof
setlocal
@rem Wot about a nice for loop?
@rem Then we are in the land of delayedexpansion, !n!, call, etc.
@rem Plays havoc with args like %t%, a"b etc. ugh!
set n=1
:loop
echo %n%:[%1]
set /a n+=1
shift
set param=%1
if defined param goto :loop
endlocal
Now we can run some tests. See if you can figure out just what μSoft are trying to do:
C>args a b c
cmdcmdline:[cmd.exe ]
0:[args]
*:[a b c]
1:[a]
2:[b]
3:[c]
OK so far. (I'll leave out the uninteresting %cmdcmdline% and %0 from now on.)
C>args *.*
*:[*.*]
1:[*.*]
No filename expansion.
C>args "a b" c
*:["a b" c]
1:["a b"]
2:[c]
No quote stripping, though quotes do prevent argument splitting.
c>args ""a b" c
*:[""a b" c]
1:[""a]
2:[b" c]
Consecutive double quotes cause them to lose any special parsing abilities they may have had. @Beniot's example:
C>args "a """ b "" c"""
*:["a """ b "" c"""]
1:["a """]
2:[b]
3:[""]
4:[c"""]
Quiz: How do you pass the value of any environment variable as a single argument (i.e., as %1) to a bat file?
c>set t=a "b c
c>set t
t=a "b c
c>args %t%
1:[a]
2:["b c]
c>args "%t%"
1:["a "b]
2:[c"]
c>Aaaaaargh!
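Sane parsing seems forever broken.
For your entertainment, try adding miscellaneous ^, \, ', & (&c.) characters to these examples.
You have some great answers above already, but to answer one part of your question: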
set a =b, echo %a %b% c% ? bb c%
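What is happening there is that, because you have a space before the =, a variable is created called %a<space>%,
so when you echo %a % it is correctly evaluated as b.
The remaining part, b% c%, is then evaluated as plain text plus an undefined variable % c%, which should be echoed as typed; for me, echo %a %b% c% returns bb% c%
I suspect that the ability to include spaces in variable names is more of an oversight than a planned "feature".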