mar*_*gti 7 url sed command-line parsing
I am trying to use sed to extract the value part of one of the many key-value pairs in a URL query string.
This is what I am trying:
echo 'http://www.youtube.com/watch?v=abc&g=xyz' | sed 's@^https?://(www.)?youtube.com/(watch\\?)?.*?v(=|/)([a-zA-Z0-9\-_]*)(&.*)?$@$4@'
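But it always outputs the input URL unchanged.
What am I doing wrong?
Update 1
To clarify a few things:
I am not sure which version of sed I am using, but it is the one that ships with Mac OS X (10.7.5). In this version of sed, $1, $2, etc. seem to be the matches; \1, \2, etc. give an error:
sed: 1: "s@^https?://(www.)?yout ...": \4 not defined in the RE
The statement above is incorrect, as I found out later. Apologies for the confusion.
Update 2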
I have updated the sed RE to make it more specific, based on the suggestion by @slhck below, but the issue remains as before.
Update 3
Based on the man page for this version of sed it appears that this is a BSD-flavoured version.
ter*_*don 12
Even simpler, if you just want the abc:
echo 'http://www.youtube.com/watch?v=abc&g=xyz' | awk -F'[=&]' '{print $2}'
If you want the xyz:
echo 'http://www.youtube.com/watch?v=abc&g=xyz' | awk -F'[=&]' '{print $4}'
EXPLANATION:
awk : is a scripting language that automatically processes input files line by line, splitting each line into fields. So, when you process a file with awk, for each line, the first field is $1, the second $2 etc up to $N. By default awk uses blanks as the field separator.
-F'[=&]' : -F is used to change the field delimiter from blanks to something else. In this case, I am giving it a character class. In regular expressions, square brackets ([ ]) denote a set of characters, any one of which can match. So, specifically, -F'[=&]' means that awk should treat both & and = as field delimiters.
Therefore, given the input string from your question, using & and = as delimiters, awk will read the following fields:
http://www.youtube.com/watch?v=abc&g=xyz
|----------- $1 -------------| --- - ---
                                |  |  |
                                |  |  `----- $4
                                |  `-------- $3
                                `----------- $2
So, all you need to do is print whichever field you want, for example with {print $4}.
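Note that picking fields by position only works while the parameters stay in the same order. As a rough sketch (not part of the approach above), the value can also be looked up by key name. The function name query_value is made up for illustration, and the sketch assumes the URL has no #fragment and that values need no percent-decoding:

```shell
# Hypothetical helper (not from the original answer): print the value of a
# named query parameter, regardless of where it appears in the query string.
query_value() {
  # $1 = URL, $2 = parameter name
  printf '%s\n' "$1" | awk -v key="$2" '{
    sub(/^[^?]*\?/, "")          # drop everything up to and including the first "?"
    n = split($0, pairs, "&")    # one array element per key=value pair
    for (i = 1; i <= n; i++) {
      eq = index(pairs[i], "=")  # position of the first "=" in this pair
      if (eq > 0 && substr(pairs[i], 1, eq - 1) == key)
        print substr(pairs[i], eq + 1)
    }
  }'
}

query_value 'http://www.youtube.com/watch?v=abc&g=xyz' v   # prints abc
query_value 'http://www.youtube.com/watch?g=xyz&v=abc' v   # prints abc
```

If the key is not present, the function prints nothing, so the caller can test for an empty result.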
You said you also want to check that the string is a valid YouTube URL. A plain sed substitution does not do that by default: if the line does not match the regex you give it, sed simply prints the entire line unchanged (unless you combine the -n option with the p flag). You can use a tool like Perl to print only if the regex matches:
echo 'http://www.youtube.com/watch?v=abc&g=xyz' |
perl -ne 's/http.*www.youtube.com\/watch\?v=(.+?)&.+/$1/ && print'
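As a side note, sed can be made to behave in a similar match-or-nothing way: the -n option suppresses automatic printing and the p flag on the s command prints only lines where the substitution succeeded. The following is a minimal sketch under some assumptions: the function name youtube_id is made up, the sed in use accepts -E for extended regular expressions (BSD sed does, as do recent GNU versions), and v is the first query parameter. Note that sed writes backreferences as \2, not $2, and has no non-greedy .*? operator.

```shell
# Hypothetical helper (not from the original answer): print the video id only
# when the whole line looks like a YouTube watch URL; print nothing otherwise.
youtube_id() {
  # -E enables extended regular expressions, -n suppresses automatic printing,
  # and the trailing p prints the line only if the substitution succeeded.
  printf '%s\n' "$1" |
    sed -n -E 's@^https?://(www\.)?youtube\.com/watch\?v=([a-zA-Z0-9_-]*)(&.*)?$@\2@p'
}

youtube_id 'http://www.youtube.com/watch?v=abc&g=xyz'   # prints abc
youtube_id 'http://example.com/watch?v=abc&g=xyz'       # prints nothing
```

Without -E, sed treats ( ) and ? as literal characters in the pattern, so no groups are defined; that is consistent with the "\4 not defined in the RE" error quoted in the question.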
Finally, to simply print abc you can use the standard UNIX tool cut:
echo 'http://www.youtube.com/watch?v=abc&g=xyz' |
cut -d '=' -f 2 | cut -d '&' -f 1