sed: extracting the value of a key-value pair from a URL query string

mar*_*gti 7 url sed command-line parsing

I am trying to use sed to extract the value part of one of the many key-value pairs in a URL query string.

This is what I am trying:

echo 'http://www.youtube.com/watch?v=abc&g=xyz' | sed 's@^https?://(www.)?youtube.com/(watch\\?)?.*?v(=|/)([a-zA-Z0-9\-_]*)(&.*)?$@$4@'

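But it always outputs the input URL as-is.

What am I doing wrong?

Update 1

To clarify a few things:

  1. The regex is more complex than it has to be because I am also trying to check the validity of the input and produce output only if the input is valid. Hence the stricter matching.
  2. The desired output is the value of the key "v" in the query string.
  3. I haven't been able to find out which version of sed I am using, but it is the one that ships with Mac OS X (10.7.5).
  4. In my version of sed, $1, $2, etc. seem to be the matched groups; \1, \2, etc. give the error: sed: 1: "s@^https?://(www.)?yout ...": \4 not defined in the RE. Incorrect, as I discovered later! Apologies for the confusion.

Update 2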

I have updated the sed RE to make it more specific, based on a suggestion by @slhck below, but the issue remains as before.

Update 3

Based on the man page for this version of sed it appears that this is a BSD-flavoured version.

ter*_*don 12

Even simpler, if you just want the abc:

 echo 'http://www.youtube.com/watch?v=abc&g=xyz' | awk -F'[=&]' '{print $2}'

If you want the xyz :

echo 'http://www.youtube.com/watch?v=abc&g=xyz' | awk -F'[=&]' '{print $4}'

EXPLANATION:

  • awk : is a scripting language that automatically processes input files line by line, splitting each line into fields. So, when you process a file with awk, for each line, the first field is $1, the second $2 etc up to $N. By default awk uses blanks as the field separator.

  • -F'[=&]' : -F is used to change the field delimiter from spaces to something else. In this case, I am giving it a class of characters. Square brackets ([ ]) are used by many languages to denote groups of characters. So, specifically, -F'[=&]' means that awk should use both & and = as field delimiters.

  • Therefore, given the input string from your question, using & and = as delimiters, awk will read the following fields:

    http://www.youtube.com/watch?v=abc&g=xyz
    |----------- $1 -------------| --- - ---
                                    |  |  |
                                    |  |  +----- $4
                                    |  +-------- $3
                                    +----------- $2
    

    So, all you need to do is print whichever one you want, for example with {print $4}.


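The positional approach above depends on v being the first parameter in the query string. As a rough sketch (not part of the original answer), the same field-splitting idea can pick a value by key name instead. Here get_param is a hypothetical helper name, and the sketch assumes the URL has no #fragment and no URL-encoded characters:

```shell
# get_param URL KEY: print the value of KEY from the URL's query string.
# get_param is a hypothetical helper name used only for this illustration.
get_param() {
  printf '%s\n' "$1" | awk -F'[?&]' -v key="$2" '{
    # Field 1 is everything before the "?"; each later field is one key=value pair.
    for (i = 2; i <= NF; i++) {
      n = index($i, "=")                    # position of the first "=" in the pair
      if (n > 0 && substr($i, 1, n - 1) == key) {
        print substr($i, n + 1)             # everything after that "="
        exit
      }
    }
  }'
}

get_param 'http://www.youtube.com/watch?v=abc&g=xyz' v
get_param 'http://www.youtube.com/watch?v=abc&g=xyz' g
```

Splitting on ? and & first, and only then on =, means the result does not change if the parameters appear in a different order.
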
You said you also want to check that the string is a valid YouTube URL. A plain sed substitution won't do that: if the line does not match the regex you give it, sed simply prints the entire line unchanged (you would have to combine -n with the p flag of the s command to suppress that). You can use a tool like Perl to print only if the regex matches:

echo 'http://www.youtube.com/watch?v=abc&g=xyz' | 
  perl -ne 's/http.*www.youtube.com\/watch\?v=(.+?)&.+/$1/ && print'

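For comparison, sed itself can be made to print only matching lines by combining the -n option with the p flag of the s command. The command in the question fails for three reasons: it uses extended-regex syntax such as ( ) and ? without -E, it relies on a non-greedy .*? that POSIX regular expressions do not have, and it writes the back-reference as $4 rather than \4. The following is a minimal sketch, assuming a sed that accepts -E (both BSD and GNU sed do); yt_id is a hypothetical helper name, and the pattern is a simplified variant of the one in the question:

```shell
# yt_id URL: print the value of "v" only if URL looks like a YouTube watch URL.
# yt_id is a hypothetical helper name used only for this illustration.
yt_id() {
  printf '%s\n' "$1" |
    # -n suppresses the default output, -E enables extended regexes,
    # and the trailing p prints the line only when the substitution succeeded.
    sed -nE 's@^https?://(www\.)?youtube\.com/watch\?(.*&)?v=([a-zA-Z0-9_-]*)(&.*)?$@\3@p'
}

yt_id 'http://www.youtube.com/watch?v=abc&g=xyz'
yt_id 'http://example.com/watch?v=abc'
```

The first call prints the value of v; the second prints nothing, because the line does not match and -n suppresses the default output.
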
Finally, to simply print abc you can use the standard UNIX tool cut:

echo 'http://www.youtube.com/watch?v=abc&g=xyz' | 
  cut -d '=' -f 2 | cut -d '&' -f 1

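To see why the two cut calls are chained, it may help to look at the intermediate result. This is only an illustrative breakdown of the same pipeline (the variable names are made up for the example), and like the pipeline it assumes v is the first parameter in the query string:

```shell
url='http://www.youtube.com/watch?v=abc&g=xyz'

# Step 1: split on "=" and keep field 2, the text between the first and second "=".
step1=$(printf '%s\n' "$url" | cut -d '=' -f 2)
echo "$step1"    # abc&g

# Step 2: split that on "&" and keep field 1.
step2=$(printf '%s\n' "$step1" | cut -d '&' -f 1)
echo "$step2"    # abc
```
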
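  • @markvgti When it comes to capturing patterns, `sed` is not the best tool for this job. It is very powerful and fast, and it _can_ do it, but it is more complicated than necessary. I have added an explanation of how the `awk` command works, which you may now find easier to follow. For completeness, I have also added a `Perl` and a `cut` solution :).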