Extract the pattern between a substring and the first occurrence of a digit in a string

Question

Here are the contents of the file:

xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
xxx_component3-1.0-2-3gsjcgd.xc-linux-x86-64-Release-devel.r
xxx_component4-0.0-2-2acd314.xc-linux-x86-64-Release-devel.r

I want to extract the component names: component1, component2, and so on.

This is what I tried:

for line in `sed -n -e '/^xxx-/p' $file`
do
    comp=`echo $line | sed  -e '/xxx-/,/[0-9]/p'`
    echo "comp - $comp"
done

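I also tried sed -e 's/.*xxx-\(.*\)[^0-9].*/\1/'

This is based on some information I found online. Please give me a sed command and, if possible, a step-by-step explanation.

Part 2. I also need to extract the version number from the string. The version number starts with a number and ends with a number, and it is followed by xc-linux. As you can see, to keep it unique, it contains a random alphanumeric string (7 characters long) as part of the version number.

For example, in the string xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r the version number is 1.0-2-2acd314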

Answer 1

jay*_*ngh 15

There are many ways to extract the data. The simplest is grep.

GNU `grep`:

You can use GNU grep with the PCRE option -P to get the required data:

$ cat file
xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
xxx_component3-1.0-2-3gsjcgd.xc-linux-x86-64-Release-devel.r
xxx_component4-0.0-2-2acd314.xc-linux-x86-64-Release-devel.r

$ grep -oP '(?<=_)[^-]*' file
component1
component2
component3
component4

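Here we use a positive look-behind assertion, (?<=_), which requires a _ immediately before the match without making it part of the match. [^-]* then captures everything from there up to, but not including, the first -. The -o option prints only the matched text.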
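To see the look-behind at work on a single string, here is a minimal sketch. It assumes GNU grep built with PCRE support; extract_component is a hypothetical helper name used only for illustration, and [^-]+ is used instead of [^-]* so that empty matches are never considered.

```shell
# Sample line taken from the question's file.
line='xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r'

# extract_component is a hypothetical helper, not part of grep.
# (?<=_) asserts that "_" sits right before the match but excludes it;
# [^-]+ then consumes characters up to the first "-".
# head -n 1 keeps only the first match in case a line has several "_".
extract_component() {
    printf '%s\n' "$1" | grep -oP '(?<=_)[^-]+' | head -n 1
}

extract_component "$line"   # prints: component1
```

Because the _ is only asserted and never consumed, it does not show up in the output, which is why no post-processing is needed to strip it.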

`awk`:

$ awk -F"[_-]" '{print $2}' file
component1
component2
component3
component4

Here we tell awk to use - and _ as field separators and to print the second field.
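The same idea can be wrapped so that it only looks at lines carrying the xxx_ prefix, which is what the question's loop was trying to filter on. This is a sketch under that assumption; list_components is a hypothetical name, and it reads its input from stdin.

```shell
# list_components is a hypothetical helper; it reads lines on stdin.
# -F'[_-]' splits each line on "_" or "-", so for
#   xxx_component1-1.0-2-...
# the fields are $1 = xxx, $2 = component1, $3 = 1.0, and so on.
# The /^xxx_/ pattern skips lines that do not start with the prefix.
list_components() {
    awk -F'[_-]' '/^xxx_/ { print $2 }'
}

printf '%s\n' \
    'xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r' \
    'xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r' |
    list_components
```

To run it against the file from the question, redirect the file into the function: list_components < file.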

`sed`:

That said, you can also use sed with a capture group to extract the required data:

$ sed 's/.*_\([^-]*\)-.*/\1/' file
component1
component2
component3
component4

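The regex .*_ matches any character zero or more times, up to and including the _. From there, \([^-]*\) captures everything up to the next - in a group, and the trailing -.* consumes the rest of the line. In the replacement part we keep only the captured data, recalling it with the back-reference \1.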
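Part 2 of the question (the version number) can be handled in the same style with a second capture group. The following is a minimal sketch, assuming every line of interest has the form xxx_<name>-<version>.xc-linux-...; name_and_version is a hypothetical helper name, and it reads from stdin.

```shell
# name_and_version is a hypothetical helper; it reads lines on stdin.
# Assumes each line looks like xxx_<name>-<version>.xc-linux-...
name_and_version() {
    # ^[^_]*_      skip the prefix up to and including the first "_"
    # \([^-]*\)    group 1: the component name, up to the first "-"
    # -\(.*\)      group 2: everything after that "-" ...
    # \.xc-linux   ... up to the literal ".xc-linux" marker
    # -n plus the p flag prints only lines where the substitution matched
    sed -n 's/^[^_]*_\([^-]*\)-\(.*\)\.xc-linux.*/\1 \2/p'
}

printf '%s\n' 'xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r' |
    name_and_version   # prints: component1 1.0-2-2acd314
```

Anchoring on the literal .xc-linux marker, rather than on digits, avoids having to describe the 7-character alphanumeric suffix, which does not always end in a digit (fg3sdhd, for instance).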
