Dud*_*ude (6) · tags: grep, bash, sed, awk, regular-expression
I'm working on a movie-database problem to improve my regular expressions, and this is the problem I ran into. My dataset looks like this:
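Movie name (variable spaces and tabs) year
Movie1 (words may be separated by one or more spaces) (separator: \t+, several spaces, or a single space) Year1
Movie2 (words may be separated by one or more spaces) (separator: \t+, several spaces, or a single space) Year2
Movie3 (words may be separated by one or more spaces) (separator: \t+, several spaces, or a single space) Year3
Movie4 (words may be separated by one or more spaces) (separator: \t+, several spaces, or a single space) Year4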
I want to extract the names of all the movies. These are the challenges I'm facing:
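1: The separator is variable. If it were a colon or some other unique character, I would extract the names with awk, like this: awk -F 'separator' '{print $1}'. Here, though, it can be a single space, two or more spaces, or a mix of \t and spaces.
2: For lines where the separator is \t, I can split on \t, because movie names never contain tabs. But what if the separator is one or two spaces? Those can easily appear inside a movie name, and in those cases I don't know what to do.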
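A quick illustration of point 2, using a made-up two-line sample in which the first line is tab-separated and the second uses only spaces. Splitting on tabs works for the first line but silently returns the whole second line, year included. The function name `tab_split_titles` is hypothetical, chosen only for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: split each line on tabs and keep the first field.
tab_split_titles() {
  awk -F '\t' '{ print $1 }'
}

# Sample input: line 1 uses a tab as separator, line 2 uses two spaces.
printf 'Casablanca\t1942\nThe Blues Brothers  1980\n' | tab_split_titles
# Casablanca
# The Blues Brothers  1980
```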
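I know this question is very narrow and specific, but as described above, I'm really stuck and can't think of any way to solve it.
Is there any combination of grep/sed/awk and regular expressions that can do this?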
Bash:
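# [^[:blank:]] at the end of group 1 stops the captured title from ending in blanks
re='^(.*[^[:blank:]])[[:blank:]]+[0-9]{4}$'
while read -r line; do
  if [[ $line =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  fi
done < data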
sed:
sed 's/[[:blank:]]\+[0-9]\{4\}$//' < data
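This is actually very simple. As long as the last field, the year, never contains whitespace (your question doesn't make this clear, but I'll assume it's the case), all you need to do is delete the last field. For example: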
$ cat movies
Casablanca 1942
Eternal Sunshine of the Spotless Mind 2004
He Died with a Felafel in His Hand 2001
The Blues Brothers 1980
So, if you only want to print the titles, you can use:
$ perl -lpe 's/[^\s]+$//' movies
Casablanca
Eternal Sunshine of the Spotless Mind
He Died with a Felafel in His Hand
The Blues Brothers
$ sed 's/[^ \t]*$//' movies
Casablanca
Eternal Sunshine of the Spotless Mind
He Died with a Felafel in His Hand
The Blues Brothers
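The `perl` and `sed` commands above remove only the year, so each title keeps the blank(s) that preceded it. That is invisible on screen but matters if the titles are compared or passed to another tool. A minimal sketch, assuming a POSIX `sed` (the function name `strip_last_field` is made up for illustration), that deletes the separator together with the last field:

```shell
#!/bin/sh
# Hypothetical helper: delete the last field and the blanks in front of it.
# [[:blank:]] matches both spaces and tabs, so mixed separators are handled.
strip_last_field() {
  sed 's/[[:blank:]]*[^[:blank:]]*$//'
}

printf 'Casablanca \t 1942\nThe Blues Brothers  1980\n' | strip_last_field
# Casablanca
# The Blues Brothers
```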
Or, to also collapse the whitespace inside the titles:
$ sed -r 's/[\t ]+/ /g;s/[^ \t]*$//' movies
Casablanca
Eternal Sunshine of the Spotless Mind
He Died with a Felafel in His Hand
The Blues Brothers
$ perl -lpe 's/\s+/ /g;s/[^\s]+$//' movies
Casablanca
Eternal Sunshine of the Spotless Mind
He Died with a Felafel in His Hand
The Blues Brothers
$ awk '{for(i=1;i<NF-1;i++){printf "%s ",$i} print $(NF-1)}' movies
Casablanca
Eternal Sunshine of the Spotless Mind
He Died with a Felafel in His Hand
The Blues Brothers
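The `awk` loop above rebuilds each title from its fields, so it also collapses runs of blanks inside the title. If the original internal spacing should be kept, one option is to edit `$0` directly with `sub()`. This is a sketch assuming the year is exactly four digits at the end of the line; `strip_year_awk` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: remove "<blanks><4-digit year><optional blanks>" at the
# end of each line and print the rest unchanged. [0-9] is written out four
# times instead of [0-9]{4} because some older awks lack interval expressions.
strip_year_awk() {
  awk '{ sub(/[ \t]+[0-9][0-9][0-9][0-9][ \t]*$/, ""); print }'
}

printf 'Casablanca   1942\nThe Blues Brothers\t1980\nNo Year Here\n' | strip_year_awk
# Casablanca
# The Blues Brothers
# No Year Here
```

Lines without a trailing year are printed as-is, because `sub()` simply finds no match.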
If the year is always 4 digits, you can use
$ perl -lpe 's/....$//' movies
Casablanca
Eternal Sunshine of the Spotless Mind
He Died with a Felafel in His Hand
The Blues Brothers
Or
$ perl -lpe 's/\s+/ /g;s/....$//' movies
Casablanca
Eternal Sunshine of the Spotless Mind
He Died with a Felafel in His Hand
The Blues Brothers
Or
$ while read -r line; do echo ${line%%????}; done < movies
Casablanca
Eternal Sunshine of the Spotless Mind
He Died with a Felafel in His Hand
The Blues Brothers
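The `....$` and `%%????` variants remove the last four characters of every line, even a line that has no year, and in the last one the unquoted `echo` may expand glob characters such as `*` in a title if matching files exist. A POSIX-shell sketch that only removes a trailing four-digit year plus the blanks before it, with everything quoted (the function name `titles_from_years` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read lines on stdin and print each title with a
# trailing 4-digit year and the preceding blanks removed.
titles_from_years() {
  while IFS= read -r line; do
    # Drop a trailing 4-digit year; lines without one are left unchanged.
    title=${line%[0-9][0-9][0-9][0-9]}
    # ${title##*[![:blank:]]} is the run of trailing blanks; strip it.
    title=${title%"${title##*[![:blank:]]}"}
    printf '%s\n' "$title"
  done
}

printf 'Casablanca\t 1942\n*batteries not included 1987\nCasablanca\n' | titles_from_years
# Casablanca
# *batteries not included
# Casablanca
```

Note that a title which itself ends in four digits and has no year after it would still lose those digits; the sketch assumes every line either ends in a year or ends in a non-digit.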