For this given input:
How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this
I want this output:
How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one
That is, get the whole lines that contain the word "this" exactly three times (case-insensitive match).
mur*_*uru 13
In perl, substitute "this" with itself, case-insensitively, and count the number of substitutions:
$ perl -ne 's/(this)/$1/ig == 3 && print' <<EOF
How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this
EOF
How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one
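The same substitution-count idea carries over to Python's re.subn, which returns both the rewritten string and the number of substitutions made. This is a minimal sketch, not part of the original answer; count_by_substitution is a hypothetical helper name.

```python
import re


def count_by_substitution(line: str, word: str = "this") -> int:
    # Replace every case-insensitive match with itself (\g<0> is the whole match)
    # and keep only the substitution count, like perl's s/(this)/$1/ig.
    # re.escape makes the word match literally.
    _, count = re.subn(re.escape(word), r"\g<0>", line, flags=re.IGNORECASE)
    return count


print(count_by_substitution("How to get This line that this word repeated 3 times in THIS line?"))  # prints 3
print(count_by_substitution("But not this line which is THIS word repeated 2 times."))  # prints 2
```

A line would then be selected with count_by_substitution(line) == 3, exactly as the perl one-liner does.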
Or use a count of matches instead:
perl -ne 'my $c = () = /this/ig; $c == 3 && print'
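Python has no list-assignment trick like perl's `() =`, but taking the length of the list returned by re.findall gives the same match count. A sketch under that assumption, with count_matches as a hypothetical helper and two of the sample lines as data:

```python
import re


def count_matches(line: str, word: str = "this") -> int:
    # Collect every non-overlapping, case-insensitive match and count them,
    # mirroring perl's list-assignment idiom: my $c = () = /this/ig
    return len(re.findall(re.escape(word), line, flags=re.IGNORECASE))


lines = [
    "But not this line which is THIS word repeated 2 times.",
    "And I will get This line with this here and This one",
]
for line in lines:
    if count_matches(line) == 3:
        print(line)  # only the second line has exactly three matches
```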
If you have GNU awk, a very simple way:
gawk -F'this' -v IGNORECASE=1 'NF == 4'
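The number of fields will be one more than the number of separators, so NF == 4 means exactly three occurrences of "this".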
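That relationship is easy to check in Python, where re.split with re.IGNORECASE plays the role of a case-insensitive -F'this'. This is a small illustration; field_count is a hypothetical name.

```python
import re


def field_count(line: str, sep: str = "this") -> int:
    # Split on the separator case-insensitively, like gawk -F'this' with IGNORECASE=1.
    # For re.split, a string containing k separators always yields k + 1 pieces.
    return len(re.split(re.escape(sep), line, flags=re.IGNORECASE))


line = "And I will get This line with this here and This one"
print(field_count(line))  # prints 4, so an NF == 4 test would select this line
```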
In python, this will do the job:
#!/usr/bin/env python3
s = """How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this"""
for line in s.splitlines():
    if line.lower().count("this") == 3:
        print(line)
Output:
How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one
Or, to read from a file, with the file as an argument:
#!/usr/bin/env python3
import sys

file = sys.argv[1]
with open(file) as src:
    lines = [line.strip() for line in src.readlines()]
for line in lines:
    if line.lower().count("this") == 3:
        print(line)
Paste the script into an empty file, save it as find_3.py, and run it with the command:
python3 /path/to/find_3.py <file_withlines>
Output:
How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one
Of course, the word "this" can be replaced by any other word (or any other string or part of a line), and the required number of occurrences per line can be set to any other value.
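As a sketch of that generalization, the word and the count can become parameters. This assumes the word should be matched literally (hence re.escape) and case-insensitively. lines_with_exactly and its whole_word flag are hypothetical; whole_word is an optional extra that the script above does not do, since str.count also matches inside longer words such as "thistle".

```python
import re
from typing import Iterable, Iterator


def lines_with_exactly(lines: Iterable[str], word: str, n: int,
                       whole_word: bool = False) -> Iterator[str]:
    # Escape the word so characters like "." or "+" are matched literally.
    pattern = re.escape(word)
    if whole_word:
        # Optional: \b word boundaries stop "this" from matching inside "thistle".
        pattern = rf"\b{pattern}\b"
    regex = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        # Count non-overlapping matches, like line.lower().count(word).
        if len(regex.findall(line)) == n:
            yield line


sample = [
    "How to get This line that this word repeated 3 times in THIS line?",
    "But not this line which is THIS word repeated 2 times.",
]
for line in lines_with_exactly(sample, "this", 2):
    print(line)  # prints only the second sample line
```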
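If the file is large (hundreds of thousands or millions of lines), the code below will be faster; it reads the file line by line instead of loading the whole file at once: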
#!/usr/bin/env python3
import sys

file = sys.argv[1]
with open(file) as src:
    # Iterating over the file object keeps only one line in memory at a time
    for line in src:
        if line.lower().count("this") == 3:
            print(line.rstrip("\n"))
Assuming your source file is tmp.txt:
grep -iv '.*this.*this.*this.*this' tmp.txt | grep -i '.*this.*this.*this.*'
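The left grep outputs all lines in tmp.txt that do not contain 4 or more case-insensitive occurrences of "this".
The result is piped to the right grep, which outputs those lines from the left grep's result that contain 3 or more occurrences. A line that passes both filters therefore contains exactly 3.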
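The two-stage filter can be mirrored in Python to show why it yields exactly three: a line survives only if it fails the four-occurrence pattern and matches the three-occurrence one. This sketch reuses the two grep patterns verbatim; grep_pipeline is a hypothetical name.

```python
import re

# The two patterns from the grep pipeline, compiled case-insensitively (like -i).
AT_LEAST_FOUR = re.compile(r".*this.*this.*this.*this", re.IGNORECASE)
AT_LEAST_THREE = re.compile(r".*this.*this.*this.*", re.IGNORECASE)


def grep_pipeline(lines):
    # Stage 1, like grep -iv: keep lines WITHOUT four or more occurrences.
    fewer_than_four = [line for line in lines if not AT_LEAST_FOUR.search(line)]
    # Stage 2, like grep -i: of those, keep lines WITH three or more.
    return [line for line in fewer_than_four if AT_LEAST_THREE.search(line)]


sample = [
    "How to get This line that this word repeated 3 times in THIS line?",
    "But not this line which is THIS word repeated 2 times.",
    "And I will get This line with this here and This one",
    "A test line with four this and This another THIS and last this",
]
for line in grep_pipeline(sample):
    print(line)
```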
Update: thanks to @Muru, here is a better version of this grep solution:
grep -Eiv '(.*this){4,}' tmp.txt | grep -Ei '(.*this){3}'
To match exactly n occurrences, replace 4 with n+1 and 3 with n.
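Parameterized on n, the improved pipeline might look like this in Python. This is a sketch, assuming the word should be matched literally (re.escape) and case-insensitively; exactly_n is a hypothetical helper, and the two patterns are built the same way as the grep -E ones.

```python
import re


def exactly_n(lines, word: str, n: int):
    w = re.escape(word)
    # (.*word){n+1,} matches when there are n+1 or more occurrences (the grep -Eiv side)
    too_many = re.compile(rf"(.*{w}){{{n + 1},}}", re.IGNORECASE)
    # (.*word){n} matches when there are at least n occurrences (the grep -Ei side)
    enough = re.compile(rf"(.*{w}){{{n}}}", re.IGNORECASE)
    return [line for line in lines
            if not too_many.search(line) and enough.search(line)]


sample = [
    "How to get This line that this word repeated 3 times in THIS line?",
    "But not this line which is THIS word repeated 2 times.",
    "And I will get This line with this here and This one",
    "A test line with four this and This another THIS and last this",
]
for line in exactly_n(sample, "this", 3):
    print(line)
```

The nested .* can backtrack a lot on very long lines, so for big inputs the counting approaches above are probably the safer choice.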
You can play a bit with awk for this:
awk -F"this" 'BEGIN{IGNORECASE=1} NF==4' file
This returns:
How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one
All we do is set the field separator to "this" itself. This way, a line has one more field than there are occurrences of the word "this".
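To make it case-insensitive, we set IGNORECASE = 1 (a GNU awk feature; in other awks it is just an ordinary variable). See the reference: Case Sensitivity in Matching.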
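Then, just say NF==4 to get all the lines that contain "this" exactly 3 times. No more code is needed, because {print $0} (that is, print the current line) is awk's default action when the expression evaluates to true.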
Assuming the lines are stored in a file named FILE:
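# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line; do
    # grep -o prints each match on its own line; wc -l counts them
    if [ $(grep -oi "this" <<< "$line" | wc -l) -eq 3 ]; then
        printf '%s\n' "$line"
    fi
done <FILE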