How to get the lines in which a specific word is repeated exactly N times?

αғs*_*нιη 8 text-processing

For this given input:

How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this

I want this output:

How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one

That is, get the whole lines that contain exactly three occurrences of the word "this" (case-insensitive match).

mur*_*uru 13

In perl, replace this with itself case-insensitively and count the number of substitutions:

$ perl -ne 's/(this)/$1/ig == 3 && print' <<EOF
How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this
EOF
How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one

Or count the matches instead:

perl -ne 'my $c = () = /this/ig; $c == 3 && print'
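For comparison, here is a rough Python sketch of the same two counting ideas. The function names are made up for illustration. It relies on `re.subn` returning the number of substitutions made and `re.findall` returning the list of matches.

```python
import re

# Case-insensitive pattern for the literal string "this".
PATTERN = re.compile(r"this", re.IGNORECASE)

def count_by_substitution(line):
    # Replace every match with itself; subn returns (new_string, count).
    _, count = PATTERN.subn(lambda m: m.group(0), line)
    return count

def count_by_matching(line):
    # Count the matches without rewriting the line.
    return len(PATTERN.findall(line))

line = "How to get This line that this word repeated 3 times in THIS line?"
print(count_by_substitution(line), count_by_matching(line))
```

Both functions should agree on any input, so the match-counting variant is probably the one to prefer, since it does not rewrite the line.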

If you have GNU awk, a very simple way:

gawk -F'this' -v IGNORECASE=1 'NF == 4'

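The number of fields will be one more than the number of separators, so three occurrences of this give four fields.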


Jac*_*ijm 9

In python, this does the job:

#!/usr/bin/env python3

s = """How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this"""

for line in s.splitlines():
    if line.lower().count("this") == 3:
        print(line)

Output:

How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one

Or, to read from a file, with the file as an argument:

#!/usr/bin/env python3
import sys

# The file to read is given as the first command-line argument.
file = sys.argv[1]

with open(file) as src:
    lines = [line.strip() for line in src.readlines()]

for line in lines:
    if line.lower().count("this") == 3:
        print(line)

Of course, the word "this" can be replaced by any other word (or any other string or part of a line), and the number of occurrences per line can be set to any other value in the line:

if line.lower().count("this") == 3:
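As a small sketch of that generalization, the word and the count can be turned into parameters. The function name `lines_with_count` and its defaults are only illustrative assumptions, not part of the original answer.

```python
def lines_with_count(lines, word="this", n=3):
    """Return the lines that contain `word` exactly `n` times, ignoring case."""
    needle = word.lower()
    # str.count counts non-overlapping occurrences of the substring.
    return [line for line in lines if line.lower().count(needle) == n]

sample = [
    "How to get This line that this word repeated 3 times in THIS line?",
    "But not this line which is THIS word repeated 2 times.",
    "And I will get This line with this here and This one",
    "A test line with four this and This another THIS and last this",
]

for line in lines_with_count(sample, word="this", n=3):
    print(line)
```

Note that this counts substrings, so a word such as "thistle" would also count as an occurrence of "this".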

EDIT

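If the file is large (hundreds of thousands or millions of lines), the code below will be faster; it reads the file line by line instead of loading the whole file at once: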

#!/usr/bin/env python3
import sys

file = sys.argv[1]

# Iterate over the file object so only one line is held in memory at a time.
with open(file) as src:
    for line in src:
        if line.lower().count("this") == 3:
            print(line.strip())


Sri*_*Sri 9

Assuming your source file is tmp.txt,

grep -iv '.*this.*this.*this.*this' tmp.txt | grep -i '.*this.*this.*this.*'

The left grep outputs all lines in tmp.txt that do not contain 4 or more case-insensitive occurrences of "this".

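The result is piped to the right grep, which outputs those lines from the left grep's result that contain 3 or more occurrences. Together, the two filters leave only the lines with exactly 3.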

Update: thanks to @Muru, here is a better version of this solution,

grep -Eiv '(.*this){4,}' tmp.txt | grep -Ei '(.*this){3}'

For a general n, replace 4 with n+1 and 3 with n.

  • To simplify it a bit: `grep -Eiv '(.*this){4,}' | grep -Ei '(.*this){3}'` - this might make it practical for N=50. (5 upvotes)
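The two-stage filter described in this answer can be sketched in Python as well. This is only an illustration under stated assumptions: the function name `exactly_n` is hypothetical, and Python's `re` module stands in for grep's extended regular expressions.

```python
import re

def exactly_n(lines, word="this", n=3):
    unit = ".*" + re.escape(word)
    # Matches lines with at least n occurrences, like grep -Ei '(.*this){3}'.
    at_least_n = re.compile("(?:%s){%d}" % (unit, n), re.IGNORECASE)
    # Matches lines with n+1 or more occurrences, like grep -Ei '(.*this){4,}'.
    more_than_n = re.compile("(?:%s){%d,}" % (unit, n + 1), re.IGNORECASE)
    # First stage (grep -v): drop the lines that have too many occurrences.
    survivors = [line for line in lines if not more_than_n.search(line)]
    # Second stage: keep the lines that have at least n occurrences.
    return [line for line in survivors if at_least_n.search(line)]

sample = [
    "How to get This line that this word repeated 3 times in THIS line?",
    "But not this line which is THIS word repeated 2 times.",
    "And I will get This line with this here and This one",
    "A test line with four this and This another THIS and last this",
]

for line in exactly_n(sample):
    print(line)
```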

fed*_*qui 6

You can play a bit with awk for this:

awk -F"this" 'BEGIN{IGNORECASE=1} NF==4' file

This returns:

How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one

Explanation

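  • What we do is define the field separator to be this itself. This way, the line will have as many fields as there are occurrences of the word this, plus 1.

  • To make it case-insensitive, we set IGNORECASE=1 (a GNU awk feature). See the reference: Case Sensitivity in Matching.

  • Then it is just a matter of saying NF==4 to get all the lines that contain this exactly 3 times. No more code is needed, since {print $0} (that is, print the current line) is the default behaviour of awk when an expression evaluates to true.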
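To see the field-counting idea from the explanation in isolation, here is a minimal Python sketch. It only approximates awk's field splitting with `re.split`, and `field_count` is a hypothetical helper name.

```python
import re

def field_count(line, separator="this"):
    # Split on the separator, ignoring case; the result has one more
    # element than the number of separators found.
    return len(re.split(re.escape(separator), line, flags=re.IGNORECASE))

sample = [
    "How to get This line that this word repeated 3 times in THIS line?",
    "But not this line which is THIS word repeated 2 times.",
    "And I will get This line with this here and This one",
    "A test line with four this and This another THIS and last this",
]

for line in sample:
    # A line qualifies when it splits into exactly 4 fields.
    print(field_count(line), field_count(line) == 4)
```

A separator at the very end of the line still produces a trailing empty field, which is why the last sample line yields 5 fields rather than 4.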


xyz*_*xyz 5

Assuming the lines are stored in a file named FILE:

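# IFS= and -r keep leading whitespace and backslashes in each line intact.
while IFS= read -r line; do
    # grep -o prints each match on its own line; wc -w counts them.
    if [ $(grep -oi "this" <<< "$line" | wc -w) = 3 ]; then
        echo "$line"
    fi
done <FILE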

  • @muru No, the `-c` option counts the number of *lines* that match "this", not the number of "this" words in each line. (2 upvotes)
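The difference pointed out in the comment, counting matching lines versus counting individual matches, can be illustrated with a short Python sketch. Both function names are invented for this example.

```python
import re

def count_matching_lines(text, word="this"):
    # Like grep -c: how many lines contain at least one match.
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return sum(1 for line in text.splitlines() if pattern.search(line))

def count_occurrences(text, word="this"):
    # Like grep -o piped to wc: how many individual matches there are.
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return len(pattern.findall(text))

line = "How to get This line that this word repeated 3 times in THIS line?"
print(count_matching_lines(line), count_occurrences(line))
```

For a single input line the first count can only be 0 or 1, which is why the loop above needs the per-match count.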