Removing duplicate strings in Python using regular expressions

Question

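I have a .txt file containing many generated Snort alerts. I want to search this file, remove the duplicate alerts, and keep only one of each. So far I am using the following code: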

import re

with open('SnortReportFinal', 'r') as f:
    file_lines = f.readlines()

# collect the index of every line that contains an alert ID
cont_lines = []
for line in range(len(file_lines)):
    if re.search(r'\d:\d+:\d+', file_lines[line]):
        cont_lines.append(line)

for idx in cont_lines[1:]: # skip one instance of the string
    file_lines[idx] = "" # replace all others

with open('SnortReportFinal', 'w') as f:
    f.writelines(file_lines)

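The regular expression matches the string I am searching for, e.g. 1:234:5. If it finds multiple instances of the same string, I want it to delete them and keep only one. This does not work: every other matching line is deleted, and only the first line the expression matches is kept.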

The file contains text like this:

[1:368:6] ICMP PING BSDtype [**]
[1:368:6] ICMP PING BSDtype [**]
[1:368:6] ICMP PING BSDtype [**]
[1:368:6] ICMP PING BSDtype [**]

The numbers in the [1:368:6] part can vary, e.g. [1:5476:5].

My expected output would be just:

[1:368:6] ICMP PING BSDtype [**]
[1:563:2] ICMP PING BSDtype [**]

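with the remaining strings removed. By "remaining" I mean that lines whose numbers differ are fine, but lines that repeat the same numbers are not.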

Answer 1

wnn*_*maw 5

It looks like you don't really need a regular expression. To remove duplicates, simply:

alerts = set(f.readlines())

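This converts the list of lines in the file into a set, which removes the duplicates. From there, you can write the set straight back to the text file.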
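
As a minimal sketch of that round trip, the snippet below reads the file, deduplicates its lines with a set, and writes the result back. The helper name `dedupe_file` is hypothetical and not part of the original answer. A set does not preserve the original line order, so the lines are sorted here only to keep the output deterministic.

```python
def dedupe_file(path):
    """Rewrite the file at `path` with exact duplicate lines removed."""
    with open(path, "r") as f:
        alerts = set(f)  # iterating over a file object yields its lines

    # A set is unordered, so sort the lines for a deterministic result.
    with open(path, "w") as f:
        f.writelines(sorted(alerts))
```

With the file from the question, this would be called as `dedupe_file('SnortReportFinal')`. Lines are compared as whole strings, so two alerts that differ only in trailing whitespace or a missing final newline would both be kept.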

Alternatively, you can call set directly on the file object, as Padraic Cunningham pointed out in the comments:

alerts = set(f)
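
If the original order matters, or if two alerts should count as duplicates whenever their bracketed identifier repeats (as the question describes), a set of whole lines may not be enough. The following is a sketch under those assumptions and not part of the original answer: `dedupe_by_id` and `ALERT_ID` are hypothetical names, and the pattern assumes the identifier always looks like `[1:368:6]`, as in the sample lines.

```python
import re

# Assumed shape of the identifier, based on the sample lines: [1:368:6]
ALERT_ID = re.compile(r"\[\d+:\d+:\d+\]")


def dedupe_by_id(lines):
    """Keep the first line seen for each alert ID, preserving order.

    Lines without an identifier are passed through unchanged.
    """
    seen = set()
    kept = []
    for line in lines:
        match = ALERT_ID.search(line)
        if match is None:
            kept.append(line)  # not an alert header, keep as-is
            continue
        if match.group() not in seen:
            seen.add(match.group())
            kept.append(line)
    return kept
```

The function works on any iterable of lines, so it could be fed the result of `f.readlines()` and its return value passed to `f.writelines()`.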
