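This is my first attempt at Perl, so I know this code is ugly. Some of that comes from not knowing what I'm doing, and some from working around various problems. What I'm trying to do is search a file (samplefile.txt) for various pieces of information (the 9 parse_updates functions), and it works fine unless the order changes. For example, if a sample file has the Certificate Bundle before the Botnet Definitions, it fails to find the Certificate Bundle information. I expected each function to start searching the sample file "fresh", but that doesn't seem to be the case and I don't know why. I'm not including the sample file because the code post is already long enough, and I think the problem is in my function logic.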
use strict;
use warnings;
use diagnostics;
use File::Slurp;
my @autoupdate;
my $autoupdate;
my $av_regex;
my @av_updates;
my $avdev_regex;
my @avdef_updates;
my $ipsatt_regex;
my @ipsatt_updates;
my $attdef_regex;
my @attdef_updates;
my $ipsmal_regex;
my @ipsmal_updates;
my $flowav_regex;
my @flowav_updates;
my $botnet_regex;
my @botnet_updates;
my $appdef_regex;
my @appdef_updates;
my $ipgeo_regex;
my @ipgeo_updates;
my $certbun_regex;
my @certbun_updates;
my $str1;
my $str2;
my $str3;
my $str4;
my $str5;
my $str6;
my $str7;
my $str8;
my $str9;
parse_updates1(); #AV Engine
parse_updates2(); #Virus Defs
parse_updates3(); #IPS Attack Engine
parse_updates4(); #Attack Defs
parse_updates5(); #IPS Mal URL DB
parse_updates6(); #Flow virus Defs
parse_updates7(); #Botnet Defs
parse_updates8(); #IP Geo DB
parse_updates9(); #Cert Bundle
sub parse_updates1{
print "\nTHIS IS AV Engine Section!!\n\n";
read_file('samplefile.txt', buf_ref => \$str1);
my $av_regex =qr/(AV Engine)(.*\n)*?(Version:)(.*\n)*?(Contract Expiry Date:)(.*\n)*?(Last Updated using )(.*\n)*?(Last Update Attempt: )(.*\n)*?(Result: )(.*\n).*/p;
if ( $str1 =~ /$av_regex/g ) {
#putting each regex group into the array
push @av_updates, $1, $2, $3 ,$4, $5, $6, $7, $8, $9, $10, $11, $12;
#Removing new linefeeds
chomp @av_updates;
print "$_\n" for @av_updates;
}
else {
print "\n\nGot Nothing!\n\n";
@av_updates = qw(notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound);
print "$_\n" for @av_updates;
}
}
sub parse_updates2{
read_file('samplefile.txt', buf_ref => \$str2);
print "\nTHIS IS Virus Definitions Section!!\n\n";
my $avdef_regex =qr/(Application Definitions)(.*\n)*?(Version:)(.*\n)*?(Contract Expiry Date:)(.*\n)*?(Last Updated using )(.*\n)*?(Last Update Attempt: )(.*\n)*?(Result: )(.*\n).*/p;
if ( $str2 =~ /$avdef_regex/g ) {
#putting each regex group into the array
push @avdef_updates, $1, $2, $3 ,$4, $5, $6, $7, $8, $9, $10, $11, $12;
#Removing new linefeeds
chomp @avdef_updates;
print "$_\n" for @avdef_updates;
}
else {
print "\n\nGot Nothing!\n\n";
@avdef_updates = qw(notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound);
print "$_\n" for @avdef_updates;
}
}
sub parse_updates3{
read_file('samplefile.txt', buf_ref => \$str3);
print "\nTHIS IS IPS Attack Engine Section!!\n\n";
my $ipsatt_regex =qr/(IPS Attack Engine)(.*\n)*?(Version:)(.*\n)*?(Contract Expiry Date:)(.*\n)*?(Last Updated using )(.*\n)*?(Last Update Attempt: )(.*\n)*?(Result: )(.*\n).*/p;
if ( $str3 =~ /$ipsatt_regex/g ) {
#putting each regex group into the array
push @ipsatt_updates, $1, $2, $3 ,$4, $5, $6, $7, $8, $9, $10, $11, $12;
#Removing new linefeeds
chomp @ipsatt_updates;
print "$_\n" for @ipsatt_updates;
}
else {
print "\n\nGot Nothing!\n\n";
@ipsatt_updates = qw(notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound);
print "$_\n" for @ipsatt_updates;
}
}
sub parse_updates4{
read_file('samplefile.txt', buf_ref => \$str4);
print "\nTHIS IS Attack Definitions Section!!\n\n";
my $attdef_regex =qr/(Attack Definitions)(.*\n)*?(Version:)(.*\n)*?(Contract Expiry Date:)(.*\n)*?(Last Updated using )(.*\n)*?(Last Update Attempt: )(.*\n)*?(Result: )(.*\n).*/p;
if ( $str4 =~ /$attdef_regex/g ) {
#putting each regex group into the array
push @attdef_updates, $1, $2, $3 ,$4, $5, $6, $7, $8, $9, $10, $11, $12;
#Removing new linefeeds
chomp @attdef_updates;
print "$_\n" for @attdef_updates;
}
else {
print "\n\nGot Nothing!\n\n";
@attdef_updates = qw(notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound);
print "$_\n" for @attdef_updates;
}
}
sub parse_updates5{
read_file('samplefile.txt', buf_ref => \$str5);
print "\nTHIS IS IPS Malicious URL Database Section!!\n\n";
my $ipsmal_regex =qr/(IPS Malicious URL Database)(.*\n)*?(Version:)(.*\n)*?(Contract Expiry Date:)(.*\n)*?(Last Updated using )(.*\n)*?(Last Update Attempt: )(.*\n)*?(Result: )(.*\n).*/p;
if ( $str5 =~ /$ipsmal_regex/g ) {
#putting each regex group into the array
push @ipsmal_updates, $1, $2, $3 ,$4, $5, $6, $7, $8, $9, $10, $11, $12;
#Removing new linefeeds
chomp @ipsmal_updates;
print "$_\n" for @ipsmal_updates;
}
else {
print "\n\nGot Nothing!\n\n";
@ipsatt_updates = qw(notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound);
print "$_\n" for @ipsatt_updates;
}
}
sub parse_updates6{
read_file('samplefile.txt', buf_ref => \$str6);
print "\nTHIS IS Flow-Based Virus Definitions Section!!\n\n";
my $flowav_regex =qr/(Flow-based Virus Definitions)(.*\n)*?(Version:)(.*\n)*?(Contract Expiry Date:)(.*\n)*?(Last Updated using )(.*\n)*?(Last Update Attempt: )(.*\n)*?(Result: )(.*\n).*/p;
if ( $str6 =~ /$flowav_regex/g ) {
#putting each regex group into the array
push @flowav_updates, $1, $2, $3 ,$4, $5, $6, $7, $8, $9, $10, $11, $12;
#Removing new linefeeds
chomp @flowav_updates;
print "$_\n" for @flowav_updates;
}
else {
print "\n\nGot Nothing!\n\n";
@flowav_updates = qw(notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound);
print "$_\n" for @flowav_updates;
}
}
sub parse_updates7{
read_file('samplefile.txt', buf_ref => \$str7);
print "\nTHIS IS Botnet Definitions Section!!\n\n";
my $botnet_regex =qr/(Botnet Definitions)(.*\n)*?(Version:)(.*\n)*?(Contract Expiry Date:)(.*\n)*?(Last Updated using )(.*\n)*?(Last Update Attempt: )(.*\n)*?(Result: )(.*\n).*/p;
if ( $str7 =~ /$botnet_regex/g ) {
#putting each regex group into the array
push @botnet_updates, $1, $2, $3 ,$4, $5, $6, $7, $8, $9, $10, $11, $12;
#Removing new linefeeds
chomp @botnet_updates;
print "$_\n" for @botnet_updates;
}
else {
print "\n\nGot Nothing!\n\n";
@botnet_updates = qw(notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound);
print "$_\n" for @botnet_updates;
}
}
sub parse_updates8{
read_file('samplefile.txt', buf_ref => \$str8);
print "\nTHIS IS IP geography DB Section!!\n\n";
my $ipgeo_regex =qr/(IP Geography DB)(.*\n)*?(Version:)(.*\n)*?(Contract Expiry Date:)(.*\n)*?(Last Updated using )(.*\n)*?(Last Update Attempt: )(.*\n)*?(Result: )(.*\n).*/p;
if ( $str8 =~ /$ipgeo_regex/g ) {
#putting each regex group into the array
push @ipgeo_updates, $1, $2, $3 ,$4, $5, $6, $7, $8, $9, $10, $11, $12;
#Removing new linefeeds
chomp @ipgeo_updates;
print "$_\n" for @ipgeo_updates;
}
else {
print "\n\nGot Nothing!\n\n";
@ipgeo_updates = qw(notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound);
print "$_\n" for @ipgeo_updates;
}
}
sub parse_updates9{
read_file('samplefile.txt', buf_ref => \$str9);
print "\nTHIS IS Certificate Bundle Section!!\n\n";
my $certbun_regex =qr/(Certificate Bundle)(.*\n)*?(Version:)(.*\n)*?(Contract Expiry Date:)(.*\n)*?(Last Updated using )(.*\n)*?(Last Update Attempt: )(.*\n)*?(Result: )(.*\n).*/p;
if ( $str9 =~ /$certbun_regex/g ) {
#putting each regex group into the array
push @certbun_updates, $1, $2, $3 ,$4, $5, $6, $7, $8, $9, $10, $11, $12;
#Removing new linefeeds
chomp @certbun_updates;
print "$_\n" for @certbun_updates;
}
else {
print "\n\nGot Nothing!\n\n";
@certbun_updates = qw(notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound notfound);
print "$_\n" for @certbun_updates;
}
# End of sub parse_updates
}
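Although this can't be answered definitively without seeing some data, I'd first like to offer a rewrite of the program. That may also solve the problem.
There is no reason for all those functions; they all do exactly the same thing. There is no need for the sea of variables either; a hash is the right tool for a collection of named things. I have kept at least some of the original choices, such as the overall flow, the use of File::Slurp, etc.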
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd);
use File::Slurp;
my $fname = shift // die "Usage: $0 file\n"; #/
my %update = (
av => {
re => qr/pattern-for-av/,
name => q(AV Engine Section),
},
avdev => {
re => qr/pattern-for-avdev/,
name => q(Virus Definitions Section),
},
# ...
);
my $file_content = read_file($fname);
foreach my $code (sort keys %update) {
say "This is $update{$code}{name}";
my $captures = parse_update( $file_content, $update{$code}{re} );
$update{$code}{captures} = $captures;
}
dd \%update;
sub parse_update {
my ($file_content, $re) = @_;
my @captures = $file_content =~ /$re/;
if (not @captures) {
say "Got nothing!";
@captures = ( 'notfound' ) x 12; # apparently exactly 12
}
else { chomp @captures }
say for @captures;
return \@captures;
}
The regex patterns and section names are all in the hash %update, and the results (captures) are then added to it. This choice of data organization is somewhat arbitrary, since I don't know the context.
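As a minimal sketch of how the filled-in hash might be read back afterwards: the data below is made up, and the captures are written in by hand rather than produced by a regex, purely to show the shape of the structure.

```perl
use strict;
use warnings;
use feature 'say';

# Made-up contents, shaped like %update after the loop has stored captures.
my %update = (
    av    => { name     => 'AV Engine Section',
               captures => [ 'AV Engine', 'Version:', '6.0' ] },
    avdev => { name     => 'Virus Definitions Section',
               captures => [ 'notfound' ] },
);

# One report line per section, in sorted key order.
my @lines;
for my $code (sort keys %update) {
    my $entry = $update{$code};
    push @lines, "$code ($entry->{name}): " . join(', ', @{ $entry->{captures} });
}
say for @lines;
```

Each section's pattern, label, and results stay together under one key, so adding a tenth section means adding one hash entry rather than another function and more variables.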
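The file is opened once, and its entire content is copied to the sub on every call. Please adjust as needed. For example, if the file is large there are other ways to make that data available to the sub.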
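One such way, sketched here under assumptions, is to pass a reference to the content instead of the content itself. The sub name parse_update_ref, the sample string, and the pattern are all hypothetical and only stand in for the real data.

```perl
use strict;
use warnings;

# Hypothetical variant of the sub: it receives a REFERENCE to the content,
# so the (possibly large) string is not copied on each call.
sub parse_update_ref {
    my ($content_ref, $re) = @_;
    my @captures = $$content_ref =~ /$re/;   # list context, no /g
    return \@captures;
}

# Made-up stand-in for the file content.
my $file_content = "AV Engine\nVersion: 1.2\nResult: ok\n";

my $captures = parse_update_ref(
    \$file_content,
    qr/(AV Engine)\n(Version:)\s*(\S+)/,
);
print join('|', @$captures), "\n";   # AV Engine|Version:|1.2
```

The caller still owns the string; the sub only reads through the reference, so it remains independent of any variable in the higher scope.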
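That if (/.../g), used in the question and seen occasionally, is pointless and error-prone, and it can cause exactly the kind of problem described in the question.† Used in scalar context, the /g modifier serves sophisticated needs; it is not meant for a standalone if statement.
The condition for a successful match (and thus capture) is taken from the question. The code in the sub can be organized in a number of other ways, from more compact to more elaborate.
Note that the sub does not directly use anything from the higher scope; everything it needs is passed to it explicitly, and it returns its results. This matters a great deal, as it avoids coupling code components that are meant to be distinct (here the sub and its caller); they could even reside in different compilation units.
This rewrite may well have caught the error and fixed the problem; or it may not have. If we could see a data sample, more targeted troubleshooting would probably be possible.
The code above has been tested, with a made-up file and suitable regex patterns.
† While I would need to see some data to be sure what causes the reported behavior, a good candidate is the unsuspecting use of if (/.../g). That modifier makes the regex remember the position where it matched, and the next time a regex with /g is applied to the same string it starts looking for a match from the position where the previous match ended.
A simple example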
use warnings; use strict; use feature 'say';
my $s = q(one simple string);
if ($s =~ /(\w+)/g) { say $1 };
if ($s =~ /(\w+)/g) { say $1 };
say pos($s);
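which prints
one
simple
10
where the last line is the position in the string that the regex tracks at that point, after the second match. (The pos function is great for seeing some of what goes on in regex operations.) So when invoked again after a match, the regex continues from where it left off, courtesy of the /g modifier; without it, the string is scanned from the beginning on each new invocation.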
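For contrast, here is a small sketch of the same two matches without /g (the array @seen is only there to collect the results). Each match starts from the beginning of the string, and no position is recorded.

```perl
use strict;
use warnings;
use feature 'say';

my $s = q(one simple string);
my @seen;

# No /g: every match scans from the start of the string.
if ($s =~ /(\w+)/) { push @seen, $1 }
if ($s =~ /(\w+)/) { push @seen, $1 }

say "@seen";                                # one one
say defined pos($s) ? pos($s) : 'undef';    # undef: no position is tracked
```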
Another example, with a single expression executed repeatedly
use warnings; use strict; use feature 'say';
my $s = q(one two);
sub func { say $1 if $_[0] =~ /(\w+)/g }; # /g is of consequence!
for (1..3) { func($s) }   # third call: no match (a failed /g match then resets the position)
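This prints
one
two
and that's it; there is nothing more. That is because the engine has moved past the word two with the second match, so there is nothing left to match in the next iteration of the for loop.
For more on the examples above and their context, see this post and this post.
The second example in particular is very similar to what is given in the question.
Some of the behavior above can be modified with anchors and other modifiers, and /g is of course useful, but one needs to know what it does.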
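A sketch of the same sub without /g, for comparison. The name first_word is made up for this illustration; since no position is remembered, every call finds the first word again.

```perl
use strict;
use warnings;
use feature 'say';

my $s = q(one two);

# Same idea as func, but without /g: each call matches from the start.
sub first_word { return $_[0] =~ /(\w+)/ ? $1 : undef }

my @results = map { first_word($s) } 1 .. 4;
say "@results";   # one one one one
```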
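As an illustration of one place where scalar-context /g is the intended tool, here is a minimal tokenizer sketch. It combines /g with the \G anchor (match only at the current position) and the /c modifier (keep the position when a match fails). The input string and the token labels are made up.

```perl
use strict;
use warnings;
use feature 'say';

my $s = q(one 22 three);
my @tokens;

pos($s) = 0;    # start scanning at the beginning
while (pos($s) < length $s) {
    if    ($s =~ /\G\s+/gc)      { next }                       # skip whitespace
    elsif ($s =~ /\G(\d+)/gc)    { push @tokens, "NUM:$1" }
    elsif ($s =~ /\G([a-z]+)/gc) { push @tokens, "WORD:$1" }
    else  { die "unexpected input at position " . pos($s) }
}
say "@tokens";   # WORD:one NUM:22 WORD:three
```

Here the remembered position is the whole point: each successful match advances it, and /c keeps a failed alternative from resetting it to the start.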