Jus*_* · tags: arrays, perl, parsing
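I'm reading a postfix mail log file into an array and then looping through it to extract messages. On the first pass, I check for matches on the "to=" lines and grab the message ID. After building an array of MSGIDs, I loop back through the array to pull out the information on the to=, from= and client= lines.
What I'd like to do is remove a line from the array once I've extracted its data, so that processing gets faster (i.e., one less line to check on each pass).
Any suggestions? This is Perl.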
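If you do stay with the array-based, two-pass design, one common Perl idiom for dropping lines you have already handled is to rebuild the array with grep, instead of splicing elements out while a loop is indexing over it (splice shifts every later index). The sketch below is illustrative only: the sample lines are shortened, and %done is a hypothetical record of queue IDs already processed.

```perl
use strict;
use warnings;

# Shortened sample: a few postfix log lines already read into an array.
my @lines = (
    'Apr 8 14:22:02 MailSecure03 postfix/smtpd[32388]: BA1CE38965: client=mail.example.com[x.x.x.x]',
    'Apr 8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<us...@test.com>, status=sent',
    'Apr 8 14:22:04 MailSecure03 postfix/smtpd[32589]: 62D8438973: client=localhost.localdomain[127.0.0.1]',
);

# Hypothetical: queue IDs whose lines have already been extracted.
my %done = (BA1CE38965 => 1);

# Keep only lines whose queue ID has not been processed yet.
# grep builds a new list, so no indices shift under a running loop.
@lines = grep {
    my ($qid) = /postfix\/\w+\[\d+\]: (\w+):/;
    !(defined $qid && $done{$qid});
} @lines;

print "$_\n" for @lines;
```

Each pass still rescans whatever is left, though; the single-pass approach in the answer below sidesteps deletion entirely.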
EDIT: gbacon's answer below was enough to get me to a solid solution. Here are its guts:
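use strict;
use warnings;
my %msg;
while (<>) {
    my $line = $_;    # unmodified copy: the s!!! below edits $_ in place
    # Strip everything up to the queue ID, then collect each field=value pair.
    if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
        my $key = $1;
        push @{ $msg{$key}{$1} } => $2
            while /\b(to|from|client|size|nrcpt)=<?(.+?)(?:>|,|\[|$)/g;
    }
    # On the final "removed" line, also record the timestamp and server name
    # (\s+ tolerates syslog's space-padded day, e.g. "Apr  8").
    if ($line =~ m!^(\w+\s+\d+ \d+:\d+:\d+)\s(\w+.*)\s+postfix/\w+\[.+?\]: (\w+):\s*removed!) {
        my $key = $3;
        push @{ $msg{$key}{date} }   => $1;
        push @{ $msg{$key}{server} } => $2;
    }
}
use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%msg;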
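I'm sure the second regex could be more impressive, but it does the job I need. I now get a hash of all the messages and can pull out the ones I'm interested in.
Thanks to everyone who answered.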
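On the point that the second regex could be tidier: one option is to parse the syslog prefix of every line with a single anchored regex and then branch on what follows the queue ID. This is a sketch, not the asker's code; it assumes the traditional syslog layout seen in the sample (month, day, time, host, then postfix/<daemon>[<pid>]:) and keeps the same %msg layout and the same date/server field names as above. The $prefix name and the two sample lines are illustrative.

```perl
use strict;
use warnings;

my %msg;

# Assumed layout: "Apr  8 14:22:04 host postfix/daemon[pid]: QUEUEID: rest".
# \s+ tolerates syslog's double space before single-digit days.
my $prefix = qr{
    ^(\w+\s+\d+\s+\d+:\d+:\d+)   # 1: timestamp
    \s+(\S+)                     # 2: host name
    \s+postfix/\w+\[\d+\]:\s+    # daemon and pid, not captured
    (\w+):\s*                    # 3: queue ID
    (.*)$                        # 4: rest of the line
}x;

my @log = (
    'Apr 8 14:22:04 MailSecure03 postfix/qmgr[19685]: BA1CE38965: from=<mailt...@example.com>, size=1087, nrcpt=2 (queue active)',
    'Apr 8 14:22:04 MailSecure03 postfix/qmgr[19685]: BA1CE38965: removed',
);

for my $line (@log) {
    # Skip anything that is not a postfix line in the assumed format.
    my ($date, $server, $key, $rest) = $line =~ $prefix or next;
    if ($rest eq 'removed') {
        push @{ $msg{$key}{date} },   $date;
        push @{ $msg{$key}{server} }, $server;
        next;
    }
    # Same field pattern as the code above, applied to the remainder only.
    push @{ $msg{$key}{$1} } => $2
        while $rest =~ /\b(to|from|client|size|nrcpt)=<?(.+?)(?:>|,|\[|$)/g;
}
```

With one regex doing the prefix work, adding another per-line case means testing $rest rather than writing a second full-line pattern.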
Answer (gbacon): do it all in one pass:
#! /usr/bin/perl
use warnings;
use strict;
# for demo only: read the sample log after __DATA__ instead of files/STDIN
*ARGV = *DATA;
my %msg;
while (<>) {
    if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
        my $key = $1;
        push @{ $msg{$key}{$1} } => $2
            while /\b(to|from|client)=(.+?)(?:,|$)/g;
    }
}
use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%msg;
__DATA__
Apr 8 14:22:02 MailSecure03 postfix/smtpd[32388]: BA1CE38965: client=mail.example.com[x.x.x.x]
Apr 8 14:22:03 MailSecure03 postfix/cleanup[32070]: BA1CE38965: message-id=<49dc4d9a.6020...@example.com>
Apr 8 14:22:03 MailSecure03 postfix/qmgr[19685]: BA1CE38965: from=<mailt...@example.com>, size=1087, nrcpt=2 (queue active)
Apr 8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<us...@test.com>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973)
Apr 8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<us...@test.com>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973)
Apr 8 14:22:04 MailSecure03 postfix/qmgr[19685]: BA1CE38965: removed
Apr 8 14:22:04 MailSecure03 postfix/smtpd[32589]: 62D8438973: client=localhost.localdomain[127.0.0.1]
Apr 8 14:22:04 MailSecure03 postfix/cleanup[32080]: 62D8438973: message-id=<49dc4d9a.6020...@example.com>
Apr 8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: from=<mailt...@example.com>, size=1636, nrcpt=2 (queue active)
Apr 8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<us...@test.com>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0 <49dc4d9a.6020...@example.com> Queued mail for delivery)
Apr 8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<us...@test.com>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0 <49dc4d9a.6020...@example.com> Queued mail for delivery)
Apr 8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: removed
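The code works by first looking for a queue ID (e.g., BA1CE38965 and 62D8438973 above), which we store in $key.
Next, we find all matches on the current line (thanks to the /g switch) that look like to=<...>, client=mail.example.com, and so on, with or without a separating comma.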
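To see what each step does on its own, here is a minimal sketch applying the answer's substitution and field pattern to one sample line. It also shows that a /g match in list context returns every (field, value) capture pair at once, which is the same set of matches the while loop visits one at a time.

```perl
use strict;
use warnings;

$_ = 'Apr 8 14:22:03 MailSecure03 postfix/qmgr[19685]: BA1CE38965: '
   . 'from=<mailt...@example.com>, size=1087, nrcpt=2 (queue active)';

# Same substitution as the answer: strip everything up to the queue ID.
my $key;
if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
    $key = $1;
}
print "key=$key\n";
print "rest=$_\n";

# In list context, //g returns all captures: (field1, value1, field2, value2, ...).
# Only "from" matches here, because size and nrcpt are not in the alternation.
my @pairs = /\b(to|from|client)=(.+?)(?:,|$)/g;
print "@pairs\n";
```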
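Of note in the pattern:
\b - match only at a word boundary (prevents matching xxxto=<...>)
(to|from|client) - match to, from, or client
(.+?) - match the field's value with a non-greedy quantifier
(?:,|$) - match a comma or the end of the string, without capturing into $3
The non-greedy (.+?) forces the match to stop at the first comma it encounters instead of running on to the end of the line. Otherwise, on the line
to=<foo@example.com>, other=123
you would get <foo@example.com>, other=123 as the recipient!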
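The difference is easy to check directly. This small sketch runs a greedy and a non-greedy version of the value pattern against the example line:

```perl
use strict;
use warnings;

my $line = 'to=<foo@example.com>, other=123';

# Greedy: .+ first swallows the rest of the line, and $ then matches
# at the very end, so the capture includes ", other=123".
my ($greedy) = $line =~ /\bto=(.+)(?:,|$)/;

# Non-greedy: .+? grows one character at a time and stops at the first comma.
my ($lazy) = $line =~ /\bto=(.+?)(?:,|$)/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```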
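Then, for each matched field, we push its value onto the end of an array (because, for example, there may be multiple recipients) stored under the queue ID and field name. Look at the result: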
$VAR1 = {
  '62D8438973' => {
    'client' => [
      'localhost.localdomain[127.0.0.1]'
    ],
    'to' => [
      '<us...@test.com>',
      '<us...@test.com>'
    ],
    'from' => [
      '<mailt...@example.com>'
    ]
  },
  'BA1CE38965' => {
    'client' => [
      'mail.example.com[x.x.x.x]'
    ],
    'to' => [
      '<us...@test.com>',
      '<us...@test.com>'
    ],
    'from' => [
      '<mailt...@example.com>'
    ]
  }
};
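Since the result is an ordinary hash of hashes of arrays, nested loops can walk all of it. In this sketch %msg is hand-built to mirror part of the dump above, and the keys are sorted only to make the output order predictable, because Perl does not preserve hash order. The @report array is an illustrative name.

```perl
use strict;
use warnings;

# Hand-built copy of the structure dumped above.
my %msg = (
    BA1CE38965 => {
        client => ['mail.example.com[x.x.x.x]'],
        from   => ['<mailt...@example.com>'],
        to     => ['<us...@test.com>', '<us...@test.com>'],
    },
    '62D8438973' => {    # quoted: a key starting with a digit is not auto-quoted by =>
        client => ['localhost.localdomain[127.0.0.1]'],
        from   => ['<mailt...@example.com>'],
        to     => ['<us...@test.com>', '<us...@test.com>'],
    },
);

my @report;
for my $qid (sort keys %msg) {
    for my $field (sort keys %{ $msg{$qid} }) {
        # Each field holds an array ref; join its values for display.
        push @report, "$qid $field: " . join(', ', @{ $msg{$qid}{$field} });
    }
}
print "$_\n" for @report;
```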
Now suppose you want to print all recipients of the message whose queue ID is BA1CE38965:
my $queueid = "BA1CE38965";
foreach my $recip (@{ $msg{$queueid}{to} }) {
    print $recip, "\n";
}
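If the queue ID might not be in %msg at all, a loop like the one above can create an empty entry for it through autovivification. A minimal sketch of a guarded lookup, using a hypothetical helper named recipients, checks with exists before dereferencing:

```perl
use strict;
use warnings;

my %msg = (
    BA1CE38965 => { to => ['<us...@test.com>', '<us...@test.com>'] },
);

# Hypothetical helper: return the recipients for a queue ID, or an empty
# list if that ID (or its "to" field) was never seen. Checking with exists
# first means a lookup for an unknown ID does not add an entry to %msg.
sub recipients {
    my ($msg, $queueid) = @_;
    return () unless exists($msg->{$queueid}) && exists($msg->{$queueid}{to});
    return @{ $msg->{$queueid}{to} };
}

print "$_\n" for recipients(\%msg, 'BA1CE38965');
my @none = recipients(\%msg, 'NOSUCHID');
```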
Maybe you only want to know how many recipients there are:
print scalar @{ $msg{$queueid}{to} }, "\n";
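That count includes every to= value collected, so an address that appears on more than one to= line for the same message (for example, a deferred attempt followed by a successful retry) is counted each time. If you want distinct recipients, the usual Perl idiom is to filter through a %seen hash. The addresses here are made up for illustration:

```perl
use strict;
use warnings;

# Made-up data: one address logged twice for the same queue ID.
my %msg = (
    BA1CE38965 => { to => ['<a@test.com>', '<b@test.com>', '<a@test.com>'] },
);
my $queueid = 'BA1CE38965';

# Keep the first occurrence of each address, preserving order.
my %seen;
my @unique = grep { !$seen{$_}++ } @{ $msg{$queueid}{to} };

print scalar(@unique), "\n";
```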
If you're willing to assume each message has only one client, access it with
print $msg{$queueid}{client}[0], "\n";
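The answer's pattern stores the client value as hostname[address], for example mail.example.com[x.x.x.x]. If you need the two parts separately, here is a sketch assuming exactly that shape:

```perl
use strict;
use warnings;

my %msg = (
    '62D8438973' => { client => ['localhost.localdomain[127.0.0.1]'] },
);
my $queueid = '62D8438973';

# Assumes the "hostname[address]" form seen in the sample log.
my ($host, $addr) = $msg{$queueid}{client}[0] =~ /^([^\[]+)\[([^\]]+)\]$/
    or die "unexpected client format\n";

print "host=$host addr=$addr\n";
```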