How can I remove elements from a Perl array after I've processed them?

Jus*_*ᚄᚒᚔ 3 arrays perl parsing

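I'm reading a Postfix mail log file into an array and then looping over it to extract messages. On the first pass, I check for a match on the "to=" lines and grab the message ID. After building an array of message IDs, I loop back through the array to extract information from the to=, from=, and client= lines.

What I'd like to do is remove a line from the array as soon as I've extracted its data, so that processing gets faster (i.e., one less line to check).

Any suggestions? This is in Perl.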


Edit: gbacon's answer below was enough to get me going with a solid solution. Here are the guts of it:

my %msg;
while (<>) {
    my $line = $_;    # keep an unmodified copy; the s!!! below edits $_
    if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
        my $key = $1;
        push @{ $msg{$key}{$1} } => $2
            while /\b(to|from|client|size|nrcpt)=<?(.+?)(?:>|,|\[|$)/g;
    }
    # \s+ between the date parts: syslog pads single-digit days ("Apr  8")
    if ($line =~ s!^(\w+\s+\d+\s+\d+:\d+:\d+)\s(\w+.*)\s+postfix/\w+\[.+?\]: (\w+):\s*removed!!) {
        my $key = $3;
        push @{ $msg{$key}{date} } => $1;
        push @{ $msg{$key}{server} } => $2;
    }
}

use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%msg;

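I'm sure the second regex could be more elegant, but it does the job I need. I can now take the hash of all messages and pull out the ones I'm interested in.

Thanks to everyone who answered.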

Gre*_*con 5

Do it in a single pass:

#! /usr/bin/perl

use warnings;
use strict;

# for demo only
*ARGV = *DATA;

my %msg;
while (<>) {
  if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
    my $key = $1;
    push @{ $msg{$key}{$1} } => $2
      while /\b(to|from|client)=(.+?)(?:,|$)/g;
  }
}

use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%msg;
__DATA__
Apr  8 14:22:02 MailSecure03 postfix/smtpd[32388]: BA1CE38965: client=mail.example.com[x.x.x.x]
Apr  8 14:22:03 MailSecure03 postfix/cleanup[32070]: BA1CE38965: message-id=<49dc4d9a.6020...@example.com>
Apr  8 14:22:03 MailSecure03 postfix/qmgr[19685]: BA1CE38965: from=<mailt...@example.com>, size=1087, nrcpt=2 (queue active)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<us...@test.com>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<us...@test.com>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973)
Apr  8 14:22:04 MailSecure03 postfix/qmgr[19685]: BA1CE38965: removed
Apr  8 14:22:04 MailSecure03 postfix/smtpd[32589]: 62D8438973: client=localhost.localdomain[127.0.0.1]
Apr  8 14:22:04 MailSecure03 postfix/cleanup[32080]: 62D8438973: message-id=<49dc4d9a.6020...@example.com>
Apr  8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: from=<mailt...@example.com>, size=1636, nrcpt=2 (queue active)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<us...@test.com>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0  <49dc4d9a.6020...@example.com> Queued mail for delivery)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<us...@test.com>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0  <49dc4d9a.6020...@example.com> Queued mail for delivery)
Apr  8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: removed

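The code works by first looking for a queue ID (e.g., BA1CE38965 and 62D8438973 above), which we store in $key.

Next, we find all matches on the current line (thanks to the /g switch) that look like to=<...>, client=mail.example.com, and so on, with or without a separating comma.

In the pattern, note

  • \b - matches only at a word boundary (prevents matching xxxto=<...>)
  • (to|from|client) - matches to, from, or client
  • (.+?) - matches the field's value with a non-greedy quantifier
  • (?:,|$) - matches a comma or end of string without capturing into $3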
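As a minimal sketch of how these pieces interact, the snippet below runs the same pattern over a single made-up line. The sample line and the @pairs array are assumptions for illustration only; they are not part of the original log or script.

```perl
use strict;
use warnings;

# Hypothetical line, as it would look after the queue-ID prefix is stripped.
my $line = 'from=<a@example.com>, xxxto=<b@example.com>, to=<c@example.com>';

my @pairs;
# With /g in scalar context, each iteration resumes where the last match ended.
while ($line =~ /\b(to|from|client)=(.+?)(?:,|$)/g) {
    push @pairs, "$1 => $2";
}

# xxxto= is skipped: there is no word boundary between "x" and "t".
print "$_\n" for @pairs;
```

This should print the from and to fields only, since the \b anchor rejects xxxto=.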

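The non-greedy (.+?) forces the match to stop at the first comma it encounters rather than running on as far as it can. Otherwise, on a line such as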

to=<foo@example.com>, other=123

you'd get <foo@example.com>, other=123 as the recipient!
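The difference is easy to check side by side. This is a small sketch using the example line above; the variable names $lazy and $greedy are chosen here for illustration.

```perl
use strict;
use warnings;

my $line = 'to=<foo@example.com>, other=123';

# Non-greedy: stops as soon as a comma (or end of string) can follow.
my ($lazy) = $line =~ /\bto=(.+?)(?:,|$)/;

# Greedy: grabs as much as possible, so it runs to the end of the string.
my ($greedy) = $line =~ /\bto=(.+)(?:,|$)/;

print "non-greedy: $lazy\n";    # <foo@example.com>
print "greedy:     $greedy\n";  # <foo@example.com>, other=123
```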

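Then, for each field matched, we push it onto the end of an array (because there could be multiple recipients, for example) associated with the queue ID and field name. Take a look at the result: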

$VAR1 = {
  '62D8438973' => {
    'client' => [
      'localhost.localdomain[127.0.0.1]'
    ],
    'to' => [
      '<us...@test.com>',
      '<us...@test.com>'
    ],
    'from' => [
      '<mailt...@example.com>'
    ]
  },
  'BA1CE38965' => {
    'client' => [
      'mail.example.com[x.x.x.x]'
    ],
    'to' => [
      '<us...@test.com>',
      '<us...@test.com>'
    ],
    'from' => [
      '<mailt...@example.com>'
    ]
  }
};

Now say you want to print all recipients of the message whose queue ID is BA1CE38965:

my $queueid = "BA1CE38965";
foreach my $recip (@{ $msg{$queueid}{to} }) {
  print $recip, "\n";
}

Maybe you only want to know how many recipients there were:

print scalar @{ $msg{$queueid}{to} }, "\n";

If you're willing to assume that each message has only one client, access it with

print $msg{$queueid}{client}[0], "\n";
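
The same access patterns extend to walking every message in the hash. The following is a rough sketch, assuming a small hand-built %msg with the same shape as the dump above; the addresses and the @summary array are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the %msg hash built by the parsing loop.
my %msg = (
    'BA1CE38965' => {
        client => ['mail.example.com[x.x.x.x]'],
        to     => ['<a@test.com>', '<b@test.com>'],
    },
    '62D8438973' => {
        client => ['localhost.localdomain[127.0.0.1]'],
    },
);

my @summary;
for my $id (sort keys %msg) {
    # A message may have no "to" entry; fall back to an empty list.
    my @to = @{ $msg{$id}{to} || [] };
    push @summary, sprintf "%s: %d recipient(s)", $id, scalar @to;
}

print "$_\n" for @summary;
```

Falling back to an empty array reference keeps the loop from dying under use strict when a field was never seen for a given queue ID.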