Parsing a file of vertically separated records in Perl

nev*_*int 2 perl parsing

I have a file that looks like this:

*NEWRECORD
RECTYPE = D
MH = Calcimycin
AQ = AA 
MED = *62

*NEWRECORD
RECTYPE = D
MH = Urinary Bladder
AQ = AB AH BS CH CY DE EM EN GD IM IN IR ME MI PA PH PP PS RA RE RI SE SU TR UL US VI
CX = consider also terms at CYST- and VESIC-
MED = *1359

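Each record block has a different number of lines (for example, the CX entry is not always present). When CX is present, it appears only once per record. We want to build a hash that uses the "MH" value as the key and the terms from "CX" as the value.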

So, after parsing the data above, we expect to get this structure:

$VAR = {  "Urinary Bladder" => ["CYST-" , "VESIC-"]};

What is the right way to parse it?

I'm stuck with the following code, which does not give me the result I want.

use Data::Dumper;
my %bighash;
my $key = "";
my $cx = "";
while (<>) {

   chomp;

   if (/^MH = (\w+)/) {

      $key = $1;     
      push @{$bighash{$key}}, " ";
   }
   elsif (/^CX = (\w+)/) {
      $cx = $1;

   }
   else {
      push @{$bighash{$key}}, $cx;

   }

} 

Dav*_*oss 5

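This becomes much simpler if you use $/ to read the data one paragraph at a time. Setting $/ to the empty string puts Perl in paragraph mode, so each read returns one blank-line-separated record. I'm surprised no one else has suggested this.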

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

use Data::Dumper;

my %bighash;

$/ = '';    # paragraph mode: read one record per iteration

while (<DATA>) {
  if (my ($k) = /^MH = (.*?)$/m and my ($v) = /^CX = (.*?)$/m) {
    $bighash{$k} = [ $v =~ /([A-Z]+-)/g ];
  }
}

say Dumper \%bighash;

__DATA__
*NEWRECORD
RECTYPE = D
MH = Calcimycin
AQ = AA 
MED = *62

*NEWRECORD
RECTYPE = D
MH = Urinary Bladder
AQ = AB AH BS CH CY DE EM EN GD IM IN IR ME MI PA PH PP PS RA RE RI SE SU TR UL US VI
CX = consider also terms at CYST- and VESIC-
MED = *1359

The output looks like this:

$VAR1 = {
          'Urinary Bladder' => [
                                 'CYST-',
                                 'VESIC-'
                               ]
        };
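
If the parsing lives inside a larger program, it may be safer not to change $/ globally. The following is a minimal sketch of the same paragraph-mode idea wrapped in a subroutine, with $/ localized so the setting is restored when the subroutine returns. The name parse_records and the in-memory filehandle are assumptions made for illustration and are not part of the code above; a real script would presumably open the data file instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_records is a hypothetical helper name used only for this sketch.
# It takes a filehandle and returns a hash reference mapping each MH value
# to the list of hyphen-terminated terms found on that record's CX line.
sub parse_records {
    my ($fh) = @_;

    local $/ = '';    # paragraph mode, limited to this subroutine
    my %cx_for;

    while (my $record = <$fh>) {
        # Skip records that lack either an MH line or a CX line.
        my ($mh) = $record =~ /^MH = (.*?)\s*$/m or next;
        my ($cx) = $record =~ /^CX = (.*?)\s*$/m or next;

        # Keep only upper-case terms ending in a hyphen, e.g. CYST-.
        $cx_for{$mh} = [ $cx =~ /([A-Z]+-)/g ];
    }
    return \%cx_for;
}

# Sample data taken from the question, read through an in-memory filehandle.
my $data = <<'END';
*NEWRECORD
RECTYPE = D
MH = Calcimycin
AQ = AA
MED = *62

*NEWRECORD
RECTYPE = D
MH = Urinary Bladder
AQ = AB AH BS CH CY DE EM EN GD IM IN IR ME MI PA PH PP PS RA RE RI SE SU TR UL US VI
CX = consider also terms at CYST- and VESIC-
MED = *1359
END

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $cx_for = parse_records($fh);
close $fh;

print "$_ => @{ $cx_for->{$_} }\n" for sort keys %$cx_for;
# prints: Urinary Bladder => CYST- VESIC-
```

Because $/ is set with local, code that runs after parse_records still reads line by line as usual.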