I have a file that looks like this:
*NEWRECORD
RECTYPE = D
MH = Calcimycin
AQ = AA
MED = *62
*NEWRECORD
RECTYPE = D
MH = Urinary Bladder
AQ = AB AH BS CH CY DE EM EN GD IM IN IR ME MI PA PH PP PS RA RE RI SE SU TR UL US VI
CX = consider also terms at CYST- and VESIC-
MED = *1359
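Each record block has a different number of lines (for example, the CX entry is not always present). When CX is present, it appears only once per record. We want a hash that uses the "MH" value as the key and the terms from "CX" as the value.
So, parsing the data above, we would expect to get this structure: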
$VAR = { "Urinary Bladder" => ["CYST-" , "VESIC-"]};
What is the right way to parse it?
I'm stuck with the following, which doesn't give me the result I want.
use Data::Dumper;
my %bighash;
my $key = "";
my $cx = "";
while (<>) {
    chomp;
    if (/^MH = (\w+)/) {
        $key = $1;
        push @{$bighash{$key}}, " ";
    }
    elsif (/^CX = (\w+)/) {
        $cx = $1;
    }
    else {
        push @{$bighash{$key}}, $cx;
    }
}
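This gets much simpler if you set $/ so that the data is read one paragraph at a time. I'm surprised nobody else has suggested it. Note that paragraph mode ($/ = '') treats one or more blank lines as the record separator, so it assumes the records in the file are separated by blank lines.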
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Data::Dumper;

my %bighash;

# Paragraph mode: each read returns one blank-line-separated record.
$/ = '';

while (<DATA>) {
    # Keep only records that have both an MH line and a CX line.
    if (my ($k) = /^MH = (.*?)$/m and my ($v) = /^CX = (.*?)$/m) {
        # Collect the upper-case prefixes ending in "-", e.g. CYST- and VESIC-.
        $bighash{$k} = [ $v =~ /([A-Z]+-)/g ];
    }
}

say Dumper \%bighash;

__DATA__
*NEWRECORD
RECTYPE = D
MH = Calcimycin
AQ = AA
MED = *62

*NEWRECORD
RECTYPE = D
MH = Urinary Bladder
AQ = AB AH BS CH CY DE EM EN GD IM IN IR ME MI PA PH PP PS RA RE RI SE SU TR UL US VI
CX = consider also terms at CYST- and VESIC-
MED = *1359
The output looks like this:
$VAR1 = {
          'Urinary Bladder' => [
                                 'CYST-',
                                 'VESIC-'
                               ]
        };
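If the real file has no blank lines between records, paragraph mode would read everything as a single record. One possible workaround, sketched here under that assumption, is to split the text on the *NEWRECORD marker line instead. The helper name parse_records and the variable names are hypothetical and not part of the original answer; the matching logic is the same as above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse records delimited by a "*NEWRECORD" line
# (an assumption about the file layout) instead of by blank lines.
sub parse_records {
    my ($text) = @_;
    my %cx_for;

    # Splitting on the marker line yields one chunk per record; the empty
    # chunk before the first marker has no MH line and is skipped below.
    for my $record (split /^\*NEWRECORD[ \t]*\n/m, $text) {
        my ($mh) = $record =~ /^MH = (.*?)\s*$/m or next;   # no MH line
        my ($cx) = $record =~ /^CX = (.*?)\s*$/m or next;   # no CX line
        # Keep the upper-case prefixes that end in "-".
        $cx_for{$mh} = [ $cx =~ /([A-Z]+-)/g ];
    }
    return \%cx_for;
}

my $sample = <<'END';
*NEWRECORD
RECTYPE = D
MH = Calcimycin
AQ = AA
MED = *62
*NEWRECORD
RECTYPE = D
MH = Urinary Bladder
AQ = AB AH BS CH CY DE EM EN GD IM IN IR ME MI PA PH PP PS RA RE RI SE SU TR UL US VI
CX = consider also terms at CYST- and VESIC-
MED = *1359
END

my $result = parse_records($sample);
print "$_ => @{ $result->{$_} }\n" for sort keys %$result;
```

For a real file, the text could be slurped into a scalar first and passed to the same helper. Records without a CX line, such as Calcimycin here, are simply left out of the hash, which matches the structure asked for in the question.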