Bij*_*jan 5 xml csv perl xpath
I have a 2 GB CSV file. Column 1 contains a time in epoch format, and column 2 contains an XML document of 10,000+ lines, flattened onto a single line.
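I want to iterate over each line of this CSV and save the XML from column 2 to its own file. I also use XPath to get the customer name from the XML, so that I can name the file [CustomerName]-[time from Column 1].xml. However, some of the XML documents are not valid XML, and I get the error Unclosed Token on Line .... Is there a way to ignore that error and just skip that file? Here is my Perl code: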
use strict;
use warnings;
use XML::XPath;
use File::Slurp qw(write_file);    # assumed: write_file comes from File::Slurp

my $file = '../FILENAME.csv';
open my $info, '<', $file or die "Could not open $file: $!";
my $count = 0;
$| = 1;
while (my $line = <$info>) {
    $count++; if ($count == 1) { next; }    # Ignore headers
    $line =~ /(\d+),"(.*?)"$/;              # Load time into $1, XML file into $2
    my $time   = $1;
    my $report = $2;
    $report =~ s/""/"/g;                    # Replace "" with "
    my $xp  = XML::XPath->new(xml => $report);
    my $ext = $xp->getNodeText('/report/customer') . "-" . $time . ".xml";    # Generate filename with customer name and time
    write_file($ext, $report);
}
close $info;
I'm also open to suggestions for making this more efficient.
You can try wrapping the problematic code in an eval block. For example:
eval {
    my $xp  = XML::XPath->new(xml => $report);
    my $ext = $xp->getNodeText('/report/customer') . "-" . $time . ".xml";    # Generate filename with customer name and time
    write_file($ext, $report);
};
if ($@) {
    warn "ERROR: $@";    # report the bad record; the while loop then moves on to the next line
}
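A slightly more defensive form of the same idea tests eval's return value instead of $@, because $@ can be clobbered (for example by an eval inside a destructor that runs while the stack unwinds). The sketch below is self-contained so it runs without XML::XPath: extract_customer is a hypothetical stand-in that simply dies when the input does not end in </report>, imitating the parser's error, and the sample records are made up. It is not a real XML parser, and the actual file write is left as a comment.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the XML::XPath lookup (NOT a real XML parser):
# it dies on input that is not a closed <report> element, imitating the
# "Unclosed Token" error, and otherwise returns the <customer> text.
sub extract_customer {
    my ($xml) = @_;
    die "Unclosed token\n" unless $xml =~ m{^<report>.*</report>$}s;
    my ($name) = $xml =~ m{<customer>([^<]*)</customer>};
    die "No customer element\n" unless defined $name;
    return $name;
}

# Made-up sample records: [epoch time, XML payload]; the second is malformed.
my @records = (
    [ 1420070400, '<report><customer>Acme</customer></report>' ],
    [ 1420070500, '<report><customer>Broken</customer>' ],
    [ 1420070600, '<report><customer>Globex</customer></report>' ],
);

my (@written, @skipped);
for my $rec (@records) {
    my ($time, $report) = @$rec;
    my $filename;
    # eval returns 1 only if every statement inside it succeeded, so the
    # "or do" branch runs exactly when something died.
    eval {
        $filename = extract_customer($report) . "-$time.xml";
        1;
    } or do {
        my $err = $@ || "unknown error\n";
        warn "Skipping record with time $time: $err";
        push @skipped, $time;
        next;    # leaves the do block and continues the for loop
    };
    push @written, $filename;    # write_file($filename, $report) would go here
}

print "written: @written\n";
print "skipped: @skipped\n";
```

Here the malformed second record is reported on STDERR and skipped, while the other two produce filenames. In the real while loop, the same next would move straight on to the following CSV line.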
The following code:
$count++; if($count == 1) {next;} #Ignore headers
$line =~ /(\d+),"(.*?)"$/; #Load time into $1, XML file into $2
my $time = $1;
my $report = $2;
can be shortened to:
next if ++$count == 1; #Ignore headers
my ($time, $report) = ($line =~ /(\d+),"(.*)"$/); # time, XML file
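To show how the shortened match behaves, here is a small self-contained sketch run on made-up sample lines (a header, one record, and one line that is not a record). Because the match is evaluated in list context, a non-matching line yields an empty list, so it can be skipped with "or next" instead of silently reusing $1 and $2 from the previous line. The s/""/"/g step from the question is kept to undo CSV quote doubling. The sample data is an assumption, not taken from the real file.

```perl
use strict;
use warnings;

# Made-up input in the format described in the question:
# a header line, then epoch time plus quoted XML with "" escapes.
my @lines = (
    qq{time,report\n},
    qq{1420070400,"<report><customer>Acme</customer><note attr=""x""/></report>"\n},
    qq{not a data line\n},
);

my $count = 0;
my @parsed;
for my $line (@lines) {
    next if ++$count == 1;    # Ignore headers
    # List-context match: returns ($1, $2) on success, an empty list on
    # failure; the assignment's count (0) then triggers "or next".
    my ($time, $report) = ($line =~ /(\d+),"(.*)"$/)
        or next;
    $report =~ s/""/"/g;      # Replace "" with "
    push @parsed, [ $time, $report ];
}

printf "%s => %s\n", @$_ for @parsed;
```

Only the second line survives. Its time is 1420070400, and its XML comes out with the doubled quotes collapsed (attr="x"). The $ anchor matches just before the trailing newline, so no chomp is needed for the match itself.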