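OK, so I've read about different approaches, but I just want to check whether my way has a hidden problem, or whether there is a better way (maybe grep?).
Here is my working code: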
#!/usr/bin/perl
use strict;
use warnings;

my $chapternumber;
open my $corpus, '<', "/Users/jon/Desktop/chpts/chpt1-8/Lifeprocessed.txt" or die $!;
while (my $sentence = <$corpus>)
{
    if ($sentence =~ /\~\s(\d*F*[\.I_]\w+)\s/ )
    {
        $chapternumber = $1;
        $chapternumber =~ s/\./_/;
    }
    open my $outfile, '>>', "/Users/jon/Desktop/chpts/chpt$chapternumber.txt" or die $!;
    print $outfile $sentence;
}
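The file is a textbook, and I have marked each new chapter with a line such as "~ 1.1 Organisms Have Changed over Billions of Years 1.1." or "~ 15Intro ..." or "~ F_14". I want each marker to be the start of a new file: chpt1_1.txt (or chpt15Intro, etc.). That file ends when I find the next chapter delimiter.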
Option 1: maybe instead of going line by line, just grab the whole block like this?
local $/ = "~";
open...
while...
    next unless ($sentenceblock =~ /\~\s([\d+F][\.I_][\d\w]+)\s/);
....
Thanks a lot.
First, the good things:
enabled strict and warnings
using 3-arg open and lexical filehandles
checking the return value from open()
But your regular expression doesn't make much sense.
~ is not "meta" in regexes, so it does not need escaping
. is not "meta" in a character class, so it does not need escaping
[\d+F] is equivalent to [+F\d] (what is the "F" for?); + matches a literal plus character in a character class, it does NOT mean "one or more" here
[\.I_] what is the "I" for? What is the underscore for?
[\d\w] is equivalent to [\w] and even to just \w
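
A quick way to check these character-class points is to run a few matches directly. This is only a minimal sketch; the variable names are made up for illustration and the test strings are not taken from the textbook file.

```perl
use strict;
use warnings;

# Inside a character class, '+' and '.' are ordinary characters.
my $class_matches_plus       = ('+'     =~ /^[\d+F]$/) ? 1 : 0;  # '+' is one of the allowed characters
my $class_matches_two_digits = ('12'    =~ /^[\d+F]$/) ? 1 : 0;  # the class still matches one character only
my $dot_class_matches_x      = ('x'     =~ /^[.]$/)    ? 1 : 0;  # [.] means a literal dot, not "any character"
my $bare_tilde_matches       = ('~ 1.1' =~ /^~\s/)     ? 1 : 0;  # '~' works without a backslash

print "plus=$class_matches_plus two_digits=$class_matches_two_digits ",
      "x=$dot_class_matches_x tilde=$bare_tilde_matches\n";
```

If the first and last values come out true and the middle two false, that matches the list above: the plus and the dot are literal inside brackets, and the tilde never needed escaping.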
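Your code calls open() many more times than it needs to: once per input line, rather than once per chapter.
For working with single characters, tr/// is better than s///.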
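
To make the tr/// point concrete, here is a small sketch comparing the two operators on a made-up label with more than one dot (the label "1.1.2" is an assumption for illustration, not something from the question's file). Besides being the simpler tool, tr/// also behaves differently from the original s/\./_/, which only touches the first dot.

```perl
use strict;
use warnings;

my $with_s  = '1.1.2';
my $with_sg = '1.1.2';
my $with_tr = '1.1.2';

$with_s  =~ s/\./_/;                  # without /g, only the first dot is replaced
$with_sg =~ s/\./_/g;                 # /g replaces every dot
my $count = ($with_tr =~ tr/./_/);    # replaces every dot and returns how many were changed

print "$with_s $with_sg $with_tr ($count replaced)\n";
```

Note that the dot needs no escaping in tr///, because tr/// works on character lists, not regular expressions.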
Hopefully this will get you on the right track:
#!/usr/bin/perl
use warnings;
use strict;

my $outfile;
while (<DATA>) {
    # A chapter header closes the current file and opens the next one.
    if ( my ($chapternumber) = /^~\s([\d.]+)/ ) {
        $chapternumber =~ tr/./_/;
        close $outfile if $outfile;
        open $outfile, '>', "chpt$chapternumber.txt"
            or die "could not open 'chpt$chapternumber.txt' $!";
    }
    # Assumes the input begins with a header, so $outfile is open by now.
    print {$outfile} $_;
}
__DATA__
~ 1.1 Organisms Have Changed over Billions of Years 1.1
stuff
about changing
organisms
~ 1.2 Chapter One, Part Two 1.2
part two
stuff is here
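
The pattern [\d.]+ above only covers purely numeric labels such as 1.1. The question also mentions markers like "~ 15Intro" and "~ F_14", so here is one possible sketch of a helper that accepts those too. The sub name chapter_file_name and the class [\w.]+ are assumptions based on the three sample markers in the question; adjust the pattern if the real labels look different.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the output file name for a chapter header
# line, or undef for an ordinary line. The label pattern [\w.]+ is an
# assumption drawn from the sample markers "1.1", "15Intro" and "F_14".
sub chapter_file_name {
    my ($line) = @_;
    return undef unless $line =~ /^~\s+([\w.]+)/;
    (my $label = $1) =~ tr/./_/;    # dots become underscores, as in the code above
    return "chpt$label.txt";
}

for my $line ('~ 1.1 Organisms Have Changed over Billions of Years 1.1',
              '~ 15Intro ...', '~ F_14', 'stuff') {
    my $name = chapter_file_name($line);
    print defined $name ? "$name\n" : "(not a header)\n";
}
```

In the loop above, the if condition could then become something like `if ( defined( my $name = chapter_file_name($_) ) )`, with `$name` passed straight to open().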