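I have tab-delimited text files that can be up to 1 GB in size. The number of columns varies with the number of samples in the file. Each sample has eight columns; for example, sample A has MIN_A, AVG_A, MAX_A, AR1_A, AR2_A, AR3_A, AR4_A and AR5_A, preceded by ID1 and ID2, which are common to all samples. What I want to achieve is to split the whole file into chunks, one file per sample.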
ID1,ID2,MIN_A,AVG_A,MAX_A,AR1_A,AR2_A,AR3_A,AR4_A,AR5_A,MIN_B, AVG_B, MAX_B,AR1_B,AR2_B,AR3_B,AR4_B,AR5_B,MIN_C,AVG_C,MAX_C,AR1_C,AR2_C,AR3_C,AR4_C,AR5_C
12,134,3535,4545,5656,5656,7675,67567,57758,875,8678,578,57856785,85587,574,56745,567356,675489,573586,5867,576384,75486,587345,34573,45485,5447
454385,3457,485784,5673489,5658,567845,575867,45785,7568,43853,457328,3457385,567438,5678934,56845,567348,58567,548948,58649,5839,546847,458274,758345,4572384,4758475,47487
This is what my sample file looks like, and I want to split it into:
File A:
ID1,ID2,MIN_A,AVG_A,MAX_A,AR1_A,AR2_A,AR3_A,AR4_A,AR5_A
12,134,3535,4545,5656,5656,7675,67567,57758,875
454385,3457,485784,5673489,5658,567845,575867,45785,7568,43853
File B:
ID1, ID2,MIN_B, AVG_B, MAX_B,AR1_B,AR2_B,AR3_B,AR4_B,AR5_B
12,134,8678,578,57856785,85587,574,56745,567356,675489
454385,3457,457328,3457385,567438,5678934,56845,567348,58567,548948
File C:
ID1, ID2,MIN_C,AVG_C,MAX_C,AR1_C,AR2_C,AR3_C,AR4_C,AR5_C
12,134,573586,5867,576384,75486,587345,34573,45485,5447
454385,3457,58649,5839,546847,458274,758345,4572384,4758475,47487
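Is there an easier way to do this than going through an array?
The logic I have worked out so far is to count the headers, subtract 2, and divide by 8, which gives the number of samples in the file. I would then go through each element of the array and parse it. This seems a tedious way of doing it, and I would be glad to know of any simpler way to handle this.
Thanks, Sipra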
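The counting logic described in the question can be sketched roughly as follows. This is only an illustration, assuming a comma-separated header like the one in the sample; `sample_count` is a hypothetical helper name, not something from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: number of samples implied by a header line,
# assuming 2 shared ID columns followed by 8 columns per sample.
sub sample_count {
    my ($header) = @_;
    my @cols = split /,/, $header;
    my $sample_cols = @cols - 2;    # columns left after ID1 and ID2
    die "Unexpected column count: " . scalar(@cols) . "\n"
        if $sample_cols < 0 || $sample_cols % 8;
    return $sample_cols / 8;
}

# Build a header shaped like the one in the question (samples A, B, C).
my @fields = qw(MIN AVG MAX AR1 AR2 AR3 AR4 AR5);
my @names  = ('ID1', 'ID2');
for my $sample (qw(A B C)) {
    push @names, map { "${_}_$sample" } @fields;
}
my $header = join ',', @names;

print sample_count($header), "\n";    # 26 columns: (26 - 2) / 8 = 3
```

A header whose column count is not 2 plus a multiple of 8 makes the helper die rather than guess.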
#!/usr/bin/env perl
use strict;
use warnings;

# open three output filehandles
my %fh;
for (qw[A B C]) {
    open $fh{$_}, '>', "file$_" or die $!;
}

# open input
open my $in, '<', 'somefile' or die $!;

# read the header line. there are no doubt ways to parse this to
# work out what the rest of the program should do.
<$in>;

while (<$in>) {
    chomp;
    my @data = split /,/;
    # a filehandle stored in a hash must be wrapped in a block for print;
    # the two ID columns go to every file, followed by that sample's columns
    print {$fh{A}} join(',', @data[0 .. 9]), "\n";
    print {$fh{B}} join(',', @data[0, 1, 10 .. 17]), "\n";
    print {$fh{C}} join(',', @data[0, 1, 18 .. $#data]), "\n";
}
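Update: I got bored and made it cleverer, so it now automatically handles any number of 8-column records in the file. Unfortunately, I don't have time to explain it or add comments.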
#!/usr/bin/env perl
use strict;
use warnings;

# open input
open my $in, '<', 'somefile' or die $!;

# read the header and check it holds 2 ID columns plus whole 8-column samples
chomp(my $head = <$in>);
my @cols = split /,/, $head;
die 'Invalid number of records - ' . @cols . "\n"
    if (@cols - 2) % 8;

# one descriptor per sample: its column range and its output filehandle
my @files;
my $name = 'A';
foreach (1 .. (@cols - 2) / 8) {
    my %desc;
    $desc{start_col} = (($_ - 1) * 8) + 2;
    $desc{end_col}   = $desc{start_col} + 7;
    # $name++ steps through fileA, fileB, fileC, ...
    open $desc{fh}, '>', 'file' . $name++ or die $!;
    # write the header: the two ID columns plus this sample's columns
    print {$desc{fh}} join(',', @cols[0, 1],
                                @cols[$desc{start_col} .. $desc{end_col}]),
                      "\n";
    push @files, \%desc;
}

# copy each data row's ID columns and sample columns to every output file
while (<$in>) {
    chomp;
    my @data = split /,/;
    foreach my $f (@files) {
        print {$f->{fh}} join(',', @data[0, 1],
                                   @data[$f->{start_col} .. $f->{end_col}]),
                         "\n";
    }
}
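Since the update comes without an explanation, here is a minimal sketch of the column arithmetic it appears to rely on. It assumes columns 0 and 1 are the shared IDs and that each sample occupies the next 8 columns; `sample_columns` is a hypothetical helper introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based column indices for the n-th sample (n starts
# at 1), assuming 2 shared ID columns and 8 columns per sample.
sub sample_columns {
    my ($n) = @_;
    my $start = ($n - 1) * 8 + 2;    # sample 1 starts at column 2
    return (0, 1, $start .. $start + 7);
}

# First data row from the sample input in the question.
my @data = split /,/,
    '12,134,3535,4545,5656,5656,7675,67567,57758,875,'
  . '8678,578,57856785,85587,574,56745,567356,675489,'
  . '573586,5867,576384,75486,587345,34573,45485,5447';

# An array slice picks out the ID columns plus one sample's columns.
for my $n (1 .. 3) {
    print join(',', @data[ sample_columns($n) ]), "\n";
}
```

Run against that row, the three printed lines match the first data rows of File A, File B and File C in the question.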