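I apologize for the poor title; I don't know how to properly describe the problem I'm having.
I have multiple tab-separated files in the following format: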
groupA donuts apples
groupB car dog ball meter
groupC apples donuts car
groupD ball shirt pencil paper donuts
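Each file has a different number of lines.
On each line, the first word is the group name and the remaining words are object names. What I want to do is keep track of which groups each object belongs to. So, in this example, I would find that ball is part of groupD and groupB, while car is part of groupB and groupC. apples is part of groupA and groupC, while pencil is only part of groupD.
Since each file I read has a different number of lines/groups, what is the best way to accomplish this?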
#!/usr/bin/perl
use strict;
use warnings;
my $path = "../GENELIST.symbols.csv";
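# Three-argument open with a lexical filehandle; report the OS error on failure.
open(my $fh, '<', $path) or die "cannot open $path: $!\n";
my @groups = ();
while (my $line = <$fh>) {
    # Capture the first tab-terminated word on the line (the group name).
    if ($line =~ /^(\w+)\t/) {
        push(@groups, $1);
    }
}
close($fh);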
#at this point I have the name of all the groups in the particular file (`groupA`, `groupB`, `groupC`, `groupD`).
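Just use a hash of arrays: each object name is a key, and its value is the list of groups that contain it.
To get more familiar with these structures, take a look at: Perl Data Structures Cookbook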
use strict;
use warnings;
my %groups;
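while (<DATA>) {
    # First field is the group name; the rest are the objects in that group.
    my ($group, @cols) = split;
    # Append this group to the list kept for each object.
    push @{$groups{$_}}, $group for @cols;
}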
use Data::Dump;
dd \%groups;
__DATA__
groupA donuts apples
groupB car dog ball meter
groupC apples donuts car
groupD ball shirt pencil paper donuts
Output:
{
apples => ["groupA", "groupC"],
ball => ["groupB", "groupD"],
car => ["groupB", "groupC"],
dog => ["groupB"],
donuts => ["groupA", "groupC", "groupD"],
meter => ["groupB"],
paper => ["groupD"],
pencil => ["groupD"],
shirt => ["groupD"],
}
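To apply the same hash-of-arrays idea to a real file rather than `__DATA__`, something like the following may work. It is only a sketch: `index_objects` is a hypothetical helper name, the sample text is an in-memory stand-in for a file such as the one in the question, and it assumes fields are separated by exactly one tab. It splits on tabs only, so an object name containing spaces would stay intact.

```perl
use strict;
use warnings;

# Build a hash of arrays: object name => list of groups containing it.
# index_objects is a hypothetical helper name, not part of the original answer.
sub index_objects {
    my ($fh) = @_;
    my %groups_for;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;    # skip blank lines
        my ($group, @objects) = split /\t/, $line;
        push @{ $groups_for{$_} }, $group for @objects;
    }
    return \%groups_for;
}

# In-memory sample standing in for a real file. For a file on disk, use:
#   open(my $fh, '<', $path) or die "cannot open $path: $!\n";
my $sample = "groupA\tdonuts\tapples\n"
           . "groupB\tcar\tdog\tball\tmeter\n"
           . "groupC\tapples\tdonuts\tcar\n"
           . "groupD\tball\tshirt\tpencil\tpaper\tdonuts\n";
open(my $fh, '<', \$sample) or die "cannot open in-memory sample: $!\n";
my $groups_for = index_objects($fh);
close($fh);

# Look up individual objects; the // [] guards against names never seen.
for my $object (qw(ball car pencil)) {
    my @groups = @{ $groups_for->{$object} // [] };
    print "$object: @groups\n";
}
```

Because the hash is built line by line, the number of lines or groups in a file does not matter. To process several files, one option is to call the helper once per file and keep one hash reference per file.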