How do I keep track of which groups certain elements belong to?

Ste*_*hen 3 regex perl

Apologies for the poor title; I'm not sure how to describe my problem properly.

I have several tab-delimited files in the following format:

groupA    donuts     apples
groupB    car        dog        ball      meter
groupC    apples     donuts     car
groupD    ball       shirt      pencil    paper      donuts

The files have varying numbers of lines.

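On each line, the first word is the group name and the rest of the line is a list of object names. What I want is to keep track of which groups each object belongs to. In this example, I would find that ball is part of groupB and groupD, car is part of groupB and groupC, apples is part of groupA and groupC, and pencil is only part of groupD.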

Since each file I read has a different number of lines/groups, what is the best way to do this?

#!/usr/bin/perl
use strict;
use warnings;

my $path = "../GENELIST.symbols.csv";
open(my $fh, '<', $path) or die "cannot open $path: $!\n";
my @groups = ();
while (my $line = <$fh>) {
    if ($line =~ /^(\w+)\t/) {
        push(@groups, $1);
    }
}
close($fh);
# at this point I have the names of all the groups in this file (`groupA`, `groupB`, `groupC`, `groupD`).
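The loop above only collects the group names; the missing step is recording, for every object on a line, the group that line belongs to. A minimal sketch, assuming columns are separated by one or more tabs and the first column is always the group name. The sub name `read_memberships` is hypothetical, and the sample data is read through an in-memory filehandle so the example runs without the real file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads "group<TAB>object<TAB>object..." lines from a
# filehandle and returns a hash ref of object name => [groups it appears in].
sub read_memberships {
    my ($fh) = @_;
    my %groups_of;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;           # strip LF or CRLF line endings
        next if $line =~ /^\s*$/;       # skip blank lines
        my ($group, @objects) = split /\t+/, $line;
        # Autovivify an array ref per object and append this line's group.
        push @{ $groups_of{$_} }, $group for @objects;
    }
    return \%groups_of;
}

# Sample data from the question, with real tab characters.
my $sample = "groupA\tdonuts\tapples\n"
           . "groupB\tcar\tdog\tball\tmeter\n"
           . "groupC\tapples\tdonuts\tcar\n"
           . "groupD\tball\tshirt\tpencil\tpaper\tdonuts\n";

# An in-memory filehandle stands in for open(my $fh, '<', $path).
open(my $fh, '<', \$sample) or die "cannot open sample: $!\n";
my $groups_of = read_memberships($fh);
close($fh);

for my $object (sort keys %$groups_of) {
    print "$object: @{ $groups_of->{$object} }\n";
}
```

Opening `$path` instead of `\$sample` reads the real file. Because the hash grows as lines are read, nothing needs to know the number of lines or groups in advance.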

Mil*_*ler 6

Just use a hash of arrays.

To get more familiar with these structures, have a look at the Perl Data Structures Cookbook (perldsc).

use strict;
use warnings;

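my %groups;   # object name => array ref of the groups it appears in

while (<DATA>) {
    # split with no arguments splits $_ on whitespace: the first field is
    # the group name, the remaining fields are object names.
    my ($group, @cols) = split;
    # Autovivify an array ref for each object and record this group.
    push @{$groups{$_}}, $group for @cols;
}

use Data::Dump;   # CPAN module; dd() pretty-prints a data structure
dd \%groups;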

__DATA__
groupA    donuts     apples
groupB    car        dog        ball      meter
groupC    apples     donuts     car
groupD    ball       shirt      pencil    paper      donuts

Output:

{
  apples => ["groupA", "groupC"],
  ball   => ["groupB", "groupD"],
  car    => ["groupB", "groupC"],
  dog    => ["groupB"],
  donuts => ["groupA", "groupC", "groupD"],
  meter  => ["groupB"],
  paper  => ["groupD"],
  pencil => ["groupD"],
  shirt  => ["groupD"],
}
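Once the hash of arrays exists, the membership questions from the original post are short queries. A small sketch, assuming `%groups` has the shape shown in the output above (written out literally here so the snippet runs on its own): counting the elements of each array separates objects found in only one group from those shared by several.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same shape as the answer's %groups: object name => [groups].
my %groups = (
    apples => ["groupA", "groupC"],
    ball   => ["groupB", "groupD"],
    car    => ["groupB", "groupC"],
    dog    => ["groupB"],
    donuts => ["groupA", "groupC", "groupD"],
    meter  => ["groupB"],
    paper  => ["groupD"],
    pencil => ["groupD"],
    shirt  => ["groupD"],
);

# An array in numeric context yields its length, so == 1 selects objects
# that appear in exactly one group and > 1 selects shared objects.
my @unique = grep { @{ $groups{$_} } == 1 } sort keys %groups;
my @shared = grep { @{ $groups{$_} } >  1 } sort keys %groups;

print "only in one group: $_ (@{ $groups{$_} })\n" for @unique;
print "in several groups: $_ (@{ $groups{$_} })\n" for @shared;

# Direct lookup for a single object.
print "ball is in: @{ $groups{ball} }\n";
```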