如何修剪文件 - 删除具有相同值的列

Question

如何修剪文件 - 删除具有相同值的列

我希望您通过删除具有相同值的列来修剪文件.

# the file I have (tab-delimited, millions of columns)
jack 1 5 9
john 3 5 0
lisa 4 5 7

Run Code Online (Sandbox Code Playgroud)

# the file I want (remove the columns with the same value in all lines)
jack 1 9
john 3 0
lisa 4 7

Run Code Online (Sandbox Code Playgroud)

你能否就这个问题给我任何指示？我更喜欢sed或awk解决方案,或者可能是perl解决方案.

提前致谢.最好,

Answer 1

jke*_*ian 5

这是一个快速的perl脚本,用于确定哪些列可以剪切.

open FH, "file" or die $!;
my @baseline = split /\t/,<FH>;         #snag the first row
my @linemap = 0..$#baseline;            #list all equivalent columns (all of them)

while(<FH>) {                           #loop over the file
    my @line = split /\t/;
    @linemap = grep {$baseline[$_] eq $line[$_]}  @linemap; #filter out any that aren't equal
}
print join " ", @linemap;
print "\n";

Run Code Online (Sandbox Code Playgroud)

您可以使用上述许多建议来实际删除列.我最喜欢的可能是cut实现,部分原因是上面的perl脚本可以修改为你提供精确的命令(甚至可以为你运行).

@linemap = map {$_+1} @linemap;                   #Cut is 1-index based
print "cut --complement -f ".join(",",@linemap)." file\n";

Run Code Online (Sandbox Code Playgroud)

Answer 2

Set*_*son 3

#!/usr/bin/perl
$/="\t";
open(R,"<","/tmp/filename") || die;
while (<R>)
{
  next if (($. % 4) == 3);
  print;
}

Run Code Online (Sandbox Code Playgroud)

Well, this was assuming it was the third column. If it is by value:

#!/usr/bin/perl
$/="\t";
open(R,"<","/tmp/filename") || die;
while (<R>)
{
  next if (($_ == 5);
  print;
}

Run Code Online (Sandbox Code Playgroud)

With the question edit, OP's desires become clear. How about:

#!/usr/bin/perl
open(R,"<","/tmp/filename") || die;
my $first = 1;
my (@cols);
while (<R>)
{
  my (@this) = split(/\t/);
  if ($. == 1)
  {
    @cols = @this;
  }
  else
  {
    for(my $x=0;$x<=$#cols;$x++)
    {
      if (defined($cols[$x]) && !($cols[$x] ~~ $this[$x]))
      {
        $cols[$x] = undef;
      }
    }
  }
  next if (($_ == 5));
#  print;
}
close(R);
my(@del);
print "Deleting columns: ";
for(my $x=0;$x<=$#cols;$x++)
{
  if (defined($cols[$x]))
  {
    print "$x ($cols[$x]), ";
    push(@del,$x-int(@del));
  }
}
print "\n";

open(R,"<","/tmp/filename") || die;
while (<R>)
{
  chomp;
  my (@this) = split(/\t/);

  foreach my $col (@del)
  {
    splice(@this,$col,1);
  }

  print join("\t",@this)."\n";
}
close(R);

Run Code Online (Sandbox Code Playgroud)

归档时间：	14 年，8 月前
查看次数：	2453 次
最近记录：	14 年，8 月前