Merging partially matching strings

use*_*035 6 perl r pattern-matching

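I am struggling to combine partially matching strings from two files.

File 1 contains a list of unique strings. Each of these strings partially matches many strings in file 2. How can I merge the line from file 1 with the line from file 2 for every match?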

File 1

mmu-miR-677-5p_MIMAT0017239
mmu-miR-181a-1-3p_MIMAT0000660

File 2

mmu-miR-677-5p_CTTCAGTGATGATTAGCTTCTGA
mmu-miR-677-5p_CTTCAGTGATGATTAGCTTCTGACT
mmu-miR-677-5p_TTCAGTGATGATTAGCTTCTGACT
mmu-miR-181a-1-3p_ACCATCGACCGTTGATTGTAC
mmu-miR-181a-1-3p_ACCATCGACCGTTGATTGTACC

Desired output

mmu-miR-677-5p_MIMAT0017239     mmu-miR-677-5p_CTTCAGTGATGATTAGCTTCTGA
mmu-miR-677-5p_MIMAT0017239     mmu-miR-677-5p_CTTCAGTGATGATTAGCTTCTGACT
mmu-miR-677-5p_MIMAT0017239     mmu-miR-677-5p_TTCAGTGATGATTAGCTTCTGACT
mmu-miR-181a-1-3p_MIMAT0000660  mmu-miR-181a-1-3p_ACCATCGACCGTTGATTGTAC
mmu-miR-181a-1-3p_MIMAT0000660  mmu-miR-181a-1-3p_ACCATCGACCGTTGATTGTACC

I tried using pmatch() in R, but could not get it to work. Does this look like something Perl would handle?

Maybe something like this:

perl -ne'exec q;perl;, "-ne", q $print (/\Q$.$1.q;/?"$. YES":$. .q\; NO\;);, "file2" if m;^(.*)_pat1;' file1

Bor*_*din 4

Here is a short Perl solution that stores all of the data from file1 in a hash and then looks it up as file2 is scanned.

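use strict;
use warnings;
use autodie;   # open() dies with a message on failure

my @files = qw/ file1.txt file2.txt /;

# Build a hash from file1: the text before the first underscore is the key,
# the rest of the line (e.g. the MIMAT ID) is the value.
my %file1 = do {
  open my $fh, '<', $files[0];
  map /([^_]+)_(\S+)/, <$fh>;
};

# Scan file2, take the same prefix from each line and look it up in the hash.
open my $fh, '<', $files[1];
while (<$fh>) {
  my ($key) = /([^_]+)/;
  # $_ still ends with its newline, so the format needs no "\n".
  printf "%-32s%s", "${key}_$file1{$key}", $_;
}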

Output

mmu-miR-677-5p_MIMAT0017239     mmu-miR-677-5p_CTTCAGTGATGATTAGCTTCTGA
mmu-miR-677-5p_MIMAT0017239     mmu-miR-677-5p_CTTCAGTGATGATTAGCTTCTGACT
mmu-miR-677-5p_MIMAT0017239     mmu-miR-677-5p_TTCAGTGATGATTAGCTTCTGACT
mmu-miR-181a-1-3p_MIMAT0000660  mmu-miR-181a-1-3p_ACCATCGACCGTTGATTGTAC
mmu-miR-181a-1-3p_MIMAT0000660  mmu-miR-181a-1-3p_ACCATCGACCGTTGATTGTACC
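One case the sample data does not exercise is a line in file 2 whose prefix never appears in file 1. With `use warnings` the lookup would then hit an undefined hash value. The following is only a rough sketch of the same hash-lookup idea with that case skipped. It works on in-memory lists instead of files so it can be run as-is; the subroutine name `merge_by_prefix` and the `mmu-miR-999-5p_ACGT` line are made up for illustration and are not part of the question or the answer.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): pairs each line of the
# second list with the line of the first list that shares the same prefix,
# where the prefix is everything before the first underscore.
sub merge_by_prefix {
    my ($ids, $seqs) = @_;

    # Index the unique strings by prefix, e.g. "mmu-miR-677-5p" => whole line.
    my %by_prefix;
    for my $line (@$ids) {
        my ($key) = $line =~ /^([^_]+)_/ or next;   # ignore lines without "_"
        $by_prefix{$key} = $line;
    }

    my @merged;
    for my $line (@$seqs) {
        my ($key) = $line =~ /^([^_]+)_/ or next;
        next unless exists $by_prefix{$key};        # no partner: skip the line
        push @merged, sprintf '%-32s%s', $by_prefix{$key}, $line;
    }
    return @merged;
}

my @file1 = qw(
    mmu-miR-677-5p_MIMAT0017239
    mmu-miR-181a-1-3p_MIMAT0000660
);
my @file2 = qw(
    mmu-miR-677-5p_CTTCAGTGATGATTAGCTTCTGA
    mmu-miR-181a-1-3p_ACCATCGACCGTTGATTGTAC
    mmu-miR-999-5p_ACGT
);   # the last entry is an invented line with no match in @file1

print "$_\n" for merge_by_prefix(\@file1, \@file2);
```

Whether unmatched lines should be dropped silently, reported, or printed with an empty first column depends on the data, so treat the `next unless exists` line as one possible choice.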