use*_*035 6 perl r pattern-matching
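I am trying to combine partially matching strings from two files.
File 1 contains a list of unique strings. Each of these strings partially matches many strings in file 2. How can I merge the line from file 1 with the line from file 2 for every match?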
File 1
mmu-miR-677-5p_MIMAT0017239
mmu-miR-181a-1-3p_MIMAT0000660
File 2
mmu-miR-677-5p_CTTCAGTGATGATTAGCTTCTGA
mmu-miR-677-5p_CTTCAGTGATGATTAGCTTCTGACT
mmu-miR-677-5p_TTCAGTGATGATTAGCTTCTGACT
mmu-miR-181a-1-3p_ACCATCGACCGTTGATTGTAC
mmu-miR-181a-1-3p_ACCATCGACCGTTGATTGTACC
Desired output
mmu-miR-677-5p_MIMAT0017239 mmu-miR-677-5p_CTTCAGTGATGATTAGCTTCTGA
mmu-miR-677-5p_MIMAT0017239 mmu-miR-677-5p_CTTCAGTGATGATTAGCTTCTGACT
mmu-miR-677-5p_MIMAT0017239 mmu-miR-677-5p_TTCAGTGATGATTAGCTTCTGACT
mmu-miR-181a-1-3p_MIMAT0000660 mmu-miR-181a-1-3p_ACCATCGACCGTTGATTGTAC
mmu-miR-181a-1-3p_MIMAT0000660 mmu-miR-181a-1-3p_ACCATCGACCGTTGATTGTACC
I tried using pmatch() in R, but could not get it to work. This looks like something Perl would handle well?
Maybe something like this:
perl -ne'exec q;perl;, "-ne", q $print (/\Q$.$1.q;/?"$. YES":$. .q\; NO\;);, "file2" if m;^(.*)_pat1;' file1
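Here is a short Perl solution. It stores all the data from file1 in a hash, keyed on the text before the first underscore, and then looks each key up while scanning file2.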
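use strict;
use warnings;
use autodie;

my @files = qw/ file1.txt file2.txt /;

# Build a hash from file1: the text before the first underscore is the key,
# the rest of the line (e.g. the MIMAT accession) is the value.
my %file1 = do {
    open my $fh, '<', $files[0];
    map /([^_]+)_(\S+)/, <$fh>;
};

# Scan file2 and print each line next to its partner from file1.
open my $fh, '<', $files[1];
while (<$fh>) {
    my ($key) = /([^_]+)/;
    next unless defined $key and exists $file1{$key};   # no partner in file1
    printf "%-32s%s", "${key}_$file1{$key}", $_;
}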
Output
mmu-miR-677-5p_MIMAT0017239 mmu-miR-677-5p_CTTCAGTGATGATTAGCTTCTGA
mmu-miR-677-5p_MIMAT0017239 mmu-miR-677-5p_CTTCAGTGATGATTAGCTTCTGACT
mmu-miR-677-5p_MIMAT0017239 mmu-miR-677-5p_TTCAGTGATGATTAGCTTCTGACT
mmu-miR-181a-1-3p_MIMAT0000660 mmu-miR-181a-1-3p_ACCATCGACCGTTGATTGTAC
mmu-miR-181a-1-3p_MIMAT0000660 mmu-miR-181a-1-3p_ACCATCGACCGTTGATTGTACC
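The same hash lookup can also be tried without any files. The following is only a sketch under some assumptions: `join_by_prefix` is a hypothetical helper name (it does not appear in the answer), the input lines are assumed to be already chomped, and the identifier before the first underscore is assumed to contain no underscore itself. Lines in file 2 with no partner in file 1 are skipped silently.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): pairs each file2 line
# with the file1 line that shares the text before the first underscore.
# Both arguments are array references holding chomped lines.
sub join_by_prefix {
    my ($file1_lines, $file2_lines) = @_;

    # Map prefix => full file1 line,
    # e.g. "mmu-miR-677-5p" => "mmu-miR-677-5p_MIMAT0017239".
    my %full_for;
    for my $line (@$file1_lines) {
        my ($prefix) = $line =~ /^([^_]+)_/ or next;   # skip lines without "_"
        $full_for{$prefix} = $line;
    }

    my @joined;
    for my $line (@$file2_lines) {
        my ($prefix) = $line =~ /^([^_]+)_/ or next;
        next unless exists $full_for{$prefix};         # no partner in file1
        push @joined, "$full_for{$prefix} $line";
    }
    return @joined;
}

# Usage with a few of the sample lines from the question.
my @file1 = ('mmu-miR-677-5p_MIMAT0017239', 'mmu-miR-181a-1-3p_MIMAT0000660');
my @file2 = (
    'mmu-miR-677-5p_CTTCAGTGATGATTAGCTTCTGA',
    'mmu-miR-181a-1-3p_ACCATCGACCGTTGATTGTAC',
);
print "$_\n" for join_by_prefix(\@file1, \@file2);
```

Because file 1 is described as a list of unique strings, one hash entry per prefix is enough here. If a prefix could occur more than once in file 1, the hash would need to hold a list of lines per key instead.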