我需要将文件拆分为唯一的文件名。
我可以用sed
命令来完成,例如, sed -n '/scaffold135_/w 135-scaf.txt' input file.txt
但它很耗时,所以我需要一种聪明的方法来更快地完成它。下面是一个输入示例(原始文件有一百万行):
scaffold1_115,T,N,N,N,N,A,N,N,N,N,N,N,T,N,T,T,N,A,A,N,N,A
scaffold1_123,A,N,N,N,N,G,N,N,N,N,N,N,A,N,A,A,N,G,G,N,N,G
scaffold1_140,C,N,N,N,N,C,N,N,N,N,N,N,C,N,C,C,N,T,C,N,N,C
scaffold2_161,G,N,N,N,N,G,N,C,N,N,C,N,G,N,G,G,N,G,G,C,N,G
scaffold2_162,C,N,N,N,N,C,N,T,N,N,T,N,C,N,C,C,N,C,C,T,N,C
scaffold2_180,C,N,N,N,N,C,N,T,N,N,C,C,C,T,C,C,T,C,C,C,N,C
scaffold2_194,C,N,N,C,N,C,C,C,C,C,C,C,C,C,T,C,C,C,C,C,N,C
scaffold3_195,G,N,N,G,G,C,G,G,G,G,G,G,C,G,C,G,G,C,C,G,N,C
scaffold3_234,T,N,A,T,A,A,T,T,T,A,T,A,A,T,A,A,T,A,A,T,N,A
scaffold101_282,C,T,T,T,C,C,T,C,T,C,C,C,C,T,C,C,T,C,C,C,N,C
scaffold101_371,T,T,T,T,T,C,T,T,T,T,T,T,T,T,T,T,T,T,T,T,N,C
scaffold101_372,T,T,T,T,C,C,T,T,T,T,T,T,T,T,T,T,T,T,T,T,N,C
Run Code Online (Sandbox Code Playgroud)
线条很独特。我希望特定于每个行的行scafold
放入一个单独的文件中,说所有scaffold1_
以命名的文件开头的行,scaffold1.txt
依此类推,直到scaffold10156.txt
包含以开头的行scaffold10156_
编辑:我想将 file1.txt 的第 1,2 列与 file2.txt 的第 1,3 列匹配并打印 file2.txt 的匹配行
文件1.txt:
scaffold1 57482
scaffold1 63114
scaffold1 63118
scaffold1 63129
scaffold1 63139
scaffold1 63279
scaffold1 63294
scaffold2 65015
scaffold2 77268
scaffold2 77335
Run Code Online (Sandbox Code Playgroud)
文件2.txt:
scaffold1 381 382 T/A +
scaffold1 384 385 T/A,G +
scaffold1 385 386 G/C +
scaffold1 445 446 C/T +
scaffold1 57481 57482 T/A +
scaffold1 63113 63114 T/A,G +
scaffold1 63128 63129 G/C +
scaffold2 65014 65015 G/A +
scaffold2 77267 77268 G/A +
scaffold2 77334 …
Run Code Online (Sandbox Code Playgroud)