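I thought this would be a fairly easy task, but I can't find an example here that doesn't focus on summarizing rows based on a column condition. What I want to do is sum the replicate columns while keeping the rows unique.
This is what I mean: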
MKC100.1 MKC100.2 MKC100.3 MKC103.1 MKC103.2 MKC103.3 MKC104.2 MKC104.3
299fc0ac11fb4afd0da849a2c45583b3 0 0 0 0 0 0 0 1
9bc2bacdfadf4c1352ffbc991803287c 1183 1666 1318 0 0 0 10 20
38b782d9f01c69c3570fe0edd5864dc0 493 626 543 10 0 0 5 5
6d078397349f7d39c34d237a6ef4cb75 43735 51511 46876 0 0 0 1 0
c22e752b441ee4190f27a3690c5d1206 0 0 0 2795 1128 1956 1 1
f6513affb198fb9845741b61ece8db4b 59 58 82 0 0 0 0 0
structure(list(MKC100.1 = c(0L, 1183L, 493L, 43735L, 0L, 59L),
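MKC100.2 = c(0L, 1666L, 626L, 51511L, 0L, 58L), …

I used to have an awk command that worked well for putting quotes around the last item in my output files (everything after the 10th comma), so that when I open them as CSV files the last item isn't split up by the extra commas it contains. For some reason, though, the awk command has stopped working (I never really understood it; someone helped me create it), and it now returns a file with many blank lines or with data removed.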
Here is an example of my initial output file:
e7479580f6f3be15b5632f64f9de8df7,gi|1858620278|gb|MN628024.1|,132,541,100,132,100.000,2.02e-60,244,82755,Gymnanthemum amygdalinum voucher PCG/UNN/030-52 ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) gene, partial cds; chloroplast
e7479580f6f3be15b5632f64f9de8df7,gi|1858620278|gb|MN628024.1|,132,541,100,132,100.000,2.02e-60,244,82755,Gymnanthemum amygdalinum voucher PCG/UNN/030-52 ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) gene, partial cds; chloroplast
b875a20e3a4876aba15b0edf8973a3f4,gi|1832942633|gb|MN431198.1|,132,573,100,132,100.000,2.02e-60,244,39414,Plantago lanceolata ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcl) gene, partial cds; chloroplast
023abf2ebf1c94fe890dfd1517a828c5,gi|1562068410|gb|MH569150.1|,132,715,98,129,100.000,9.41e-59,239,2508311,Brassica sp. 4 KS-2019 ribulose-1,5-bisphosphate carboxylase/oxygenase (rbcL) pseudogene, partial sequence; mitochondrial
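This is what I want the output file to look like; it simply has quotes around the last item, which is the full species name plus its sequencing information.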
e7479580f6f3be15b5632f64f9de8df7,gi|1858620278|gb|MN628024.1|,132,541,100,132,100.000,2.02e-60,244,82755,"Gymnanthemum amygdalinum voucher PCG/UNN/030-52 ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) gene, partial cds; chloroplast"
e7479580f6f3be15b5632f64f9de8df7,gi|1858620278|gb|MN628024.1|,132,541,100,132,100.000,2.02e-60,244,82755,"Gymnanthemum amygdalinum voucher PCG/UNN/030-52 ribulose-1,5-bisphosphate carboxylase/oxygenase …
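
The quoting described above (wrap everything after the 10th comma in double quotes) could be sketched roughly as follows. This is not the original awk command, which isn't shown here; the function name `quote_last_field` is made up for illustration, and the sketch assumes every record sits on a single line with the first ten fields containing no commas. Stripping a trailing carriage return and doubling any embedded quotes are defensive additions, not a diagnosis of why the old command broke.

```shell
# Hypothetical helper: reads CSV-like lines on stdin and wraps everything
# after the n-th comma (default 10) in double quotes.
quote_last_field() {
  awk -v n="${1:-10}" '
    {
      sub(/\r$/, "")                   # drop a trailing CR, if any
      rest = $0
      head = ""
      # Peel off the first n comma-terminated fields one at a time.
      for (i = 0; i < n; i++) {
        p = index(rest, ",")
        if (p == 0) break              # ran out of commas early
        head = head substr(rest, 1, p)
        rest = substr(rest, p + 1)
      }
      if (i < n) { print; next }       # fewer than n commas: leave line as-is
      gsub(/"/, "\"\"", rest)          # escape embedded quotes CSV-style
      print head "\"" rest "\""
    }
  '
}
```

It would be used as `quote_last_field < input.csv > output.csv` (both file names are placeholders). Splitting on the first ten commas by position, rather than setting `FS=","` and rebuilding the line from fields, means the commas inside the description are never touched.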