I have 2 data frames:
df1
SeqTech NMDS1 NMDS2 NMDS3 C1 C2 C3
AM.AD.1 Sanger -1.2408789 0.39893503 -0.036690753 -1.0330785 -0.009904179 -0.06261568
AM.AD.2 Sanger -0.9050894 0.55943858 -0.121985899 -1.0330785 -0.009904179 -0.06261568
AM.F10.T1 Sanger -0.9059108 0.09466239 -0.033827792 -1.0330785 -0.009904179 -0.06261568
AM.F10.T2 Sanger -0.8511172 0.21396548 -0.061612450 -1.0330785 -0.009904179 -0.06261568
DA.AD.1 Sanger -1.1390353 0.05166118 0.306245704 -1.0330785 -0.009904179 -0.06261568
DA.AD.1T Sanger -1.2072895 0.06963215 0.241758582 -1.0330785 -0.009904179 -0.06261568
DA.AD.2 Sanger -1.1279367 -0.18692443 -0.092967153 -1.0330785 -0.009904179 -0.06261568
DA.AD.3 Sanger -1.3517083 -0.03651835 0.008165075 -1.0330785 -0.009904179 -0.06261568
DA.AD.3T Sanger -1.2616186 -0.06099534 -0.016942073 -1.0330785 -0.009904179 …

I have a FASTA file (txt) for a genome, similar to:
$ cat Strain-01.faa
>IMEHDJCA_03186 Serine/threonine-protein phosphatase 2
MEFKHRFIDGSRYQRIFVIGDIHGKLALLQDTLKRVDFHGERDLLISVGDLIDRGPDSVG
VLDYYQTHDWFEAVMGNHEWMMVNALDAQNKLERSEKEAYFIKIWHRNGCEWSQNL
>IMEHDJCA_03187 Serine transporter
MKESRETLNFSDTLPTETWTKHDTHWVLSLFGTAVGAGILFLPINLGIGGFWPLVLLALL
AFPMTFWGHRALARFVLSSKQADADFTDVVEEHFGAKAGRLISLLYFLSIFPILLIYGVG
>IMEHDJCA_03189 hypothetical protein
MNNQRHGITFGIERIGSQTILVFKATGTLTHQDYQAIAPVLEAALAGINRQQMNMLADIS
EFSGWEPRAAWDDFQLGLKIGFSVNKVAVYGDKNWQELAAKVGSWFISGEMKSFGD
I want to add an extra ID to each sequence header, based on the list in file.txt.
$ cat file.txt
ID Gene Strain-01 Strain-02 Strain-03
ID_01 pphB IMEHDJCA_03186 DIBHEKPI_01648 LLMDBGDK_00598
ID_02 group_1001 IMEHDJCA_03187 DIBHEKPI_01635 LLMDBGDK_00611
ID_03 group_1002 IMEHDJCA_03189 DIBHEKPI_01628 LLMDBGDK_00616
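For example, in the FASTA file Strain-01.faa, the ID IMEHDJCA_03186 appears in the Strain-01 column of file.txt, so I want to add the matching value from the ID column (ID_01) to the header of that sequence, as follows:
ID_01 corresponds to IMEHDJCA_03186
ID_02 corresponds to IMEHDJCA_03187
ID_03 corresponds to IMEHDJCA_03189
The result would look like this: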
$ cat Strain-01_edited.faa
>ID_01 IMEHDJCA_03186 Serine/threonine-protein phosphatase …
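
The relabelling described above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: file.txt is taken to be whitespace-delimited with no spaces inside any field, the first word after ">" in each FASTA header is taken to be the locus tag, and headers with no match in the table are left unchanged. The function names `load_id_map` and `relabel_fasta` are hypothetical, chosen for this example.

```python
def load_id_map(table_lines, strain_column):
    """Map each locus tag in `strain_column` to the value in the ID column.

    Assumes a whitespace-delimited table whose first row is the header.
    """
    rows = [line.split() for line in table_lines if line.strip()]
    header, body = rows[0], rows[1:]
    id_idx = header.index("ID")
    strain_idx = header.index(strain_column)
    return {row[strain_idx]: row[id_idx] for row in body}


def relabel_fasta(fasta_lines, id_map):
    """Yield FASTA lines, prefixing known header locus tags with their ID."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split(maxsplit=1)
            locus = parts[0] if parts else ""
            if locus in id_map:
                # Keep the original header text and put the new ID in front.
                line = f">{id_map[locus]} {line[1:]}"
        yield line


# Small in-memory demo using rows copied from the question.
table = [
    "ID Gene Strain-01 Strain-02 Strain-03",
    "ID_01 pphB IMEHDJCA_03186 DIBHEKPI_01648 LLMDBGDK_00598",
    "ID_02 group_1001 IMEHDJCA_03187 DIBHEKPI_01635 LLMDBGDK_00611",
    "ID_03 group_1002 IMEHDJCA_03189 DIBHEKPI_01628 LLMDBGDK_00616",
]
fasta = [
    ">IMEHDJCA_03187 Serine transporter\n",
    "MKESRETLNFSDTLPTETWTKHDTHWVLSLFGTAVGAGILFLPINLGIGGFWPLVLLALL\n",
]
id_map = load_id_map(table, "Strain-01")
for out_line in relabel_fasta(fasta, id_map):
    print(out_line)
# First printed line: >ID_02 IMEHDJCA_03187 Serine transporter
```

To apply this to the real files, the two lists could be replaced with open file handles (for instance `open("file.txt")` and `open("Strain-01.faa")`), and the yielded lines written to Strain-01_edited.faa. Passing "Strain-02" or "Strain-03" as the column name would build the lookup for the other strains.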