use*_*807 2 python dataframe pandas
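I am trying to read a nucleotide sequence file that contains ids and sequences. By default, each sequence is wrapped onto a new line after every 70 nucleotides.
The input file (seq.txt) looks like this.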
seqgb_AY741213_Organism_Influenza_A_virus__A_blackbird_Hunan_1_2004_H5N1___Strain_Name_A_blackbird_Hunan_1_2004_Segment_4_Subtype_H5N1_Host_Blackbird,
ATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTTAAAAGTGATCAGATTTGCATTGGTTACC
ATGCAAACAACTCGACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTACACATGCTCAAGA
CGTACTGGACAAGACACACAACGGGAACACTCAGTTTGAGGCCGTTGGAAGGGAATTTAATAACTTAGAA
AGGAGAATAGAAAATTTAAACAAGAAGATGGAGGACGGATTCCTAGATGTCTGGACTTATAATGCTGAAC
TTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTTCATGACTCAAATGTCAAGAACCTTTACGAAAA
GGTCCGACTACAACTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTTCTATCACAAATGT
GATAATGAATGTATGGAAAGTGTAAGAAACGGAACGTATGACTACCCGCAGTATTCAGAAGAAGCAAGAC
TAAACAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAACTTACCAAATACTGTCAATTTATTC
AACAGTGGCGAGTTCCCTAGCACTGGCAATCATGGTAGCTGGTCTATCTTTATGGATGTGCTCCAATGGA
TCGTTACAATGCAGAATTTGCATTTGA
seqgb_EU676325_Organism_Influenza_A_virus__A_brown-head_gull_Thailand_vsmu-4_2008_H5N1___Strain_Name_A_brown-head_gull_Thailand_vsmu-4_2008_Segment_4_Subtype_H5N1_Host_Brown-Headed_Gull,
TTTAGCAAAAGGCAGGGGTATATCTGTCAAAATGGAGAAAATAGTGCTTCTTTTTGCAATAGTCAGTCTT
GTTAAAAGTGATCAGATTTGCATTGGTTACCATGCAAACAACTCGACAGAGCAGGTTGACACAATAATGG
AAAAGAACGTTACTGTTACACATGCCCAAGACATACTGGAAAAGACACACAACGGGAAGCTCTGCGATCT
AGATGGAGTGAAGCCTCTAATTTTGAGAGATTGTAGTGTAGCTGGATGGCTCCTCGGAAACCCAATGTGT
GACGAATCTCCAATGGGGGCGATAAACTCTAGTATGCCATTCCACAATATACACCCTCTCACCATCGGGG
AATGCCCCAAATATGTGAAATCAAACAGATTAGTCCTTGCGACTGGGCTCAGAAATAGCCCTCAAAGAGA
GAGAAGAAGAAAAAAGAGAGGATTATTTGGAGCTATAGCAGGTTTTATAGAGGGAGGATGGCAGGGAATG
GTAGATGGTTGGTATGGGTACCACCATGAACTTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTTC
ATGACTCAAATGTCAAGAACCTTTACGACAAGGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGG
TAACGGTTGTTTCGAGTTCTATCATAAATGTGATAATGAATGTATGGAAAGTGTAAGAAACGGAACGTAT
GACTACCCACAGTATTCAGAAGAAGCAAGACTAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAA
TAGGAATTTACCAAATACTGTCAATTTATTCTACAGTGGCGAGTTCCCTAGCACTGGCAATCATGGTAGC
TGGTCTATCCTTATGGATGTGCTCCAATGGGTCGTTACAATGCAGAATTTGCATTTAAATTTGTGAGTTC
AGATTGAG
seqgb_EF178528_Organism_Influenza_A_virus__A_brown-headed_gull_Thailand_VSMU-28-SPK_2005_H5N1___Strain_Name_A_brown-headed_gull_Thailand_VSMU-28-SPK_2005_Segment_4_Subtype_H5N1_Host_Brown-Headed_Gull,
AGCAAAAGCAGGGGTATAATCTGTCAAAATGGAGAAAATAGTGCTTCTTTTTGCAATAGTCAGTCTTGTT
AAAAGTGATCAGATTTGCATTGGTTACCATGCAAACAACTCGACAGAGCAGGTTGACACAATAATGGAAA
AGAACGTTACGAATGATGCAATCAACTTCGAGAGTAATGGAAATTTCATTGCTCCAGAGTATGCATACAA
AATTGTCAAGAAAGGGGACTCAACAATTATGAAAAGTGAATTGGAATATGGTAACTGCAACACCAAGTGT
CAAACTCCAATGGGGGCGATAAACTCAAGGTCAACTCGATCATTGACAAAATGAACACTCAGTTTGAGGC
CGTTGGAAGGGAATTTAACAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGGTTC
CTAGATGTCTGGACTTATAATGCTGAACTTCTGGTTCTCCTGGAAAATGAGAGAACTCTAGACTTTCATG
ACTCAAATGTCAAGAACCTTTACGACAAGGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAA
CGGTTGTTTCGAGTTCTATCATAAATGTGATAATGAATGTATGGAAAGTGTAAGAAACGGAACGTATGAC
TACCCACAGTATTCAGAAGAAGCAAGACTAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAG
GAATTTACCAAATACTGTCAATTTATTCTACAGTGGCGAGTTCCCTAGCACTGGCAATCATGGTAGCTGG
TCTATCCTTATGGATGTGCTCCAATGGGTCGTTACAATGCAGAATTTGCATTTAAATTTGTGAGTTCAGA
T
seqgb_CY091790_Organism_Influenza_A_virus__A_chicken_Ampenan_BBVD-282_2007_H5N1___Strain_Name_A_chicken_Ampenan_BBVD-282_2007_Segment_4_Subtype_H5N1_Host_Chicken,
TCAATCTGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTTAAAAGTGATCAGATT
TGCATTGGTTACCATGCAAACAATTCAACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTA
CACATGCCCAAGACATACTGGAAAAGGGAAAATGAGAGAACTCTAGACTTTCATGACTCAAATGTTAAGA
ACCTCTACGACAAGGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTT
CTATCACAAATGTGATAATGAATGTATGGAAAGTATAAGAAACGGAACGTATAACTACCCGCAGTATTCA
GAAGAAGCAAGATTAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAACTTACCAAATAC
TGTCGATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGAT
GTGCTCCAATGGATCGTTACAATGCAGAATTTGCATTTAAATTTGTGAGTTCAGATTGTAGTTAAA
seqgb_KT216634_Organism_Influenza_A_virus__A_chicken_Anhui_MG08_2008_H9N2___Strain_Name_A_chicken_Anhui_MG08_2008_Segment_4_Subtype_H9N2_Host_Chicken,
AGCAAAAGCAGGGGAATTTCACAACCACTCAAGATGGAGACAGTATCACTAATAAATATACTACTAGTAG
TAACAGTAAGCAATGCAGATAAAATCTGCATCGGCTATCAATCAACAAATTCCACAGAAACTGTAGACAC
ACTAACAGAAAACAATGTCCCTGTGATTGTAATTGCAATGGGGTTTGCTGCCTTCTTGTTCTGGGCCATG
TCCAATGGGTCTTGCAGATGCAACATTTGTATATAATTGGCAAAAACACCCTTGTTTCTACT
seqgb_KY005855_Organism_Influenza_A_virus__A_chicken_Anhui_MZ33_2016_H5N6___Strain_Name_A_chicken_Anhui_MZ33_2016_Segment_4_Subtype_H5N6_Host_Chicken,
ATGGAGAAAATAGTGCTTCTTCTTGCAGTGGTTAGCCTTGTTAAAGGTGATCAGATTTGCATTGGTTACC
ATGCAAACAACTCGACTGAGCAGGTTGACACGATAATGGAAAAAAACGTCACTGTTACACATGCTCAAGA
CATACTAGAAAGGAATATGGCAATTGCAACACCAAATGTCAAACTCCAATAGGGGCGATAAACTCTAGTA
TGCCATTCCACAATATACACCCTCTCACTATCGGGGAGTGCCCCAAATATGTGAAATCAAACAAATTAGT
CCTTGCGACTGGGCTCAGAAATAGTCGAATCCACCCAAAAGGCAATAGATGGAGTTACCAATAAGGTCAA
CTCGATAATTGACAAAATGAACACTCAGACGGATTCCTAGATGTCTGGACTTATAATGCTGAACTTTTAG
TTCTCATGGAAAATGAGAGAACTCTAGATTTCCATGACTCAAATGTCAAGAACCTTTATGACAAAGTCCG
ACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAATGGTTGTTTCGAGTTCTATCACAAATGTGATAAT
GAATGTATGGAAAGTGTGAGGAATGGGACGTATGACTACCCCCAGTATTCAGAAGAAGCAAGATTAAAAA
GGGAAGAAATAAGCGGAGTGAAATTGGAATCAATAGGAACTTACCAAATACTGTCAATTTATTCAACAGT
GGCGGGTTCCCTAGCACTGGCAATCATTGTGGCTGGTCTATCTTTATGGATGTGCTCCAATGGGTCGTTA
CAATGCAGAATTTGCATTTAA
seqgb_KY005863_Organism_Influenza_A_virus__A_chicken_Anhui_MZ34_2016_H5N6___Strain_Name_A_chicken_Anhui_MZ34_2016_Segment_4_Subtype_H5N6_Host_Chicken,
ATGGAGAAAAGAAGAACGATGCATACCCAACAATAAAAATGAGCTACAATAACACCAATAGGGAAGATCT
TTTGATACTGTGGGGGATTCATCATTCCAATAATGCAGAAGAGCAGACAAATCTCTATAAAAACCCAACC
ACCTATGTTTCCGTTGGGACATCAACATTAAACCAGAGAGTGGTGCCAAAAATAGCTACTAGATCCCAAG
TAAACGGGCAAAGTGGAAGAATGGATTTCTTCTGGACAATTTTAAAACCGGATGATGCAATCCACTTCGA
GAGTAATGGAAATTTTATTGCTCCAGACTATCGGGGAGTGCCCCAAATATGTGAAATCAAACAAATTAGT
CCTTGCGACTGGGCTCAGAAATAGTCCTCTAAGAGAAAGAAGAAGAAAAAGAGGATTATTTGGAGCCATA
GCAGGGTTTATAGAGGGAGGATGGCAAGGAATGGTAGATGGTTGGTATGGGTACCACCATAGCAATGCAC
AAGGGAGTGGGTATGCTGCAGACAGAGAATCCACCCAAAAGGCAATAGATGGAGTTACCAATAAGGTCAA
CTCGATAATTGACAAAATGAACACTCAATTTGAGGCCGTTGGAAGGGAATTTAATAACTTAGAACGGAGA
ATAGAGAATTTAAATAAGAAAATGGAAGACGGATTCCTAGATGTCTGGACTTATAATGCTGAACTTTTAG
TTCTCATGGAAAATGAGAGAACTCTAGATTTCCATGACTCAAATGTCAAGAACCTTTATGACAAAGTCCG
ACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAATGGTTGTTTCGAGTTCTATCACAAATGTGATAAT
GAATGTATGGAAAGTGTGAGGAATGGGACGTATGACTACCCCCAGTATTCAGAAGAAGCAAGATTAAAAA
GGGAAGAAATAAGCGGAGTGAAATTGGAATCAATAGGAACTTACCAAATACTGTCAATTTATTCAACAGT
GGCGGGTTCCCTAGCACTGGCAATCATTGTGGCTGGTCTATCTTTATGGATGTGCTCCAATGGGTCGTTA
CAATGCAGAATTTGCATTTAA
seqgb_CY091815_Organism_Influenza_A_virus__A_chicken_Badung_BBVD-277_2007_H5N1___Strain_Name_A_chicken_Badung_BBVD-277_2007_Segment_4_Subtype_H5N1_Host_Chicken,
TCAATCTGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTTAAAAGTGATCAGATT
TGCATTGGTTACCATGCAAACAATTCAACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTA
CACATGCCCAAGACATACTGGAAAAGACACACAACGGGAAGCTCTGTGATCTAGATGGAGTGAAGCCTCT
AATTTTAAGAGATTGTAGTGTAGCTGGATGGCTCCTCGGGAACCCAATGTGTGATGAATTCATCAATGTA
CCGGAATGGTCTTACATAGTGGAGAACAGGGGTGAGCTCAGCATGTCCATACCTGGGAACGCCCTCCTTT
TTTAGAAATGTGGTATGGCTTATCAAAAAGAACAGTACATACCCAACAATAAAAAGAAGCTACAATAATA
CCAACCAAGAAGATCTTTTGGTACTGTGGGGGATTCACCATCCTAATGATGCGGCAGAGCAAACGAGGCT
ATATCAAAATCCAATCACCTATATTTCCGTTGGGACATCAACACTGAACCAGAGATTGGTACCAAAAATA
GCTACCAGAACAAGGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTT
CTATCACAAATGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA
GAAGAAGCAAGATTAAAAAGAGGGGAAATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC
TGTCAATTTATTCAACAGTAGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGAT
GTGCTCCAATGGATCGTTACAATGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA
seqgb_CY091816_Organism_Influenza_A_virus__A_chicken_Badung_BBVD-288_2007_H5N1___Strain_Name_A_chicken_Badung_BBVD-288_2007_Segment_4_Subtype_H5N1_Host_Chicken,
TCAATCCGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGCCAGTCTTGTTAAAGGTGATCAGATT
TGCATTGGTTACCATGCAAACAATTCAACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTA
CACATGCCCAAGACATACTGGAAAAGGCACACAACGGGAAGCTCTGTGATCTAGATGGAGTGAAGCCTCT
AATTTTAAGAGATTGTAGTGTAGCCGGATGGCTCCTCGGGAACCCAATGTGTGACGAATTCATCAATGTA
CCGGAATGGTCTTACATAGTGGAGAACAGGGGTGAGCTCAGCATGTCCATACCTGGGAACGCCCTCCTTT
TTTAGAAATGTGGTATGGCTTATCAAAAAGAACAGTACATACCCAACAATAAAAAGAAGCTACAATAATA
CCAACCAGGAAGATCTTTTGGTACTGTGGGGGATTCACCATCCTAATGATGCGGCTGAGCAAACGAAGCT
ATATCAAAATCCAACCACCTATATTTCCGTTGGGACATCAACACTAAATCAGAGATTGGTACCAAAAATA
GCTACTAGATCCAAAGTAAACGGACAAAGTGGAAGGATGGAGTTCTTCTGGACAATTTTAAAACCCAATG
ATGCAATCAACTTCGAGAGTAATGGAAATTTCATTGCTCCAGAATATGCCTACAAAATTGTCAAGAAAGG
GGACTCAGCAATTATGAAAAGTGAATTGGAATATGGCAACTGCAACACCAAATGTCAAACTCCAATGGGG
GCGATAAACTTGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA
GAAGAAGCAAGATTAAAAAGAGAGGAAATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC
TGTCAATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGAT
GTGCTCCAATGGATCATTACAATGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA
seqgb_CY091819_Organism_Influenza_A_virus__A_chicken_Badung_BBVD-328_2007_H5N1___Strain_Name_A_chicken_Badung_BBVD-328_2007_Segment_4_Subtype_H5N1_Host_Chicken,
TCAATCTGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGCCAGTCTTGTTAAAGGTGATCAGATT
TGCATTGGTTACCATGCAAACAATTCAACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTA
CACATGCCCAAGACATACTAGAAAAGGCACACAACGGGAAGCTCTGTGATCTAGATGGAGTGAAGCCTCT
AATTTTAAGAGATTGTAGTGTAGCCGAGCAGAATAAACCATTTTGAGAAAATTCAGATCATCCCCAAAAG
TTCTTGGTCCGACCATGAAGCCTCGTCAGGGGTGAGCTCAGCATGTCCATACCTGGGAACGCCCTCCTTT
TTTAGAAATGTGGTATGGCTTATCAAAAAGAACAGTACATACCCAACAATAAAAAGAAGCTACAATAATA
CCAACCAGGAAGATCTTTTGGTACTGTGGGGGATCCACCATCCTAATGATGCGGCTGAGCAAACGAAGCT
ATATCAAAATCCAACCACCTATATTTCCGTTGGGACATCAACACTAAATCAGAGATTGGTACCAAAAATA
GCTACTAGATCCAAAGTAAACGGACAAAGTGGAAGGATGGAGTTCTTCTGGACAATTTTAAAACCCAATG
ATGCAATCAACTTCGAGAGTAATGGAAATTTCATTGCTCCAGAATATGCCTACAAAATTGTCAAGAAAGG
GGACTCAGCAATTATGAAAAGTGAATTGGAATATGGCAACTGCAACACCAAATGTCAAACTCCAATGGGG
GCGATAAACTCTAGTATGCCATTCCACAACATACACCCTCTCACCATCGGGGAATGCCCCAAATATGTGA
AATCAAACAGATTAGTCCTTGCGACTGGGCTCAGAAATAGCCCCCAAAGAGAGAGAAGAAGAAAAAAGAG
AGGACTATTTGGAGCTATAGCAGGTTTTATAGAGGGTGGATGGCAGGGAATGGTAGATGGTTGGTATGGG
TACCACCATAGCAATGAGCAAGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATG
GAGTCACCAATAAGGTCAATTCGATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATT
TAATAACTTAGAAAGGAGAATAGAGACTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTT
CTATCACAAATGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA
GAAGAAGCAAGATTAAAAAGAGAGGAGATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC
TGTCAATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATTTTTATGGAT
GTGCTCCAATGGATCATTACAATGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA
seqgb_CY091820_Organism_Influenza_A_virus__A_chicken_Badung_BBVD-342_2007_H5N1___Strain_Name_A_chicken_Badung_BBVD-342_2007_Segment_4_Subtype_H5N1_Host_Chicken,
TCAATCCGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGCCAGTCTTGTTAAAGGTGATCAGATT
TGCATTGGTTACCATGCAAACAATTCAACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTA
CACATGCCCAAGACATACTGGAAAAGGCACACAACGGGAAGCTCTGTGATCTAGATGGGGTGAAGCCTCT
AATTTTAAGAGATTGTAGTGTAGCCGTTATAGAGGGTGGATGGCAGGGAATGGTAGATGGTTGGTATGGG
TACCACCATAGCAATGAGCAAGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATG
GAGTCACCAATAAGGTCAACTCGATTATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATT
TAATAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGATTCCTAGATGTCTGGACT
TATAATGCTGAACTTCTGGTTCTCATGGAAAATGAGAGAACTTTAGACTTTCATGACTCAAATGTTAAGA
ACCTCTACGACAAAGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTT
CTATCACAAATGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA
GAAGAAGCAAGATTAAAAAGAGAGGAAATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC
TGTCAATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGAT
GTGCTCCAATGGATCATTACAATGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA
seqgb_GQ122391_Organism_Influenza_A_virus__A_chicken_Bali_UT2091_2005_H5N1___Strain_Name_A_chicken_Bali_UT2091_2005_Segment_4_Subtype_H5N1_Host_Chicken,
ATGGAGAAAATAGTGCTTCTTCTTGCAACAGTCAGTCTTGTTAAAAGTGATCAGATTTGCATTGGTTACC
ATGCAAACAATTCAACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTACACATGCCCAAGA
CATACTGGAAAAAACACACAACGGGAATGGCAGGGAATGGTAGATGGTTGGTATGGGTACCACCATAGCA
ATGAGCAGGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATGGAGTCACCAATAA
GGTCAACTCAATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATTTAATAACTTAGAA
AGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGATTTCTAGATGTCTGGACTTATAATGCCGAAC
TTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTTCATGACTCAAATGTTAAGAACCTCTACGACAA
GGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTTCTATCACAAATGT
GATAATGAATGTATGGAAAGTATAAGAAACGGAACGTATAACTACCCGCAGTATTCAGAAGAAGCAAGAT
TAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAACTTACCAAATACTGTCAATTTATTC
AACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGATGTGCTCCAATGGA
TCGTTACAATGCAGAATTTGCATTTAA
seqgb_GQ122392_Organism_Influenza_A_virus__A_chicken_Bali_UT2092_2005_H5N1___Strain_Name_A_chicken_Bali_UT2092_2005_Segment_4_Subtype_H5N1_Host_Chicken,
ATGGAGAAAATAGTGCTTCTTCTTGCAACAGTCAGTCTTGTTAAAAGTGATCAGATTTGCATTGGTTACC
ATGCAAACAATTCAACAGAGCAGGTTGCCCTCAAAGAGAGAGAAGAAGAAAAAAGAGAGGACTATTTGGA
GCTATAGCAGGTTTTATAGAGGGAGGATGGCAGGGAATGGTAGATGGTTGGTATGGGTATCACCATAGCA
ATGAGCAGGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATGGAGTCACCAATAA
GGTCAACTCAATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATTTAATAACTTAGAA
AGGAGAATAGAATGGAAAATGAGAGAACTCTAGACTTTCATGACTCAAATGTTAAGAACCTCTACGACAA
GGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTTCTATCACAAATGT
GATAATGAATGTATGGAAAGTATAAGAAACGGAACGTATAACTACCCGCAGTATTCAGAAGAAGCAAGAT
TAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAACTTACCAAATACTGTCAATTTATTC
AACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGATGTGCTCCAATGGA
TCGTTACAATGCAGAATTTGCATTTAA
seqgb_DQ083551_Organism_Influenza_A_virus__A_chicken_Bangkok_Thailand_CU-3_04_H5N1___Strain_Name_A_chicken_Bangkok_Thailand_CU-3_04_Segment_4_Subtype_H5N1_Host_Chicken,
ATGGAGAAAATAGTGCTTCTTTTTGCAATAGTCAGTCTTGTTAAAAGTGATCAGATTTGCATTGGTTACC
ATGCAAACAACTCGACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTACACATGCCCAAGA
CATACTGGAAAAGACTTTCATTGCTCCAGAATATGCATACAAAATTGTCAAGAAAGGGGACTCAACAATT
ATGAAAAGTGAATTGGAATATGGTAAATGGCAGGGAATGGTAGATGGTTGGTATGGGTACCACCATAGCA
ATGAGCAGGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATGGAGTCACCAATAA
GGTCAACTCGATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATTTAACAACTTAGAA
AGGAGAATAGAAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTTCTATCATAAATGT
GATAATGAATGTATGGAAAGTGTAAGAAACGGAACGTATGACTACCCGCAGTATTCAGAAGAAGCAAGAC
TAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAATTTACCAAATACTGTCAATTTATTC
TACAGTGGCGAGTTCCCTAGCACTGGCAATCATGGTAGCTGGTCTATCCTTATGGATGTGCTCCAATGGG
TCGTTACAATGCAGAATTTGCATTTAAATTTG
seqgb_CY091797_Organism_Influenza_A_virus__A_chicken_Bangli_BBVD-245_2007_H5N1___Strain_Name_A_chicken_Bangli_BBVD-245_2007_Segment_4_Subtype_H5N1_Host_Chicken,
TCAATCTGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGCCAGTCTTGTTAAAGGTGATCAGATT
TGCATTGGTTACCATGCAAACAATTCAACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTA
CACATGCCCAATTAGTCCTTGCGACTATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATT
TAATAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGATTCCTAGATGTCTGGACT
TATAATGCTGAACTTCTGGTTCTCATGGAAAATGAGAGAACTTTAGACTTTCATGACTCAAATGTTAAGA
ACCTCTACGACAAAGTCCGACTACAGCTTAGGGATAATGCAAAGGAGTTGGGTAACGGTTGTTTCGAGTT
CTATCACAAATGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA
GAAGAAGCAAGATTAAAAAGAGAGGAAATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC
TGTCAATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGAT
GTGCTCCAATGGATCATTACAATGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA
seqgb_CY091801_Organism_Influenza_A_virus__A_chicken_Bangli_BBVD-562_2007_H5N1___Strain_Name_A_chicken_Bangli_BBVD-562_2007_Segment_4_Subtype_H5N1_Host_Chicken,
TCAATCTGTCATTCGAGAGTAATGGAGGGCTCAGAAATAGCCCCCAAAGAGAGAGAAGAAGAAAAAAGAG
AGGACTATTTGGAGCTATAGCAGGTTTTATAGAGGGTGGATGGCAGGGAATGGTAGATGGTTGGTATGGG
TACCACCATAGCAATGAGCAAGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAAATG
GAGTCACCAATAAGGTCAACTCGATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATT
TAATAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGATTCCTAGATGTCTGGACT
TATAATGCTGAACTTCTGGTTCTCATGGAAAATGAGAGAACTTTAGACTTTCATGACTCAAATGTTAAGA
ACCTCTACGACAAGGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTT
CTATCACAAATGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA
GAAGAAGCAAGATTAAAAAGAGAGGAGATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC
TGTCAATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATTTTTATGGAT
GTGCTCCAATGGATCATTACAATGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA
seqgb_CY091803_Organism_Influenza_A_virus__A_chicken_Bangli_BBVD-575_2007_H5N1___Strain_Name_A_chicken_Bangli_BBVD-575_2007_Segment_4_Subtype_H5N1_Host_Chicken,
TCAATCCGTCAGAGCTATAGCAGGTTTTATAGAGGGTGGATGGCAGGGAATGGTAGATGGTTGGTATGGG
TACCACCATAGCAATGAGCAAGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATG
GAGTCACCAATAAGGTCAACTCGATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATT
TAATAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGATTCTTAGATGTCTGGACT
TATAATGCTGAGCTTCTGGTTCTCATGGAAAATGAGAGAACTTTAGACTTTCATGACTCAAATGTTAAGA
ACCTCTACGACAAAGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTT
CTATCACAAATGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA
GAAGAAGCAAGATTAAAAAGAGAGGAAATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC
TGTCAATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGAT
GTGCTCCAATGGATCATTACAGTGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA
seqgb_GQ122399_Organism_Influenza_A_virus__A_chicken_Banten_UT6025_2006_H5N1___Strain_Name_A_chicken_Banten_UT6025_2006_Segment_4_Subtype_H5N1_Host_Chicken,
ATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTTAAAAGTGATCAGATTTGCATTGGTTACC
ATGCAAACAATCAGGGCTCAGAAAGGATGGCAGGGAATGGTAGATGGTTGGTATGGGTACCATCATAGCA
ATGAGCAGGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATGGAGTCACCAATAA
GGTCAACTCAATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATTTAATAACTTAGAA
AGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGATTTCTAGATGTCTGGACTTATAATGCCGAAC
TTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTTCATGACTCAAATGTTAAGAACCTCTATGACAA
GGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTTCTATCACAAATGT
GATAATGGATGTATGGAAAGTATAAGAAACGGAACGTATAACTACCCGCAGTATTCAGAAGAAGCAAGAT
TAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAACTTATCAAATACTGTCAATTTATTC
AACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGATGTGTTCCAATGGA
TCGTTACAATGCAGAATTTGCATTTAA
seqgb_CY091789_Organism_Influenza_A_virus__A_chicken_Buleleng_BBVD-545b_2007_H5N1___Strain_Name_A_chicken_Buleleng_BBVD-545b_2007_Segment_4_Subtype_H5N1_Host_Chicken,
TCAATCCGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGCCAGTCTTGTTAAAGGTGATCAGATT
TGCATTGGTTACCATGAAAAGTGAATTGGAATATGGCAACTGCAACACCAAATGTCAAACTCCAATGGGG
GCGATAAACTCTAGTATGCCATTCCATGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATG
GAGTCACCAATAAGGTCAACTCGATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATT
TAATAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGATTCCTAGATGTCTGGACT
TATAATGCTGAACTTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTTCATGACTCAAATGTTAAGA
ACCTCTACGACAAAGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTT
CTATCACAAATGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA
GAAGAAGCAAGATTAAAAAGAGAGGAAATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC
TGTCAATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGAT
GTGCTCCAATGGATCATTACAATGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA
seqgb_HQ200590_Organism_Influenza_A_virus__A_chicken_Cambodia_047LC3_2005_H5N1___Strain_Name_A_chicken_Cambodia_047LC3_2005_Segment_4_Subtype_H5N1_Host_Chicken,
AGCAAAAGCAGGGGTTTAATCTGTCAAAATGGAGAAAATAGTGCTTCTTTTTGCGATAGTCAGTCTTGTT
AAAAGTGATCAGATGGGACTCAACAATTATGAAAAGTGAATTGGAATATGGTAACTGCAACACCAAGTGT
CAAACTCCAATGGGGGCGATAAACTCCAATGAGCAGGGGAGTGGGTACGCTGCAGACAAAGAATCCACTC
AAAAGGCTATAGATGGAGTCACCAATAAGGTCAACTCGATCATTGACAAAATGAACACTCAGTTTGAGGC
CGTTGGAAGGGAATTTAACAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGGTTC
CTAGATGTCTGGACTTATAATGCTGAACTTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTCCATG
ACTCAAATGTCAAGAACCTTTACGACAAGGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAA
CGGTTGTTTCGAGTTCTATCACAAATGTGATAATGAATGTATGGAAAGTGTGAGAAACGGAACGTATGAC
TACCCGCAGTATTCAGAAGAAGCAAGATTAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAG
GAATTTACCAAATACTGTCAATTTATTCTACAGTGGCGAGTTCCCTAGCACTGGCAATCATGGTAGCTGG
TCTATCCTTATGGATGTGCTCCAATGGGTCGTTACAATGCAGAATTTGCATTTAAATTTGTGAGTTCAGA
TTGTAGTTAAAAACACCCTTGTTTCTACT
seqgb_HQ200554_Organism_Influenza_A_virus__A_chicken_Cambodi
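Almost by definition, newlines are the record separator in a CSV file, so there is no way to make pandas read_csv ignore them. It is best to remove the newlines manually, as follows: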
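import pandas as pd
import re

with open("seq.txt", "r") as myfile:
    data = myfile.read()
# Drop every newline so each wrapped sequence becomes one continuous string.
data = re.sub(r'\n', '', data)
# Each record is now "<id>,<sequence>". This assumes ids contain no comma and do not
# start with an uppercase base letter, and that sequences use only A, C, G, T and N.
records = re.findall(r'([^,]+),([ACGTN]+)', data)
df = pd.DataFrame(records, columns=["id", "seq"])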
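An alternative to stripping all newlines at once is to walk the file line by line and group each id line with the wrapped sequence lines that follow it. The sketch below is only an illustration: the function name parse_records and the sample ids are hypothetical, and it assumes that id lines are the only lines ending in a comma, as in the seq.txt excerpt above.

```python
from typing import Iterable, List, Tuple


def parse_records(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Group each id line (ending in ',') with the sequence lines after it."""
    records: List[Tuple[str, List[str]]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(","):
            # Assumption: only id lines end with a comma; start a new record.
            records.append((line[:-1], []))
        elif records:
            # A wrapped sequence chunk; attach it to the most recent id.
            records[-1][1].append(line)
    # Join the chunks so every sequence is one continuous string.
    return [(seq_id, "".join(parts)) for seq_id, parts in records]


# Tiny made-up sample in the same layout as seq.txt.
sample = [
    "seq_one,\n",
    "ATGGAG\n",
    "AAAATA\n",
    "seq_two,\n",
    "TTTAGC\n",
]
records = parse_records(sample)
print(records)
```

With a real file, the open file object can be passed in directly, for example parse_records(open("seq.txt")), and the result can be turned into a frame with pd.DataFrame(records, columns=["id", "seq"]). Because it never inspects the sequence letters, this version does not depend on which base codes appear in the data.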