How to get all DNA encodings of a peptide in C#

kob*_*osh 5 c# bioinformatics

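Hi, my head has been boiling over this for 3 days! I want to get all DNA encodings of a peptide. A peptide is a sequence of amino acids; for example, amino acids M and Q can form the peptide MQQM.

A DNA encoding means that each amino acid is replaced by its DNA code (called a codon). Some amino acids have more than one codon; for example, amino acid T has 4 different codons.

The last function in the code below does not work, so I would like a version that works for me. Please, no language-integrated query (I forgot its acronym!).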

private  string[] CODONS ={ 
    "TTT", "TTC", "TTA", "TTG", "TCT",
    "TCC", "TCA", "TCG", "TAT", "TAC", "TGT", "TGC", "TGG", "CTT",
    "CTC", "CTA", "CTG", "CCT", "CCC", "CCA", "CCG", "CAT", "CAC",
    "CAA", "CAG", "CGT", "CGC", "CGA", "CGG", "ATT", "ATC", "ATA",
    "ATG", "ACT", "ACC", "ACA", "ACG", "AAT", "AAC", "AAA", "AAG",
    "AGT", "AGC", "AGA", "AGG", "GTT", "GTC", "GTA", "GTG", "GCT",
    "GCC", "GCA", "GCG", "GAT", "GAC", "GAA", "GAG", "GGT", "GGC",
    "GGA", "GGG", };

private  string[] AMINOS_PER_CODON = { 
    "F", "F", "L", "L", "S", "S",
    "S", "S", "Y", "Y", "C", "C", "W", "L", "L", "L", "L", "P", "P",
    "P", "P", "H", "H", "Q", "Q", "R", "R", "R", "R", "I", "I", "I",
    "M", "T", "T", "T", "T", "N", "N", "K", "K", "S", "S", "R", "R",
    "V", "V", "V", "V", "A", "A", "A", "A", "D", "D", "E", "E", "G",
    "G", "G", "G", };


public  string codonToAminoAcid(String codon)
{
    for (int k = 0; k < CODONS.Length; k++)
    {
        if (CODONS[k].Equals(codon))
        {
            return AMINOS_PER_CODON[k];
        }
    }

    // never reach here with valid codon
    return "X";
}

public  string AminoAcidToCodon(String aminoAcid)
{
    for (int k = 0; k < AMINOS_PER_CODON .Length; k++)
    {
        if (AMINOS_PER_CODON [k].Equals(aminoAcid ))
        {
            return CODONS[k];
        }
    }

    // never reach here with valid codon
    return "X";
}

public string GetCodonsforPeptide(string pep)
{
    string result = ""; 
    for (int i = 0; i <pep.Length ; i++)
    {
        result = AminoAcidToCodon(pep.Substring (i,1) );
        for (int q = 0; q < pep.Length; q++)
        {
            result += AminoAcidToCodon(pep.Substring(q, 1));
        }
    }

    return result;
}

ang*_*son 2

Try the following two methods:

public IEnumerable<string> AminoAcidToCodon(char aminoAcid)
{
    for (int k = 0; k < AMINOS_PER_CODON.Length; k++)
    {
        if (AMINOS_PER_CODON[k] == aminoAcid)
        {
            yield return CODONS[k];
        }
    }
}

public IEnumerable<string> GetCodonsforPeptide(string pep)
{
    if (string.IsNullOrEmpty(pep))
    {
        yield return string.Empty;
        yield break;
    }

    foreach (var codon in AminoAcidToCodon(pep[0]))
        foreach (var codonOfRest in GetCodonsforPeptide(pep.Substring(1)))
            yield return codon + codonOfRest;
}

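Notes:

  • Since an amino acid can have several matching codons, a method that returns as soon as it finds the first codon will only ever produce one codon per amino acid. Instead, I created an iterator method that uses yield return for every matching codon.
  • The last method finds all matching codons for the first character of the peptide and combines each of them with every encoding of the rest of the peptide (everything after the first character).
  • I treated the AMINOS_PER_CODON array as an array of char. If needed, you can easily change the code to work with an array of strings.
  • A better approach than two separate arrays would be a dictionary that maps each amino acid character to a list of codon strings.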

Example output when passing in "MA":

ATGGCT 
ATGGCC 
ATGGCA 
ATGGCG 

This is because M maps to:

ATG

and A maps to:

GCT 
GCC 
GCA 
GCG
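The count of four follows from multiplying the number of codons per amino acid: 1 for M times 4 for A. A minimal sketch of that arithmetic is shown here, assuming a table that maps each amino acid character to its codons; the class name `EncodingCount` and the parameter `codonsByAminoAcid` are hypothetical and not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class EncodingCount
{
    // Hypothetical helper: the number of DNA encodings of a peptide is the
    // product of the codon counts of its amino acids.
    public static long Count(string peptide, IDictionary<char, string[]> codonsByAminoAcid)
    {
        long total = 1; // an empty peptide has exactly one (empty) encoding
        foreach (char aminoAcid in peptide)
        {
            // The indexer throws KeyNotFoundException for an amino acid missing
            // from the table; checked() throws OverflowException if the product
            // no longer fits in a long.
            total = checked(total * codonsByAminoAcid[aminoAcid].Length);
        }
        return total;
    }
}
```

Because the count is a product, it grows quickly with peptide length, so it may be worth checking it before materializing every encoding.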

The dictionary I suggest you use would look like this:

var codonsByAminoAcid = new Dictionary<char, string[]>
{
    { 'M', new[] { "ATG" } },
    { 'A', new[] { "GCT", "GCC", "GCA", "GCG" } }
};

This would replace the AminoAcidToCodon method.

You can even build that dictionary from the two arrays:

var lookup = 
    CODONS
    .Zip(AMINOS_PER_CODON, (codon, amino) => new { codon, amino })
    .GroupBy(entry => entry.amino)
    .ToDictionary(
        g => g.Key,
        g => g.Select(ge => ge.codon).ToArray());
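Since the question asked to avoid language-integrated query, the same grouping can probably be done with a plain loop. The following is only a sketch: the class name `CodonLookup` and the method `Build` are hypothetical, and it assumes the question's string arrays are passed in, with every amino acid stored as a one-letter code whose first character becomes the dictionary key.

```csharp
using System;
using System.Collections.Generic;

public static class CodonLookup
{
    // Hypothetical helper: groups codons by amino acid without LINQ.
    // Assumes codons[k] encodes aminos[k] and that each amino acid is a
    // non-empty one-letter code such as "M".
    public static Dictionary<char, List<string>> Build(string[] codons, string[] aminos)
    {
        if (codons.Length != aminos.Length)
            throw new ArgumentException("Both arrays must have the same length.");

        var lookup = new Dictionary<char, List<string>>();
        for (int k = 0; k < codons.Length; k++)
        {
            char key = aminos[k][0];

            // Create the codon list the first time an amino acid is seen.
            List<string> list;
            if (!lookup.TryGetValue(key, out list))
            {
                list = new List<string>();
                lookup[key] = list;
            }
            list.Add(codons[k]);
        }
        return lookup;
    }
}
```

Calling `CodonLookup.Build(CODONS, AMINOS_PER_CODON)` once and keeping the result in a field would avoid rescanning both arrays for every amino acid.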

The GetCodonsforPeptide method could then look like this:

public IEnumerable<string> GetCodonsforPeptide(string pep)
{
    if (string.IsNullOrEmpty(pep))
    {
        yield return string.Empty;
        yield break;
    }

    foreach (var codon in lookup[pep[0]])
        foreach (var codonOfRest in GetCodonsforPeptide(pep.Substring(1)))
            yield return codon + codonOfRest;
}

That is, the call to the other method is replaced by a lookup in the table.
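For comparison, here is a rough iterative variant that avoids both recursion and LINQ. It is a swapped-in technique rather than the method shown above: it starts from a single empty encoding and extends every partial encoding with every codon of the next amino acid. The class `PeptideEncoder`, the method `GetEncodings`, and the three-entry table are illustrative assumptions; a real table would cover every amino acid.

```csharp
using System;
using System.Collections.Generic;

public static class PeptideEncoder
{
    // Illustrative table with only three amino acids (codons taken from the
    // arrays in the question); a complete table is assumed in real use.
    private static readonly Dictionary<char, string[]> CodonsByAminoAcid =
        new Dictionary<char, string[]>
        {
            { 'M', new[] { "ATG" } },
            { 'A', new[] { "GCT", "GCC", "GCA", "GCG" } },
            { 'Q', new[] { "CAA", "CAG" } }
        };

    // Hypothetical iterative variant: returns every DNA encoding of the peptide.
    public static List<string> GetEncodings(string peptide)
    {
        // A null or empty peptide yields one empty encoding.
        var encodings = new List<string> { string.Empty };
        foreach (char aminoAcid in peptide ?? string.Empty)
        {
            string[] codons;
            if (!CodonsByAminoAcid.TryGetValue(aminoAcid, out codons))
                throw new ArgumentException("Unknown amino acid: " + aminoAcid);

            // Extend each partial encoding with each codon of this amino acid.
            var extended = new List<string>(encodings.Count * codons.Length);
            foreach (string prefix in encodings)
                foreach (string codon in codons)
                    extended.Add(prefix + codon);
            encodings = extended;
        }
        return encodings;
    }
}
```

`PeptideEncoder.GetEncodings("MA")` returns ATGGCT, ATGGCC, ATGGCA and ATGGCG, in that order. Unlike the iterator version, this sketch holds all encodings in memory at once, so the lazy `yield return` approach may be preferable for long peptides.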