Perl program to simulate RNA synthesis

Koa*_*ala 3 perl hash bioinformatics

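I am looking for advice on how to approach my Perl programming homework assignment: writing an RNA synthesis program. I have summarized and outlined the program below. Specifically, I am looking for feedback on the blocks below (which I have numbered for easy reference). I have read up to chapter 6 of Andrew Johnson's "Elements of Programming with Perl" (a great book). I have also read the perlfunc and perlop pod pages, and nothing jumped out as a place to start.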

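Program description: The program should read an input file named on the command line, transcribe it into RNA, and then translate the RNA into a sequence of uppercase one-letter amino acid names.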

  1. Accept a file named on the command line

    here I will use the <> operator

  2. Check to make sure the file only contains acgt or die

    if ( <> ne [acgt] ) { die "usage: file must only contain nucleotides \n"; }  
    
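  3. Transcribe the DNA to RNA (Every A replaced by U, T replaced by A, C replaced by G, G replaced by C)

    Not sure how to do this

  4. Take this transcription & break it into 3 character 'codons' starting at the first occurrence of "AUG"

    not sure but I'm thinking this is where I will start a %hash variables?

  5. Take the 3 character "codons" and give them a single letter Symbol (an uppercase one-letter amino acid name)

    Assign a key a value using (there are 70 possibilities here so I'm not sure where to store or how to access)

  6. If a gap is encountered a new line is started and process is repeated

    not sure but we can assume that gaps are multiples of threes.

  7. Am I approaching this the right way? Is there a Perl function that I'm overlooking that can simplify the main program?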

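Note

Must be a self-contained program (stored values for codon names and symbols).

Whenever the program reads a codon that has no symbol, this is a gap in the RNA; it should start a new output series and resume at the next occurrence of "AUG". For simplicity, we can assume that gaps are always multiples of three.

Before I spend any more time researching, I was hoping for confirmation that I am taking the right approach. Thanks for taking the time to read this and share your expertise!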

Ped*_*lva 5

1. here I will use the <> operator

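OK, so your plan is to read the file line by line. Don't forget to chomp each line, or you will end up with newline characters in your sequence.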


2. Check to make sure the file only contains acgt or die

if ( <> ne [acgt] ) { die "usage: file must only contain nucleotides \n"; }

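In a while loop, the <> operator puts the line it reads into the special variable $_, unless you assign it explicitly (my $line = <>).

In the code above, you are reading one line from the file and throwing it away. You need to save that line.

Also, the ne operator compares two strings, not a string and a regular expression. You need the !~ operator here (or =~ with a negated character class, [^acgt]). Note that a plain !~ /[acgt]/ only tells you that a line contains no nucleotide at all, so the negated-class form (or an anchored pattern such as /^[acgt]*$/) is the one that checks every character. If you need the test to be case-insensitive, look at the i flag for regular expression matching.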
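The points above (save the line, chomp it, test it with a negated character class) can be sketched roughly as follows. The helper names is_valid_dna and read_dna are made up for illustration, and the demo reads from an in-memory string rather than a real file:

```perl
use strict;
use warnings;

# Hypothetical helper: true when the line holds only a, c, g or t (any case).
sub is_valid_dna {
    my ($line) = @_;
    return $line !~ /[^acgt]/i;    # fails as soon as one other character appears
}

# Hypothetical helper: read every line from a handle, validate it, and
# return the concatenated sequence without newlines.
sub read_dna {
    my ($fh) = @_;
    my $dna = '';
    while (my $line = <$fh>) {     # the line is saved in $line, not discarded
        chomp $line;
        die "usage: file must only contain nucleotides\n"
            unless is_valid_dna($line);
        $dna .= $line;
    }
    return $dna;
}

# Demo on an in-memory "file" (an assumption made to keep the sketch self-contained).
open my $fh, '<', \"acgt\nGGCA\n" or die $!;
print read_dna($fh), "\n";         # prints acgtGGCA
```

In the real program the handle would come from the file named on the command line; an empty line passes this check, which may or may not be what the assignment wants.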


3. Transcribe the DNA to RNA (Every A replaced by U, T replaced by A, C replaced by G, G replaced by C).

As GWW said, check your biology. T -> U is the only step in transcription. You will find the tr (transliteration) operator helpful here.
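As a small illustration of tr for this step, here is a sketch with a hypothetical helper named dna_to_rna. Upper-casing the input first is an assumption, made so that the later codon lookups can use uppercase keys:

```perl
use strict;
use warnings;

# Hypothetical helper: transcribe DNA to RNA by replacing every T with U.
sub dna_to_rna {
    my ($dna) = @_;
    (my $rna = uc $dna) =~ tr/T/U/;   # copy, upper-case, then transliterate the copy
    return $rna;
}

print dna_to_rna('atggcctaa'), "\n";  # prints AUGGCCUAA
```

The copy-then-tr form leaves the caller's string untouched and works on Perl versions that predate the /r modifier.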


4. Take this transcription & break it into 3 character 'codons' starting at the first occurrence of "AUG"

not sure but I'm thinking this is where I will start a %hash variables?

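I would use a buffer here. Define a scalar outside the while(<>) loop. Use index to look for "AUG". If you don't find it, put the last two bases of the line in that scalar (you can use substr $line, -2, 2 for that). On the next iteration of the loop, append the new line (with .=) to those two bases, and then test for "AUG" again. If you get a hit, you will know where it is, so you can mark the position and start translating.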
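A rough sketch of that buffering idea follows. The helper name from_first_aug and the variable names are hypothetical, and the input is assumed to be already transcribed RNA, one chunk per line:

```perl
use strict;
use warnings;

# Hypothetical helper: return the sequence from the first "AUG" onward
# (lines joined together), or undef when no "AUG" is found.
sub from_first_aug {
    my ($fh) = @_;
    my $carry = '';      # buffer defined outside the loop
    my $found;           # stays undef until the first "AUG" is seen
    while (my $line = <$fh>) {
        chomp $line;
        if (defined $found) {          # already past the start codon
            $found .= $line;
            next;
        }
        my $window = $carry . $line;   # prepend the bases carried over
        my $pos = index $window, 'AUG';
        if ($pos >= 0) {
            $found = substr $window, $pos;
        }
        else {
            # Keep at most the last two bases: an "AUG" may straddle the line break.
            $carry = length($window) > 2 ? substr($window, -2, 2) : $window;
        }
    }
    return $found;
}

# Demo: the start codon is split across the first two lines.
open my $fh, '<', \"CCA\nUGG\nCCU\n" or die $!;
print from_first_aug($fh), "\n";       # prints AUGGCCU
```

The length check before substr is there because a negative offset on a string shorter than two characters would trigger a warning.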


5. Take the 3 character "codons" and give them a single letter Symbol (an uppercase one-letter amino acid name)

Assign a key a value using (there are 70 possibilities here so I'm not sure where to store or how to access)

Again, as GWW said, build a hash table:

%codons = ( AUG => 'M', ...).

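Then you can use (for example) split to build an array from the current line you are examining, assemble codons three elements at a time, and fetch the correct amino acid code from the hash table.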
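The split-and-lookup idea might look something like the sketch below. The table is deliberately partial (six codons chosen for illustration; a real program needs the complete table), and rna_to_protein is a hypothetical name:

```perl
use strict;
use warnings;

# Partial, illustrative table only; the full genetic code has 64 codons.
my %codons = (
    AUG => 'M', UUU => 'F', GCC => 'A',
    UGG => 'W', AAA => 'K', GGC => 'G',
);

# Hypothetical helper: translate RNA that already starts on a codon boundary.
sub rna_to_protein {
    my ($rna) = @_;
    my @bases   = split //, $rna;     # one base per array element
    my $protein = '';
    while (@bases >= 3) {
        my $codon = join '', splice @bases, 0, 3;   # take three bases at a time
        last unless exists $codons{$codon};         # stop at an unknown codon
        $protein .= $codons{$codon};
    }
    return $protein;
}

print rna_to_protein('AUGGCCUGGAAA'), "\n";   # prints MAWK
```

Stopping at the first unknown codon is a simplification chosen for this sketch; the assignment asks for a new output series at that point instead.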


6. If a gap is encountered a new line is started and process is repeated

not sure but we can assume that gaps are multiples of threes.

See above. You can test whether you have hit a gap with exists $codons{$current_codon}.
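One possible way to combine the exists test with restarting at the next "AUG" is sketched here. The table is again partial and illustrative, translate_with_gaps is a hypothetical name, and the run of N characters in the demo is only an assumed stand-in for a gap:

```perl
use strict;
use warnings;

# Partial, illustrative table only.
my %codons = ( AUG => 'M', GCC => 'A', UGG => 'W', AAA => 'K' );

# Hypothetical helper: return one output series per stretch of known codons,
# each stretch starting at an "AUG".
sub translate_with_gaps {
    my ($rna) = @_;
    my @series;
    my $pos = index $rna, 'AUG';
    while ($pos >= 0) {
        my $current = '';
        while ($pos + 3 <= length $rna) {
            my $codon = substr $rna, $pos, 3;
            last unless exists $codons{$codon};   # no symbol: this is a gap
            $current .= $codons{$codon};
            $pos += 3;
        }
        push @series, $current;
        $pos = index $rna, 'AUG', $pos;           # resume at the next start codon
    }
    return @series;
}

print "$_\n" for translate_with_gaps('AUGGCCNNNNNNAUGAAAUGG');
# prints MA, then MKW on a new line
```

Because each series starts at an "AUG" that is in the table, the inner loop always consumes at least one codon, so the outer loop is guaranteed to advance.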


7. Am I approaching this the right way? Is there a Perl function that I'm overlooking that can simplify the main program?

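You know, looking at the above, it seems too complicated. I built a few building blocks: the subroutines read_codon and translate. I think they greatly help the logic of the program.

I know this is a homework assignment, but I think it may help you get a feel for other possible approaches: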

use warnings; use strict;
use feature 'state';


# read_codon relies on the 'state' feature introduced in Perl 5.10.
# Both @buffer and $handle are 'state' for this function:
# together they let the caller read codons without caring that the file
# is processed line by line.
# Both are initialized the first time read_codon is called.
# Since $handle is a state variable, the current file handle position
# is never reset. Similarly, @buffer always holds whatever was left
# from the previous call.
# The base case is that @buffer contains fewer than 3 bp, in which case
# we need to read a new line, remove the "\n" character,
# split it and push the resulting list onto the end of @buffer.
# If we encounter EOF on $handle, then we have exhausted the file,
# and @buffer as well, so we 'return' undef.
# Otherwise we take the first 3 bp of @buffer, join them into a string,
# transcribe it and return it.

sub read_codon {
    my ($file) = @_;

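    state @buffer;
    state $handle;
    # Open the file only once: calling open on every invocation would
    # reopen it and reset the read position to the start.
    unless ($handle) {
        open $handle, '<', $file or die $!;
    }

    # Keep reading until a full codon is buffered (copes with short or blank lines).
    while (@buffer < 3) {
        defined(my $new_line = <$handle>) or return;
        chomp $new_line;
        push @buffer, split //, $new_line;
    }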

    return transcribe(
                       join '', 
                       shift @buffer,
                       shift @buffer,
                       shift @buffer
                     );
}

sub transcribe {
    my ($codon) = @_;
    $codon =~ tr/T/U/;
    return $codon;
}


# translate also relies on the 'state' feature introduced in Perl 5.10.
# The $TRANSLATE state is initialized to 0.
# As codons are passed to it,
# the sub updates the state according to start and stop codons.
# Since $TRANSLATE is a state variable, it is only initialized once
# (the first time the sub is called).
# If the current state is 'translating',
# then the sub returns the appropriate amino acid from the %codes table, if any.
# This gives the caller a logical way to determine whether
# it should print an amino acid or not: if not, the sub returns undef.
# %codes could also be a state variable, but since it is not actually a 'state',
# it is initialized once, in a code block visible from the sub
# but separate from the rest of the program, since it is 'private' to the sub.

{
    my %codes = (    # lexical, so the table really is private to this block
        AUG => 'M',
        ...
    );

    sub translate {
        my ($codon) = @_ or return;

        state $TRANSLATE = 0;

        $TRANSLATE = 1 if $codon =~ m/AUG/i;
        $TRANSLATE = 0 if $codon =~ m/U(AA|GA|AG)/i;

        return $codes{$codon} if $TRANSLATE;
        return;    # explicit: falling off the end would return the false $TRANSLATE value, not undef
    }
}