Posts by San*_*dra

How can I make my Python script faster?

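I'm new to Python, and I wrote a (probably very ugly) script that is supposed to randomly select a subset of sequences from a fastq file. A fastq file stores information in blocks of four lines each. The first line of every block starts with the character "@". The fastq file I use as input is 36 GB and contains about 14,000,000 lines.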
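To make the four-line layout described above concrete, here is a minimal sketch that walks a file one record at a time. It assumes well-formed records of exactly four lines (no wrapped sequence lines); the names `read_fastq_records` and `demo_text` are hypothetical and not part of the original script. It also illustrates that a quality line may itself begin with "@", so counting lines that contain "@" can overestimate the number of records.

```python
import io
from itertools import islice


def read_fastq_records(handle):
    """Yield each FASTQ record as a tuple of four lines (assumes 4 lines per record)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return  # clean end of file
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of file")
        yield record


# Tiny in-memory example; the quality line of the first record starts with "@".
demo_text = "@r1\nACGT\n+\n@III\n@r2\nGGCC\n+\nIIII\n"
records = list(read_fastq_records(io.StringIO(demo_text)))

# Lines containing "@" (what a grep-style count would see) versus actual records.
at_lines = sum(1 for line in demo_text.splitlines() if "@" in line)
print(len(records), at_lines)
```

Because only four lines are held at a time, memory use stays constant regardless of file size.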

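I tried to rewrite an existing script that used far too much memory, and I managed to reduce the memory usage considerably. But the script takes forever to run, and I can't see why.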

import argparse
import subprocess
import sys

parser = argparse.ArgumentParser()
parser.add_argument("infile", type=str, help="The name of the fastq input file.", default=sys.stdin)
parser.add_argument("outputfile", type=str, help="Name of the output file.")
parser.add_argument("-n", type=int, help="Number of sequences to sample", default=1)
args = parser.parse_args()


def sample():
    linesamples = []
    infile = open(args.infile, 'r')
    outputfile = open(args.outputfile, 'w')
    # count the number of fastq "chunks" in the input file:
    seqs = subprocess.check_output(["grep", "-c", "@", str(args.infile)])
    # randomly select n fastq "chunks":
    seqsamples …
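For comparison with the count-then-select approach in the script above, one common alternative is reservoir sampling, which picks n records uniformly at random in a single pass without knowing the total count in advance. The following is only a sketch under the assumption of well-formed four-line records; `sample_fastq` and its `rng` parameter are hypothetical names, and this is not the script's own method.

```python
import io
import random
from itertools import islice


def sample_fastq(handle, n, rng=random):
    """Return up to n records chosen uniformly at random (reservoir sampling, Algorithm R)."""
    reservoir = []
    seen = 0
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            break  # end of file (a trailing partial record is ignored)
        seen += 1
        if len(reservoir) < n:
            # Fill the reservoir with the first n records.
            reservoir.append(record)
        else:
            # Keep the new record with probability n / seen.
            j = rng.randrange(seen)
            if j < n:
                reservoir[j] = record
    return reservoir


# Small in-memory demonstration with ten synthetic records.
records_text = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(10))
picked = sample_fastq(io.StringIO(records_text), 3, rng=random.Random(42))
for record in picked:
    print("".join(record), end="")
```

Memory use is proportional to n rather than to the file size, and the file is read only once. The selected records come out in reservoir order, not necessarily in file order.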


python performance bioinformatics fastq

San*_*dra

2015-03-21

Score: 7

Answers: 1

Views: 234