
Python readlines() usage and efficient practice for reading

I need to parse thousands of text files in a folder (each file has around 3000 lines and is about 400KB). I read them using readlines():

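   import gzip
   import os

   for filename in os.listdir(input_dir):
       path = os.path.join(input_dir, filename)  # listdir() returns bare names
       if filename.endswith(".gz"):
           f = gzip.open(path, 'rb')
       else:
           f = open(path, 'rb')

       file_content = f.readlines()  # loads every line of the file into a list
       f.close()

       len_file = len(file_content)
       i = 0
       while i < len_file:
           line = file_content[i].split(delimiter)
           # ... my logic ...
           i += 1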


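This works completely fine for samples of my input (50 or 100 files). When I ran it on the whole input of more than 5K files, the time taken was nowhere near a linear increase. I decided to do a performance analysis and profiled the code with cProfile. As the input approaches 7K files, the time taken grows far faster than linearly (it looks exponential), and the per-file rate keeps getting worse.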
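For reference, a profile restricted to readlines() could be collected along these lines. This is a minimal sketch, assuming Python 3 (the question itself is tagged python-2.6); `read_all` and `profile_read_all` are hypothetical helper names, not part of the original code.

```python
import cProfile
import io
import pstats


def read_all(paths):
    """Read every file with readlines() and return the total line count."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            total += len(f.readlines())
    return total


def profile_read_all(paths):
    """Run read_all under cProfile and return (line count, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(read_all, paths)

    # Collect the report in memory instead of printing to stdout.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time and keep only rows whose name matches "readlines".
    stats.sort_stats("cumulative").print_stats("readlines")
    return result, buf.getvalue()
```

Calling `profile_read_all(list_of_paths)` returns the report as a string with the same ncalls / tottime / cumtime columns that cProfile prints.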

Here is the cumulative time taken by readlines(): the first row is for 354 files (a sample of the input) and the second row is for 7473 files (the whole input).

 ncalls   tottime  percall   cumtime  percall filename:lineno(function)
    354     0.192    0.001     0.192    0.001 {method 'readlines' of 'file' objects}
   7473  1329.380    0.178  1329.380    0.178 {method 'readlines' of 'file' objects}

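To make the non-linearity concrete, the two profile rows can be compared directly. The sketch below only does arithmetic on the numbers reported above; the variable names are illustrative. The input grows by roughly 21 times, while the cumulative readlines() time grows by roughly 6,900 times.

```python
# (number of readlines() calls, cumulative seconds) taken from the profile rows
sample_files, sample_cumtime = 354, 0.192
full_files, full_cumtime = 7473, 1329.380


def per_call(cumtime, ncalls):
    """Average seconds spent in readlines() per file."""
    return cumtime / ncalls


sample_per_call = per_call(sample_cumtime, sample_files)
full_per_call = per_call(full_cumtime, full_files)

input_growth = full_files / sample_files      # how much larger the input is
time_growth = full_cumtime / sample_cumtime   # how much longer readlines() took
slowdown = full_per_call / sample_per_call    # per-file slowdown factor

print("input grew %.1fx, time grew %.0fx, per-file slowdown %.0fx"
      % (input_growth, time_growth, slowdown))
```

If the cost were linear in the number of files, the per-file slowdown would stay close to 1.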

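Because of this, the time taken by my code does not scale linearly as the input grows. I read some documentation notes on readlines(), where people claim that readlines() reads the whole file content into memory and therefore generally consumes more memory than readline() or …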
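The alternative that those notes point at is to iterate over the file object, which yields one line at a time instead of building a full list. Below is a minimal sketch, assuming Python 3 and a tab delimiter; `open_maybe_gzip` and `iter_fields` are hypothetical helpers, and stripping the trailing newline before splitting is a choice made here, not something the original code does.

```python
import gzip
import os


def open_maybe_gzip(path):
    """Open path in binary mode, transparently handling .gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def iter_fields(input_dir, delimiter=b"\t"):
    """Yield (filename, fields) for each line, reading one line at a time."""
    for filename in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, filename)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open_maybe_gzip(path) as f:
            # Iterating the file object is lazy: no per-file list of lines
            # is kept in memory, unlike readlines().
            for line in f:
                yield filename, line.rstrip(b"\r\n").split(delimiter)
```

The per-line logic can then run inside `for filename, fields in iter_fields(input_dir): ...`, and the `with` block closes each file even if that logic raises. Whether this removes the slowdown seen in the profile depends on what actually causes it, which the profile alone does not show.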

python memory performance python-2.6 readlines

Lea*_*ner

2016-08-06

38
Score

2
Answers

90K
Views