AEA*_*AEA 2 python memory buffer stringio python-2.7
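I am running the script below to append files to one another, looping over months and years and picking up each file that exists. I have just tested it with a larger data set, where I expect the output file to be around 600 MB, but I am running into memory problems. First, is it normal to hit memory problems here (my PC has 8 GB of RAM)? I don't understand how I am using up all of that memory.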
import datetime, os
import StringIO

stored_data = StringIO.StringIO()

start_year = "2011"
start_month = "November"
first_run = False

current_month = datetime.date.today().replace(day=1)
possible_month = datetime.datetime.strptime('%s %s' % (start_month, start_year), '%B %Y').date()
while possible_month <= current_month:
    csv_filename = possible_month.strftime('%B %Y') + ' MRG.csv'
    if os.path.exists(csv_filename):
        with open(csv_filename, 'rb') as current_csv:
            if first_run != False:
                next(current_csv)
            else:
                first_run = True
            stored_data.writelines(current_csv)
    possible_month = (possible_month + datetime.timedelta(days=31)).replace(day=1)
if stored_data:
    contents = stored_data.getvalue()
    with open('FullMergedData.csv', 'wb') as output_csv:
        output_csv.write(contents)
Traceback (most recent call last):
  File "C:\code snippets\FullMerger.py", line 23, in <module>
    contents = stored_output.getvalue()
  File "C:\Python27\lib\StringIO.py", line 271, in getvalue
    self.buf += ''.join(self.buflist)
MemoryError
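Any ideas on how to implement a solution, or how to make this code more efficient so that it gets around this problem? Many thanks.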
AEA
After running the code provided by alKid, I received the following traceback.
Traceback (most recent call last):
  File "C:\FullMerger.py", line 22, in <module>
    output_csv.writeline(line)
AttributeError: 'file' object has no attribute 'writeline'
I changed the call above to writelines, but I then received the following traceback.
Traceback (most recent call last):
  File "C:\FullMerger.py", line 19, in <module>
    next(current_csv)
StopIteration
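With stored_data you are trying to hold the contents of every file in memory at once, and because that is too large you get the error you are showing.
One solution is to write to the output file line by line. That is far more memory efficient, because you only keep one line of data in the buffer at a time instead of the whole 600 MB.
In short, the structure can look like this: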
with open('FullMergedData.csv', 'ab') as output_csv:  # 'ab' appends the result to the file
    with open(csv_filename, 'rb') as current_csv:
        if first_run:
            # An earlier file already supplied the header, so skip this one's.
            # The None default avoids StopIteration on an empty file.
            next(current_csv, None)
        else:
            first_run = True  # the first file keeps its header row
        for line in current_csv:  # loop through the remaining lines
            output_csv.write(line)  # write one line at a time
That should solve your problem. Hope this helps!
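For reference, here is a minimal sketch of how the line-by-line idea might fit into the whole month loop from the question. The function names month_starts and merge_monthly_csvs and the directory parameter are made up for this illustration and are not part of the original script. It also opens the output once in 'wb' mode rather than appending, on the assumption that each run should produce a fresh merged file.

```python
import datetime
import os


def month_starts(first, last):
    """Yield the first day of each month from `first` through `last`."""
    current = first.replace(day=1)
    while current <= last:
        yield current
        # Jumping 31 days ahead from day 1 always lands in the next month.
        current = (current + datetime.timedelta(days=31)).replace(day=1)


def merge_monthly_csvs(first, last, output_path, directory="."):
    """Stream every existing '<Month> <Year> MRG.csv' into output_path.

    Only the first file found keeps its header row.
    Returns the number of files merged.
    """
    merged = 0
    # 'wb' starts a fresh output file, so re-running does not duplicate rows.
    with open(output_path, "wb") as output_csv:
        for month in month_starts(first, last):
            csv_path = os.path.join(directory, month.strftime("%B %Y") + " MRG.csv")
            if not os.path.exists(csv_path):
                continue
            with open(csv_path, "rb") as current_csv:
                if merged:
                    # Skip the repeated header; the None default avoids
                    # StopIteration if the file is empty.
                    next(current_csv, None)
                for line in current_csv:
                    output_csv.write(line)  # only one line is held at a time
            merged += 1
    return merged
```

Calling merge_monthly_csvs(datetime.date(2011, 11, 1), datetime.date.today(), 'FullMergedData.csv') would cover the same range as the question. The sketch assumes every input file ends with a newline; if one does not, its last row would run into the first row of the next file.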