How to overcome memory issues when sequentially appending files to one another

AEA*_*AEA 2 python memory buffer stringio python-2.7

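I am running the following script to append files to one another by looping over months and years, whenever a file exists. I have just tested it with a larger dataset, where I expect the output file to be about 600 MB, but I am running into memory problems. First, is it normal to hit memory problems here (my PC has 8 GB of RAM)? I don't understand how I am eating up all of that memory.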

The code I am running:

import datetime,  os
import StringIO

stored_data = StringIO.StringIO()

start_year = "2011"
start_month = "November"
first_run = False

current_month = datetime.date.today().replace(day=1)
possible_month = datetime.datetime.strptime('%s %s' % (start_month, start_year), '%B %Y').date()
while possible_month <= current_month:
    csv_filename = possible_month.strftime('%B %Y') + ' MRG.csv'
    if os.path.exists(csv_filename):
        with open(csv_filename, 'rb') as current_csv:
            if first_run != False:
                next(current_csv)
            else:
                first_run = True
            stored_data.writelines(current_csv)
    possible_month = (possible_month + datetime.timedelta(days=31)).replace(day=1)
if stored_data:
    contents = stored_data.getvalue()
    with open('FullMergedData.csv', 'wb') as output_csv:
        output_csv.write(contents)

The traceback I receive:

Traceback (most recent call last):
  File "C:\code snippets\FullMerger.py", line 23, in <module>
    contents = stored_output.getvalue()
  File "C:\Python27\lib\StringIO.py", line 271, in getvalue
    self.buf += ''.join(self.buflist)
MemoryError

Any ideas on how to implement a solution, or how to make this code more efficient so that it gets past this problem? Many thanks,
AEA

EDIT1

After running the code provided by alKid, I received the following traceback.

Traceback (most recent call last):
  File "C:\FullMerger.py", line 22, in <module>
    output_csv.writeline(line)
AttributeError: 'file' object has no attribute 'writeline'

I changed the above to writelines, but I still received the following traceback.

Traceback (most recent call last):
  File "C:\FullMerger.py", line 19, in <module>
    next(current_csv)
StopIteration

aIK*_*Kid 5

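With stored_data you are trying to hold the contents of every file in memory at once. Because the combined data is too large, you get the error you are showing.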

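One solution is to write to the output file line by line. This is far more memory efficient, because you only hold one line of data in the buffer at a time rather than the whole 600 MB.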

In short, the structure can look like this:

with open('FullMergedData.csv', 'ab') as output_csv:  # 'ab' appends to the file
    with open(csv_filename, 'rb') as current_csv:
        if first_run:
            # Every file after the first: skip its header line.
            # The None default avoids StopIteration on an empty file.
            next(current_csv, None)
        else:
            first_run = True  # the first file keeps its header
        for line in current_csv:  # loop through the remaining lines
            output_csv.write(line)  # write one line at a time

That should solve your problem. Hope this helps!
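
For completeness, here is a minimal sketch of how the whole script could look with the month loop from the question wrapped around the streaming write. The function names month_starts and merge_monthly_csvs are hypothetical, chosen only for this example. It also swaps the explicit line loop for shutil.copyfileobj, which copies in fixed-size chunks, so memory use stays small either way. The header is skipped with readline() because it returns an empty string on an empty file instead of raising StopIteration, and it can be safely followed by the chunked read.

```python
import datetime
import os
import shutil


def month_starts(first, last):
    """Yield the first day of each month from `first` to `last`, inclusive."""
    current = first.replace(day=1)
    while current <= last:
        yield current
        # Adding 31 days to the 1st always lands in the following month.
        current = (current + datetime.timedelta(days=31)).replace(day=1)


def merge_monthly_csvs(first, last, output_path, directory="."):
    """Stream every existing '<Month> <Year> MRG.csv' file into output_path.

    Returns the number of input files that were merged.
    """
    merged = 0
    # 'wb' starts a fresh output file on every run; use 'ab' to append instead.
    with open(output_path, "wb") as output_csv:
        for month in month_starts(first, last):
            name = month.strftime("%B %Y") + " MRG.csv"
            csv_path = os.path.join(directory, name)
            if not os.path.exists(csv_path):
                continue
            with open(csv_path, "rb") as current_csv:
                if merged:
                    # Skip the header of every file after the first one.
                    current_csv.readline()
                # Copy the rest in chunks; the full file is never held in memory.
                shutil.copyfileobj(current_csv, output_csv)
            merged += 1
    return merged
```

It could be called as merge_monthly_csvs(datetime.date(2011, 11, 1), datetime.date.today().replace(day=1), 'FullMergedData.csv'). This assumes month names are formatted in English, as in the question.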