Posts by fur*_*559

Reducing memory usage in pandas

I can't figure out a way to reduce this program's memory usage any further. This is my most efficient implementation so far:

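import glob
import json

import pandas as pd

# Keep only these fields from each JSON log record
columns = ['eventName', 'sessionId', "eventTime", "items", "currentPage", "browserType"]
df = pd.DataFrame(columns=columns)
l = []

for i, file in enumerate(glob.glob("*.log")):
    print("Going through log file #%s named %s..." % (i+1, file))
    with open(file) as myfile:
        # Parse every line of the file as one JSON record
        l += [json.loads(line) for line in myfile]
        tempdata = pd.DataFrame(l)
        # Drop every column that is not in the wanted list
        for column in tempdata.columns:
            if column not in columns:
                try:
                    tempdata.drop(column, axis=1, inplace=True)
                except KeyError:  # DataFrame.drop raises KeyError for missing labels
                    print("oh no! We've got a problem with %s column! It doesn't exist!" % column)
        l = []
        # Note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0
        df = df.append(tempdata, …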
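A minimal sketch of a lower-memory variant, assuming each `.log` file holds one JSON object per line (as the `json.loads(line)` call above implies). This is not the poster's code or the accepted answer. Unwanted keys are discarded while each record is parsed, so full records never pile up. Each file becomes one DataFrame, and all of them are combined with a single `pd.concat` call instead of repeated `append` calls, each of which builds a new copy of the accumulated frame. The helper names `load_logs` and `shrink` are hypothetical. Which columns are low-cardinality enough to benefit from the `category` dtype is also an assumption about the data.

```python
import glob
import json

import pandas as pd

# Columns kept in the original post; every other key is discarded.
COLUMNS = ["eventName", "sessionId", "eventTime", "items", "currentPage", "browserType"]


def load_logs(pattern="*.log", columns=COLUMNS):
    """Read JSON-lines log files, keeping only the wanted keys (hypothetical helper)."""
    frames = []
    for i, path in enumerate(sorted(glob.glob(pattern)), start=1):
        print(f"Going through log file #{i} named {path}...")
        with open(path) as handle:
            # Filter keys per record while parsing; skip blank lines.
            rows = [
                {key: record.get(key) for key in columns}
                for record in (json.loads(line) for line in handle if line.strip())
            ]
        frames.append(pd.DataFrame(rows, columns=columns))
    if not frames:
        return pd.DataFrame(columns=columns)
    # One concatenation at the end instead of an append per file.
    return pd.concat(frames, ignore_index=True)


def shrink(df, categorical=("eventName", "currentPage", "browserType")):
    """Convert assumed low-cardinality string columns to 'category' (hypothetical helper)."""
    for name in categorical:
        if name in df.columns:
            df[name] = df[name].astype("category")
    return df
```

Usage might look like `df = shrink(load_logs("*.log"))`. Comparing `df.memory_usage(deep=True)` before and after `shrink` would show whether the `category` conversion actually pays off for this data. The `items` column is deliberately left alone because it likely holds lists, which are unhashable and cannot be categorical.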


python memory json pickle pandas

fur*_*559

2019-03-10

Score: 1
Answers: 1
Views: 3658