Building a box plot incrementally from large datasets

Mag*_*gea 5 python ram plot matplotlib

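Suppose I have 4 files saved on my computer as .npz files: W, X, Y and Z. Assume that, in terms of RAM consumption, my computer cannot afford to load more than one of them at a time.

How can I run this command?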

 matplotlib.pyplot.boxplot([W],[X],[Y],[Z])

In other words, how can I load W, plot W, delete W, then load Y, plot Y, delete Y, ... and put all 4 of them on the same plot (rather than in subplots)?

Thanks!

hit*_*tzg 7

The matplotlib.axes.Axes.boxplot method actually calls two functions under the hood: one to compute the necessary statistics (cbook.boxplot_stats) and one to draw the plot (matplotlib.axes.Axes.bxp). You can exploit this structure by calling the first for each dataset (loading one at a time) and then feeding the collected results to the plotting function.

In the example below we have 3 datasets and iterate over them to collect the output of cbook.boxplot_stats (which needs very little memory). After that, a single call to ax.bxp creates the graph. (In your application you would iteratively load a file, call boxplot_stats and delete the data.)

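import matplotlib.cbook as cbook
import matplotlib.pyplot as plt
import numpy as np

# Stand-ins for the large arrays; in practice each one is loaded from disk in turn.
x = np.random.rand(10, 10)
y = np.random.rand(10, 10)
z = np.random.rand(10, 10)

fig, ax = plt.subplots(1, 1)

# Keep only the per-dataset statistics (a small dict each), not the data itself.
bxpstats = []
for dataset, label in zip([x, y, z], ['X', 'Y', 'Z']):
    # Flatten so that each dataset produces exactly one box.
    bxpstats.extend(cbook.boxplot_stats(np.ravel(dataset), labels=[label]))

# Draw all boxes on the same axes from the precomputed statistics.
ax.bxp(bxpstats)
plt.show()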

Result:

[image: the resulting box plot]
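Applied to the files from the question, the same idea could look roughly like the sketch below. It is only an illustration under some assumptions: the helper name collect_stats is made up, each .npz archive is assumed to hold the array of interest as its first entry, and small temporary files stand in for the real W, X, Y and Z. The figure is saved to a file instead of being shown so that the sketch also runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.cbook as cbook
import matplotlib.pyplot as plt
import numpy as np


def collect_stats(paths_and_labels):
    """Load one .npz file at a time and keep only its box plot statistics."""
    stats = []
    for path, label in paths_and_labels:
        with np.load(path) as archive:
            # Assumption: the first array in the archive is the one to plot.
            data = archive[archive.files[0]]
            stats.extend(cbook.boxplot_stats(np.ravel(data), labels=[label]))
        del data  # drop the array before the next file is loaded
    return stats


# Demo: small temporary files standing in for the real W, X, Y and Z.
tmpdir = tempfile.mkdtemp()
rng = np.random.default_rng(0)
files = []
for label in ["W", "X", "Y", "Z"]:
    path = os.path.join(tmpdir, label + ".npz")
    np.savez(path, data=rng.random((10, 10)))
    files.append((path, label))

bxpstats = collect_stats(files)

fig, ax = plt.subplots()
ax.bxp(bxpstats)  # all four boxes end up on the same axes
out_path = os.path.join(tmpdir, "boxplot.png")
fig.savefig(out_path)
```

Only one array is held in memory at any time; what accumulates in bxpstats is one small dictionary per file (quartiles, whiskers, fliers and so on). Note that the fliers are stored as actual data points, so a dataset with very many outliers keeps correspondingly more values around.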