pch*_*pch 7 python bokeh jupyter jupyter-notebook jupyter-lab
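I use Bokeh to plot many time series (> 100) with many points (~20,000) in a Jupyter Lab notebook. When I execute the cell multiple times in Jupyter, Chrome's memory consumption increases by more than 400 MB per run. After several cell executions Chrome tends to crash, usually once several GB of RAM usage have accumulated. The plotting also tends to get slower after each execution.
"Clear [All] Outputs" or "Restart Kernel and Clear All Outputs..." in Jupyter does not free any memory either. The problem also occurs in the classic Jupyter Notebook, as well as in Firefox and Edge.
Minimal version of my .ipynb: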
import numpy as np
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
import bokeh
output_notebook() # See e.g.: https://github.com/bokeh/bokeh-notebooks/blob/master/tutorial/01%20-%20Basic%20Plotting.ipynb
# Just create a list of numpy arrays with random-walks as dataset
ts_length = 20000
n_lines = 100
np.random.seed(0)
dataset = [np.cumsum(np.random.randn(ts_length)) + i*100 for i in range(n_lines)]
# Plot exactly the same linechart every time
plot = figure(x_axis_type="linear")
for data in dataset:
    plot.line(x=range(ts_length), y=data)
show(plot)
This "memory leak" behaviour continues even if I execute the following cell every time before re-executing the (plot) cell above:
bokeh.io.curdoc().clear()
bokeh.io.state.State().reset()
bokeh.io.reset_output()
output_notebook() # has to be done again because output was reset
Do I have to plot (or show the plot) in some other way within a Jupyter Notebook to avoid this issue? Or is this simply a bug in Bokeh/Jupyter?
Installed versions on my system (Windows 10):
- Python 3.6.6: Anaconda custom (64-bit)
- bokeh: 1.4.0
- Chrome: 78.0.3904.108
- jupyter:
  - core: 4.6.1
  - lab: 1.1.4
  - ipywidgets: 7.5.1
  - labextensions:
    - @bokeh/jupyter_bokeh: v1.1.1
    - @jupyter-widgets/jupyterlab-manager: v1.0.*
TL;DR: this is probably worth opening an issue (or several) for.
Just some notes about different aspects:
First to note, these:
bokeh.io.curdoc().clear()
bokeh.io.state.State().reset()
bokeh.io.reset_output()
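only affect data structures in the Python process (e.g. the Jupyter kernel). They will never have any effect on browser memory usage or footprint.
Based on just the data, I'd expect somewhere in the neighborhood of 64 MB: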
20000 * 100 * 2 * 2 * 8 = 64MB
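That is: 100 lines with 20k (x, y) points each, which are also converted to (sx, sy) screen coordinates, all held in float64 (8-byte) typed arrays. However, Bokeh also constructs a spatial index on all the data to support things like hover tools. I expect it is this index that is blowing up with this data. It is probably worth making this feature configurable, so that folks who do not need hit testing do not have to pay for it. A feature-request issue to discuss this would be appropriate.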
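The estimate above can be reproduced in a few lines of Python. This is only a back-of-the-envelope sketch: the factors come from the reasoning above, the variable names are made up for illustration, and the spatial index is deliberately left out because its size is not known here.

```python
# Rough estimate of the raw array data held in the browser.
# Assumption: every value is a float64; the spatial index is NOT counted.
n_lines = 100
ts_length = 20000
coords_per_point = 2     # x and y
copies = 2               # data coordinates plus (sx, sy) screen coordinates
bytes_per_float64 = 8

total_bytes = n_lines * ts_length * coords_per_point * copies * bytes_per_float64
total_mb = total_bytes / 1e6
print(f"{total_mb:.0f} MB")
```

Anything the browser uses well beyond this figure per run has to come from something other than the raw arrays.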
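There are supposed to be DOM event triggers that clean up when a notebook cell is re-executed. Perhaps these have become broken? Unfortunately, maintaining integrations between three large hybrid Python/JS tools (including the classic Notebook) with a tiny team is an ongoing challenge. A bug-report issue would be appropriate so that this can be tracked and investigated.
So what can you do now?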
At least for the specific case you have here, with time series all of the same length, the code above is structured in a very suboptimal way. You should try putting everything in a single ColumnDataSource instead:
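# Other imports (numpy, figure, show) as in the question
from bokeh.models import ColumnDataSource

ts_length = 20000
n_lines = 100
np.random.seed(0)

# One shared source: the x column is stored once and reused by every line
source = ColumnDataSource(data=dict(x=np.arange(ts_length)))
for i in range(n_lines):
    source.data[f"y{i}"] = np.cumsum(np.random.randn(ts_length)) + i*100

plot = figure()
for i in range(n_lines):
    plot.line(x='x', y=f"y{i}", source=source)
show(plot)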
By passing sequence literals to line, your code results in the creation of 99 unnecessary CDS objects (one per line call). It also does not reuse the x data, which sends 99*20k extra points to BokehJS unnecessarily. And because x is sent as a plain sequence instead of a numpy array, those values are all encoded using the less efficient (in time and space) default JSON encoding, instead of the efficient binary encoding that is available for numpy arrays.
That said, this is not causing all the issues here, and is probably not a solution on its own. But I wanted to make sure to point it out.
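To get a feel for the encoding point, here is a small standard-library sketch comparing the size of a JSON text encoding with a raw float64 buffer for 20k values. It is an illustration only and does not use Bokeh's actual serializer; how Bokeh transports arrays depends on the version and context, so treat the numbers as indicative.

```python
import base64
import json
import random
from array import array

# Illustration only: NOT Bokeh's serializer, just a size comparison between
# a text (JSON) encoding and a raw binary float64 buffer.
random.seed(0)
values = [random.gauss(0, 1) for _ in range(20000)]

json_size = len(json.dumps(values).encode("utf-8"))

raw = array("d", values).tobytes()        # 8 bytes per float64
binary_size = len(raw)
base64_size = len(base64.b64encode(raw))  # same buffer carried as text

print(json_size, binary_size, base64_size)
```

Even when the binary buffer has to be wrapped in base64 to travel inside a text document, it should stay well below the size of the digit-by-digit JSON form.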
For this many points, you might consider using Datashader in conjunction with Bokeh. The HoloViews library also integrates Bokeh and Datashader automatically at a high level. By pre-rendering images on the Python side, Datashader is effectively a bandwidth compression tool (among other things).
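The idea behind pre-rendering can be sketched with plain NumPy: bin all points into a fixed-size pixel grid, so the amount of data sent to the browser depends on the image resolution rather than on the number of points. This is not the Datashader API (which also rasterizes proper line segments rather than isolated points), and the 800x400 resolution is an arbitrary assumption for illustration.

```python
import numpy as np

# Illustration only: plain NumPy binning, NOT the Datashader API.
ts_length, n_lines = 20000, 100
width, height = 800, 400   # hypothetical output resolution in pixels

rng = np.random.RandomState(0)
x = np.tile(np.arange(ts_length), n_lines)
y = (np.cumsum(rng.randn(n_lines, ts_length), axis=1)
     + 100 * np.arange(n_lines)[:, None]).ravel()

# Count how many points fall into each pixel-sized bin.
image, _, _ = np.histogram2d(x, y, bins=(width, height))

print(x.size, image.size)
```

Here two million points collapse into a 320,000-cell grid, and that grid size stays the same no matter how many series are added.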
Bokeh tilts the trade-off towards affording various kinds of interactivity. But if you don't actually need that interactivity, then you are paying some extra costs. If that's your situation, you could consider generating static PNGs instead:
from bokeh.io.export import get_screenshot_as_png
p = get_screenshot_as_png(plot)
You'll need to install the additional optional dependencies listed in Exporting Plots, and if you are doing many plots you might want to consider saving a webdriver and explicitly reusing it for each call.