我正在尝试使用一个小例子来学习 Dask。基本上我读入一个文件并计算行平均值。
from dask_jobqueue import SLURMCluster
cluster = SLURMCluster(cores=4, memory='24 GB')
cluster.scale(4)
from dask.distributed import Client
client = Client(cluster)
import dask
import numpy as np
import dask.dataframe as dd
mytbl = dd.read_csv('test.txt', sep=' ')
row_mean = mytbl.loc[:, mytbl.columns != 'chrom'].apply(np.mean, axis=1, meta=(None, 'float64'))
row_mean = row_mean.compute()
Run Code Online (Sandbox Code Playgroud)
当我运行时compute,我可以在Dask仪表板中看到内存使用量增加非常快,并且所有CPU也都被使用。但随后内存增加停止,我看到这个错误:
distributed.utils - ERROR - "('read-csv-apply-67c8f79a72a7499f86a1707b269e0776', 0)"
Traceback (most recent call last):
File "~/miniconda3/lib/python3.8/site-packages/distributed/utils.py", line 668, in log_errors
yield
File "~/miniconda3/lib/python3.8/site-packages/distributed/scheduler.py", line 3785, in add_worker
typename=types[key],
KeyError: "('read-csv-apply-67c8f79a72a7499f86a1707b269e0776', 0)"
distributed.core - ERROR …Run Code Online (Sandbox Code Playgroud)