使用 Dask 文档中的确切代码: https://jobqueue.dask.org/en/latest/examples.html
如果页面发生变化,代码如下:
from dask_jobqueue import SLURMCluster
from distributed import Client
from dask import delayed
cluster = SLURMCluster(memory='8g',
processes=1,
cores=2,
extra=['--resources ssdGB=200,GPU=2'])
cluster.scale(2)
client = Client(cluster)
def step_1_w_single_GPU(data):
return "Step 1 done for: %s" % data
def step_2_w_local_IO(data):
return "Step 2 done for: %s" % data
stage_1 = [delayed(step_1_w_single_GPU)(i) for i in range(10)]
stage_2 = [delayed(step_2_w_local_IO)(s2) for s2 in stage_1]
result_stage_2 = client.compute(stage_2,
resources={tuple(stage_1): {'GPU': 1},
tuple(stage_2): {'ssdGB': 100}})
Run Code Online (Sandbox Code Playgroud)
这会导致这样的错误:
distributed.protocol.core - CRITICAL - Failed to Serialize
Traceback …Run Code Online (Sandbox Code Playgroud)