A simple code snippet follows; the comments marked with ### are important:
from dask.distributed import Client
### this code-piece will get executed on a dask worker.
def task_to_perform():
    print("task in progress.")
    ## do something here..
    print("task is over.!")

### whereas the below code will run on the client side,
### assume on a different node than the worker
client = Client("127.0.0.1:8786")
future = client.submit(task_to_perform)
print("task results::", future.result())
The control flow of execution is therefore: the dask client submits the task to the dask scheduler, and the scheduler assigns the task to a worker based on which workers are available.
But does dask provide any mechanism to bypass the dask scheduler and submit my task to a dedicated/specific worker?
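There is no need to bypass the scheduler for this: `Client.submit` accepts a `workers=` argument that restricts which workers may run the task, and `allow_other_workers=False` makes that restriction a hard constraint. A minimal sketch, using a threads-only `LocalCluster` as a stand-in for a real multi-node deployment (the worker chosen here is illustrative):

```python
from dask.distributed import Client, LocalCluster

def task_to_perform():
    return "task done"

def run_on_specific_worker():
    # A threads-only LocalCluster stands in for a real scheduler + workers.
    with LocalCluster(n_workers=2, processes=False) as cluster:
        with Client(cluster) as client:
            # Pick one worker's address and pin the task to it;
            # allow_other_workers=False makes the placement a hard constraint.
            target = sorted(client.scheduler_info()["workers"])[0]
            future = client.submit(task_to_perform, workers=[target],
                                   allow_other_workers=False)
            return future.result()

if __name__ == "__main__":
    print(run_on_specific_worker())
```

Against a real cluster you would pass the target worker's address (e.g. `workers=["tcp://10.0.0.5:40123"]`) instead of looking it up from `scheduler_info()`.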
I am trying to pass more data to my GPU than it has VRAM, which causes the following error: CudaAPIError: Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY
I created this code to reproduce the problem:
from numba import cuda
import numpy as np
@cuda.jit()
def addingNumbers(big_array, big_array2, save_array):
    i = cuda.grid(1)
    if i < big_array.shape[0]:
        for j in range(big_array.shape[1]):
            save_array[i][j] = big_array[i][j] * big_array2[i][j]
big_array = np.random.random_sample((1000000, 500))
big_array2 = np.random.random_sample((1000000, 500))
save_array = np.zeros(shape=(1000000, 500))
arraysize = 1000000
threadsperblock = 64
blockspergrid = (arraysize + (threadsperblock - 1)) // threadsperblock
d_big_array = cuda.to_device(big_array)
d_big_array2 = cuda.to_device(big_array2)
d_save_array = cuda.to_device(save_array)
addingNumbers[blockspergrid, threadsperblock](d_big_array, d_big_array2, d_save_array)

Is it possible to run dask from a Python script?
In an interactive session, I can just write
from dask.distributed import Client
client = Client()
as described in all the tutorials. However, if I put these lines into a script.py file and execute it with python script.py, it crashes immediately.
I found another option, which is to use MPI:
# script.py
from dask_mpi import initialize
initialize()
from dask.distributed import Client
client = Client() # Connect this local process to remote workers
and then run it with mpirun -n 4 python script.py. This does not crash, but if you print the client
print(client)
# <Client: scheduler='tcp://137.250.37.84:35145' processes=0 cores=0>
you can see that no cores are being used, so the script runs forever without doing anything.
How do I set up my script correctly?
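The immediate crash usually comes from `Client()` starting a `LocalCluster` with worker processes: on systems that spawn rather than fork, each worker process re-imports the script, which then tries to create yet another cluster. Guarding the entry point fixes it. A minimal sketch (the worker counts here are illustrative):

```python
# script.py
from dask.distributed import Client

def main():
    # Creating Client() starts a LocalCluster; keeping it inside the
    # __main__ guard prevents freshly spawned worker processes from
    # re-executing this script on import, the usual cause of the crash.
    with Client(n_workers=1, threads_per_worker=1) as client:
        return len(client.scheduler_info()["workers"])

if __name__ == "__main__":
    print(main())
```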
from dask.distributed import Client
Client()
Client(do_not_spawn_new_if_default_address_in_use=True) # should not spawn a new default cluster
Is it possible to do this somehow?
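As far as I know there is no such parameter (`do_not_spawn_new_if_default_address_in_use` above is hypothetical), but the behaviour can be approximated: try to connect to the known scheduler address with a short timeout, and only spawn a new cluster when nothing is listening there. A sketch, assuming the default dask-scheduler port 8786:

```python
from dask.distributed import Client

def get_client(address="tcp://127.0.0.1:8786", timeout="2s"):
    # Try to attach to an already-running scheduler at the given address.
    try:
        return Client(address, timeout=timeout)
    except OSError:
        # Nothing listening there: fall back to spawning a local cluster.
        # A threads-only cluster keeps this sketch safe to run anywhere.
        return Client(processes=False)
```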
I have two dataframes that depend on each other in a computation, and I would like to obtain the results of both with a single compute() call. The code can be summarized as follows:
import dask
import dask.dataframe
import dask.distributed
import pandas as pd
df = dask.dataframe.from_pandas(
    pd.DataFrame({
        "group": ['a', 'b', 'a', 'b', 'a', 'b', 'b'],
        "var_1": [0, 1, 2, 1, 2, 1, 0],
        "var_2": [1, 1, 2, 1, 2, 1, 0]}), npartitions=2)
with dask.distributed.Client() as client:
    for i in range(10):
        df_agg = foo(df)
        df = bar(df, df_agg)
    print(df.compute())
    print(df_agg.compute())  # -> I would like to have only one .compute() call and get the results of both dataframes (df and df_agg)