Posts by kad*_*ach

Progress reporting for dask's set_index

I'm trying to wrap a progress indicator around my entire script. However, set_index(..., compute=False) still runs tasks on the scheduler, which I can observe in the web interface.

How can I report progress for the set_index step?

import dask.dataframe as dd
from dask.distributed import Client, progress

if __name__ == '__main__':

  with Client() as client:

    df = dd.read_csv('big.csv')

    # I can see on the web interface that something is happening.
    # This blocks 20-30s on this particular CSV.
    df = df.set_index('id', compute=False)

    # Progress reporting works from here
    out = client.compute(
      df
    )
    progress(out)

    # out.result()
    # ...

dask dask-distributed

5
votes
1
answer
213
views
