Meh*_*hdi 1 python dask dask-distributed
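I have two dataframes that depend on each other during computation, and I would like to get the results of both with a single compute() call. The code can be summarized as follows: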
import dask
import dask.dataframe
import dask.distributed
import pandas as pd

df = dask.dataframe.from_pandas(
    pd.DataFrame({
        "group": ['a', 'b', 'a', 'b', 'a', 'b', 'b'],
        "var_1": [0, 1, 2, 1, 2, 1, 0],
        "var_2": [1, 1, 2, 1, 2, 1, 0]}), npartitions=2)

# foo and bar are my own functions (definitions omitted here)
with dask.distributed.Client() as client:
    for i in range(10):
        df_agg = foo(df)
        df = bar(df, df_agg)
    print(df.compute())
    print(df_agg.compute())  # -> I would like to have only one .compute() call and get the results of both dataframes (df and df_agg)
Thank you very much for your help.
小智 5
Just use dask.compute, which takes several collections and returns their results together as a tuple:
https://docs.dask.org/en/stable/generated/dask.dataframe.compute.html
print(dask.compute(df, df_agg))