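So dask has now been updated to support custom aggregation functions for groupby. (Thanks to the dev team and @chmp for working on this!) I am currently trying to construct a mode function and a corresponding count function. Basically, what I envision is that mode returns, for each grouping, a list of the most common values for a specific column (e.g. [4, 1, 2]). In addition, there is a corresponding count function that returns the number of instances of those values, e.g. 3.
Now I am trying to implement this in code. According to the groupby.py file, the parameters for custom aggregations are as follows: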
Parameters
----------
name : str
    the name of the aggregation. It should be unique, since intermediate
    result will be identified by this name.
chunk : callable
    a function that will be called with the grouped column of each
    partition. It can either return a single series or a tuple of series.
    The index has to be equal to the groups.
agg : callable
    a function that will be called to aggregate the results of each chunk.
    Again the …
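To make the intended behaviour concrete, here is a minimal sketch of the mode/count logic split into the per-partition and combine steps the docstring describes. It deliberately uses only the Python standard library and does not call dask; the function names `chunk`, `agg` and `finalize`, the `(group, value)` row layout, and the sample partitions are all assumptions made for illustration, not the signatures dask expects.

```python
from collections import Counter, defaultdict


def chunk(rows):
    """Per-partition step: count each value within each group."""
    counts = defaultdict(Counter)
    for group, value in rows:
        counts[group][value] += 1
    return counts


def agg(chunk_results):
    """Combine step: add up the per-partition counts for each group."""
    total = defaultdict(Counter)
    for counts in chunk_results:
        for group, counter in counts.items():
            total[group].update(counter)
    return total


def finalize(total):
    """Final step: keep the most frequent values and how often they occur."""
    result = {}
    for group, counter in total.items():
        top = max(counter.values())
        # Several values can tie for most frequent, so mode is a list.
        modes = sorted(v for v, n in counter.items() if n == top)
        result[group] = (modes, top)
    return result


# Two hypothetical partitions of (group, value) rows.
partitions = [
    [("a", 4), ("a", 1), ("a", 4), ("b", 7)],
    [("a", 1), ("a", 2), ("b", 7), ("b", 8)],
]

modes = finalize(agg(chunk(p) for p in partitions))
print(modes)
```

The point of the split is that a mode cannot be computed per partition and then merged directly: the per-partition step has to keep the full value counts, and only the last step can pick the winners. In this sketch group "a" ends up with two tied values, which is why mode is returned as a list alongside a single count.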