Posts by Edg*_*r H

How do I use protobuf maps in Python?

Given the following proto definition:

message EndpointResult {
    int32 endpoint_id = 1;
    // property id as key
    map<int32, TimeSeries> properties = 2;
}

message TimeSeries {
    repeated TimeEntry value = 2;
}

message TimeEntry {
    int32 time_unit = 1;
    float value = 2;
}

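I want to populate the map in the EndpointResult class. I tried the different approaches suggested in the documentation, but every one of them raised an error.

Setting up the test objects: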

end_point_rslt = nom.EndpointResult()
end_point_rslt.endpoint_id=0

ts = nom.TimeSeries()
te = ts.value.add()
te.time_unit = 0
te.value = 5.

Then trying the different approaches:

end_point_rslt.properties[0] = ts

ValueError: Direct assignment of submessage not allowed

end_point_rslt.properties[0].submessage_field = ts

AttributeError: Assignment not allowed (no field "submessage_field" in protocol message object).

end_point_rslt.properties.get_or_create(0)
end_point_rslt.properties[0] = ts

ValueError: Direct assignment of submessage not allowed

end_point_rslt.properties.get_or_create(0)
end_point_rslt.properties[0].submessage_field = ts …
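
One possible way around these errors, sketched under assumptions: in the Python protobuf API a message-valued map entry cannot be assigned directly, but reading a key creates an empty entry that can then be filled in place or overwritten with CopyFrom. The helper names append_entries and copy_series below are hypothetical and not part of protobuf; they only rely on the field names from the proto definition above.

```python
def append_entries(endpoint_result, property_id, entries):
    """Append (time_unit, value) pairs to the TimeSeries stored under property_id."""
    # Reading a missing key of a message-valued map creates an empty entry,
    # so the TimeSeries is taken from the map and then filled in place.
    series = endpoint_result.properties[property_id]
    for time_unit, value in entries:
        entry = series.value.add()  # new TimeEntry in the repeated field
        entry.time_unit = time_unit
        entry.value = value
    return series


def copy_series(endpoint_result, property_id, series):
    """Copy an already-built TimeSeries into the map under property_id."""
    # CopyFrom overwrites the map entry with a copy of `series`;
    # later changes to `series` do not affect the stored entry.
    endpoint_result.properties[property_id].CopyFrom(series)


# Usage with the generated module from the question (assumed importable as `nom`):
#   end_point_rslt = nom.EndpointResult()
#   append_entries(end_point_rslt, 0, [(0, 5.0)])
#   copy_series(end_point_rslt, 1, ts)
```

The in-place variant avoids building a separate TimeSeries first; the CopyFrom variant fits when the series already exists, as with ts in the setup above.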

python protocol-buffers grpc

Edg*_*r H

lucky-day

12
votes

1
answer

5638
views

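Get the unique rows of a dask array without using a dask DataFrame

Is there a way to get the unique rows of a dask array that is larger than the available memory? Ideally without converting it to a dask DataFrame?

I currently use this approach: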

import dask.array as da
import dask.dataframe as dd

dx = da.random.random((10000, 10000), chunks=(1000, 1000))
ddf = dd.from_dask_array(dx)
ddf = ddf.drop_duplicates()
dx = ddf.to_dask_array(lengths=True)

It works for larger data sets than np.unique(dx, axis=0) does, but eventually it runs out of memory as well.

I am using Python 3.6 (but can upgrade), Dask 0.20, and Ubuntu 18.04 LTS.

python numpy dask

Edg*_*r H

2018-11-20

5
votes

1
answer

971
views