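I'm trying to use Ray's parallelization model to process the records of a file one by one. The code works, but the object store grows quickly and eventually crashes. I avoid ray.get(function.remote()) because it hurts performance: the job consists of several million subtasks, and waiting for each one to finish adds too much overhead. Is there a way to set a global limit on the object store?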
import copy
import ray

# code that constantly applies backpressure to the object store, freeing space, but performs worse than serial execution
for record in infile:
    ray.get(createNucleotideCount.remote(record, copy.copy(dinucDict), copy.copy(tetranucDict), dinucList, tetranucList, filename))

# code that maximizes throughput but makes the object store grow constantly
for record in infile:
    createNucleotideCount.remote(record, copy.copy(dinucDict), copy.copy(tetranucDict), dinucList, tetranucList, filename)

# the called function returns either 0 or 1.
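You can call ray.init(object_store_memory=10**9)
to limit the object store to 1 GB of system RAM (instead of the default size, which Ray sets automatically based on available system memory).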
object_store_memory: The amount of memory (in bytes) to start the object store with. By default, this is automatically set based on available system memory.

(See the ray.init() documentation.)
More information is available in the memory management documentation at https://docs.ray.io/en/releases-1.11.0/ray-core/memory-management.html.