Preprocessing for wildcard Snakemake rules

abu*_*kaj 5 python snakemake

I have a Snakemake recipe that includes a very expensive preparation step that is common to all invocations. Here is a pseudo-rule for demonstration:

rule sample:
    input:
        "{name}.config"
    output:
        "{name}.npz"
    run:
        import numpy as np
        import somemodule

        data = somemodule.Loader("some_big_data")  # expensive, repeated for every target
        np.savez(output[0], data.process(input[0]))  # also expensive

Currently, data is loaded from scratch for every target, which is far from optimal. How can I make it load only once?

I am looking for something that would allow me to rewrite the rule like this:

rule sample:
    input:
        "{name}.config"
    output:
        "{name}.npz"
    setup:
        import somemodule
        
        data = somemodule.Loader("some_big_data")  # expensive
    run:
        np.savez(output, data.process(input))  # also expensive

Or:

rule sample:
    input:
        "{name}.config"
    output:
        "{name}.npz"
    run:
        import somemodule

        data = somemodule.Loader("some_big_data")  # expensive
        
        for job in jobs:
            np.savez(job.output,
                     data.process(job.input))  # also expensive
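The second form does not need a special jobs variable in real Snakemake: all targets can be collapsed into one rule without wildcards whose input and output are expand() lists, iterated pairwise with zip(input, output) inside run:. The trade-off is that everything becomes a single job, so targets are no longer spread across cores, and touching any one config re-runs the whole batch. The plain-Python sketch below only mimics that run: body outside Snakemake; FakeLoader, run_batch, NAMES and the file names are hypothetical stand-ins for somemodule.Loader and the real data, and json replaces np.savez to keep it dependency-free.

```python
import json
import os
import tempfile

# Roughly the run: body of a Snakemake rule such as (names are hypothetical):
#   rule sample_all:
#       input:  expand("{name}.config", name=NAMES)
#       output: expand("{name}.npz", name=NAMES)
#       run:
#           data = somemodule.Loader("some_big_data")
#           for cfg, out in zip(input, output): ...


class FakeLoader:
    """Hypothetical stand-in for somemodule.Loader; counts constructions."""

    instances = 0

    def __init__(self, source):
        FakeLoader.instances += 1  # the expensive step in the real code
        self.source = source

    def process(self, config_path):
        # Placeholder for the real (also expensive) per-target work.
        with open(config_path) as f:
            return {"source": self.source, "config": f.read().strip()}


def run_batch(inputs, outputs):
    """Process every (input, output) pair with a single loader instance."""
    data = FakeLoader("some_big_data")  # built once for the whole batch
    for config_path, out_path in zip(inputs, outputs):
        with open(out_path, "w") as f:
            json.dump(data.process(config_path), f)


if __name__ == "__main__":
    names = ["a", "b", "c"]
    workdir = tempfile.mkdtemp()
    inputs = [os.path.join(workdir, f"{n}.config") for n in names]
    outputs = [os.path.join(workdir, f"{n}.json") for n in names]
    for n, path in zip(names, inputs):
        with open(path, "w") as f:
            f.write(n)
    run_batch(inputs, outputs)
    print(FakeLoader.instances)  # → 1
```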

In another question I described the code that Loader.__init__() is based on.

Sul*_*yev 1

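One possible solution is to create a pickled object containing the data of interest. Please look into the security considerations of using pickled objects (unpickling data from an untrusted source can execute arbitrary code) to check whether this is appropriate for your case. If it is, it would go along these lines: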

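rule sample:
    output:
        pickle = "some_big_data.pickle",  # no wildcards, so this rule runs only once
    run:
        import pickle
        import somemodule

        data = somemodule.Loader("some_big_data")  # expensive, but done only once
        with open(output.pickle, "wb") as f:  # pickle.dump needs a binary file object
            pickle.dump(data, f)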
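Stripped of the Snakemake wrapper, the upstream step amounts to a single pickle.dump of the loaded object into a file opened in binary mode. The sketch below exercises that step with a hypothetical FakeLoader standing in for somemodule.Loader (build_cache and the file name are also illustrative). It only works if the real loader object is picklable, which depends on somemodule: attributes such as open file handles or thread locks make pickle.dump raise a TypeError.

```python
import os
import pickle
import tempfile


class FakeLoader:
    """Hypothetical stand-in for somemodule.Loader (must be picklable)."""

    def __init__(self, source):
        self.source = source
        self.table = {i: i * i for i in range(1000)}  # pretend this was expensive


def build_cache(pickle_path):
    """Body of the upstream rule: build the object once and pickle it."""
    data = FakeLoader("some_big_data")
    with open(pickle_path, "wb") as f:  # binary mode is required by pickle
        pickle.dump(data, f)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "some_big_data.pickle")
    build_cache(path)
    print(os.path.getsize(path) > 0)  # → True
```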

In downstream rules you reference the pickled file like any other input file; just make sure to read it back with pickle.load.
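A downstream wildcard rule lists the pickle as an ordinary input, so Snakemake builds it first, and replaces the expensive constructor call with pickle.load. The sketch below shows only that run: body in plain Python; FakeLoader, downstream_job, the rule name in the comment and the file name some_big_data.pickle are hypothetical, and json stands in for np.savez. Every downstream job still unpickles the object, so this pays off only when unpickling is much cheaper than constructing the Loader, and the class of the pickled object must be importable wherever the file is loaded.

```python
import json
import os
import pickle
import tempfile

# Roughly the run: body of a Snakemake rule such as (names are hypothetical):
#   rule process_sample:
#       input:  config="{name}.config", data="some_big_data.pickle"
#       output: "{name}.npz"


class FakeLoader:
    """Hypothetical stand-in for somemodule.Loader."""

    def __init__(self, source):
        self.source = source

    def process(self, config_text):
        # Placeholder for the real per-target computation.
        return {"source": self.source, "config": config_text}


def downstream_job(pickle_path, config_path, out_path):
    """Reuse the cached object instead of rebuilding it for this target."""
    with open(pickle_path, "rb") as f:
        data = pickle.load(f)  # replaces the expensive Loader(...) call
    with open(config_path) as f:
        result = data.process(f.read().strip())
    with open(out_path, "w") as f:
        json.dump(result, f)
    return result


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    cache = os.path.join(workdir, "some_big_data.pickle")
    with open(cache, "wb") as f:  # what the upstream rule would have written
        pickle.dump(FakeLoader("some_big_data"), f)
    config = os.path.join(workdir, "a.config")
    with open(config, "w") as f:
        f.write("a")
    print(downstream_job(cache, config, os.path.join(workdir, "a.json")))
```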