Snakemake temp() causes unnecessary rule reruns

Question


I am using Snakemake v5.4.0 and have run into a problem with temp(). Consider this hypothetical scenario:

Rule A --> Rule B1 --> Rule C1
       |
       +--> Rule B2 --> Rule C2

Here Rule A generates temp() files that are used by both pathway 1 (B1 + C1) and pathway 2 (B2 + C2).


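If I run the pipeline, the temp() files generated by Rule A are deleted once both pathways have used them, which is exactly what I expect. However, if I later want to rerun pathway 2, the temp() files from Rule A have to be recreated, and that triggers a rerun of the entire pipeline instead of just pathway 2. For long pipelines this becomes very expensive computationally. Is there a good way to prevent this, other than not using temp()? In my case, dropping temp() would require many terabytes of extra disk space.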
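The cascade described above can be illustrated with a simplified, make-style model of timestamp checking. This is a sketch under stated assumptions, not Snakemake's implementation: it assumes every output is requested (as a rule all would do), that a deleted temp file on its own does not trigger a rerun, and that an output is rebuilt once any of its inputs is newer than it. The function outdated, the file names and the timestamps are all hypothetical.

```python
from typing import Dict, List, Set


def outdated(rules: Dict[str, List[str]], mtimes: Dict[str, float]) -> Set[str]:
    """Return the existing outputs a make-style tool would rebuild.

    Simplified model, not Snakemake's actual scheduler:
    - rules maps each output file to the input files it is built from.
    - mtimes holds modification times of files that currently exist; a
      missing input (such as a deleted temp file) is simply ignored.
    - An output is outdated if one of its inputs is newer than it or is
      itself outdated, so staleness propagates downstream.
    """
    stale: Set[str] = set()
    changed = True
    while changed:  # repeat until no new stale outputs are found
        changed = False
        for out, inputs in rules.items():
            if out in stale or out not in mtimes:
                continue
            if any(i in stale or mtimes.get(i, float("-inf")) > mtimes[out]
                   for i in inputs):
                stale.add(out)
                changed = True
    return stale


# The question's DAG, with hypothetical file names for each rule's output.
rules = {
    "B1.out": ["a.out"], "C1.out": ["B1.out"],
    "B2.out": ["a.out"], "C2.out": ["B2.out"],
}

# After the first run the temp file a.out is gone and nothing is outdated.
before = {"B1.out": 2, "C1.out": 3, "B2.out": 2, "C2.out": 3}
print(outdated(rules, before))  # set()

# Recreating a.out at time 10 to rerun pathway 2 also makes pathway 1 stale.
after = {**before, "a.out": 10}
print(sorted(outdated(rules, after)))  # ['B1.out', 'B2.out', 'C1.out', 'C2.out']
```

The model only shows why the recreated temp file drags pathway 1 along: its new timestamp is later than every output built from the old copy.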

Answer 1

dar*_*ber 0

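You could build the list of input files for rule all (or for whichever rule requests the first one) dynamically, depending on whether the output of pathway 2 already exists (and passes some sanity checks).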

import os

# P1.out is always requested; P2.out only if it does not exist yet.
targets = ['P1.out']
if not os.path.exists('P2.out'):  # Add further sanity checks here...
    targets.append('P2.out')

rule all:
    input:
        targets

rule make_tmp:
    output:
        temp('a.out')
    shell:
        r"""
        touch {output}
        """

rule make_P1:
    input:
        'a.out'
    output:
        'P1.out'
    shell:
        r"""
        touch {output}
        """

rule make_P2:
    input:
        'a.out'
    output:
        'P2.out'
    shell:
        r"""
        touch {output}
        """
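Because a Snakefile is Python, the target-selection logic above can be pulled into a helper and tested outside Snakemake. The sketch below assumes a hypothetical helper, select_targets, which could be defined at the top of the Snakefile, plus a hypothetical non_empty check standing in for the answer's unspecified "more conditions". Note that the touch-created placeholders in the example would fail that stricter check.

```python
import os
import tempfile
from typing import Callable, List, Sequence


def select_targets(
    always: Sequence[str],
    optional: Sequence[str],
    is_done: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Build the input list for rule all (hypothetical helper).

    Files in `always` are always requested; files in `optional` are
    requested only when `is_done` rejects them, so a finished pathway
    does not pull its temp() input back into the DAG.
    """
    return list(always) + [path for path in optional if not is_done(path)]


def non_empty(path: str) -> bool:
    """Hypothetical stricter sanity check: the file exists and has content."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p1, p2 = os.path.join(d, "P1.out"), os.path.join(d, "P2.out")
        print(select_targets([p1], [p2]) == [p1, p2])  # True: P2.out missing
        open(p2, "w").close()  # empty P2.out, like `touch` creates
        print(select_targets([p1], [p2]) == [p1])  # True: existence suffices
        print(select_targets([p1], [p2], non_empty) == [p1, p2])  # True: empty file rejected
```

In the Snakefile, targets = select_targets(['P1.out'], ['P2.out']) would then produce the same list as the hand-written if statement.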


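However, this somewhat defeats the purpose of using Snakemake. If the input of pathway 1 has to be recreated, how can you be sure that its output is still up to date?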

Archived:	6 years, 8 months ago
Views:	822
Last activity:	1 year, 11 months ago