Snakemake using a rule in a loop

Fab*_*n_G 6 python shell snakemake

I'm trying to use a Snakemake rule within a loop so that each iteration takes the output of the previous iteration as its input. Is that possible, and if so, how can I do it?

Here is my example

  1. Set up the test data
mkdir -p test
echo "SampleA" > test/SampleA.txt
echo "SampleB" > test/SampleB.txt
  2. Snakemake
SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        # Output of the final loop
        expand("loop3/{sample}.txt", sample = SAMPLES)


#### LOOP ####
for i in list(range(1, 4)):
    # Setup prefix for input
    if i == 1:
        prefix = "test"
    else:
        prefix = "loop%s" % str(i-1)

    # Setup prefix for output
    opref =  "loop%s" % str(i)

    # Rule
    rule loop_rule:
        input:
            prefix+"/{sample}.txt"
        output:
            prefix+"/{sample}.txt"
            #expand("loop{i}/{sample}.txt", i = i, sample = wildcards.sample)
        params:
            add=prefix
        shell:
            "awk '{{print $0, {params.add}}}' {input} > {output}"

Trying to run the example yields this error: CreateRuleException in line 26 of /Users/fabiangrammes/Desktop/Projects/snake_loop/Snakefile: The name loop_rule is already used by another rule. If anyone spots a way to get this working, that would be great!

Thanks !

mer*_*erv 7

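I think this is a nice opportunity to use recursive programming. Rather than explicitly including conditionals for every iteration, write a single rule that transitions from iteration (n-1) to n. So, something along these lines: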

SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        expand("loop3/{sample}.txt", sample=SAMPLES)

def recurse_sample(wcs):
    n = int(wcs.n)
    if n == 1:
        return "test/%s.txt" % wcs.sample
    elif n > 1:
        return "loop%d/%s.txt" % (n-1, wcs.sample)
    else:
        raise ValueError("loop numbers must be 1 or greater: received %s" % wcs.n)

rule loop_n:
    input: recurse_sample
    output: "loop{n}/{sample}.txt"
    wildcard_constraints:
        sample="[^/]+",
        n="[0-9]+"
    shell:
        """
        awk -v loop='loop{wildcards.n}' '{{print $0, loop}}' {input} > {output}
        """

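As @RussHyde noted, you need to be proactive about making sure no infinite loop is triggered. To that end, we make sure recurse_sample covers every case and use wildcard_constraints so that the matching is precise.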
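To see why the recursion terminates, the chain of inputs can be traced in plain Python, outside Snakemake. The sketch below is only an approximation: it assumes the output pattern plus the wildcard constraints behave roughly like the regular expression shown, and the helper name resolve_chain is made up for illustration.

```python
import re

# Assumed regex equivalent of output "loop{n}/{sample}.txt" with the
# constraints n="[0-9]+" and sample="[^/]+" from the rule above.
OUTPUT_PATTERN = re.compile(r"loop(?P<n>[0-9]+)/(?P<sample>[^/]+)\.txt")


def recurse_sample(n, sample):
    """Return the input path for iteration n (mirrors the input function above)."""
    if n == 1:
        return "test/%s.txt" % sample
    elif n > 1:
        return "loop%d/%s.txt" % (n - 1, sample)
    raise ValueError("loop numbers must be 1 or greater: received %s" % n)


def resolve_chain(target):
    """Hypothetical helper: follow inputs back until a path no longer matches the rule."""
    chain = [target]
    match = OUTPUT_PATTERN.fullmatch(target)
    while match:
        previous = recurse_sample(int(match.group("n")), match.group("sample"))
        chain.append(previous)
        match = OUTPUT_PATTERN.fullmatch(previous)
    return chain


print(resolve_chain("loop3/SampleA.txt"))
# ['loop3/SampleA.txt', 'loop2/SampleA.txt', 'loop1/SampleA.txt', 'test/SampleA.txt']
```

The chain stops at test/SampleA.txt because that path does not match the loop pattern, and a request for loop0/... raises the ValueError instead of recursing further.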

  • Oh, I hadn't seen "wildcard_constraints" before; I always coded them inside the braces. That's really helpful. (2 upvotes)
  • Thanks merv, really elegant! (2 upvotes)

Rus*_*yde 5

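My understanding is that your rules are converted to Python code before they are run, and that all raw Python code present in your Snakefile is run sequentially during this process. Think of it as your Snakemake rules being evaluated as Python functions.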

But there is a constraint: any given rule can only be evaluated to a function once.

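You can use if/else expressions and evaluate a rule differently (once) based on config values and the like, but you can't evaluate a rule multiple times.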
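As a rough illustration of that constraint, here is a toy registry in plain Python. It is not Snakemake's actual implementation; ToyWorkflow and DuplicateRuleError are hypothetical names that only mimic the behaviour reported in the question's error message.

```python
class DuplicateRuleError(Exception):
    """Hypothetical stand-in for the CreateRuleException seen in the question."""


class ToyWorkflow:
    """Toy model: a registry that accepts each rule name only once."""

    def __init__(self):
        self.rules = {}

    def add_rule(self, name, output_pattern):
        if name in self.rules:
            raise DuplicateRuleError(
                "The name %s is already used by another rule" % name)
        self.rules[name] = output_pattern


workflow = ToyWorkflow()
errors = []
for i in range(1, 4):
    try:
        # Every pass through the loop tries to register the same name.
        workflow.add_rule("loop_rule", "loop%d/{sample}.txt" % i)
    except DuplicateRuleError as err:
        errors.append(str(err))

print(len(workflow.rules), "rule registered,", len(errors), "rejected")
```

Only the first iteration registers the rule; the second and third are rejected, which is the situation the loop in the question runs into.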

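I'm not really sure how to rewrite your Snakefile to achieve what you want. Is there a real example you could give where looping constructs appear to be required?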

--- Edit

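For a fixed number of iterations, you may be able to use an input function to run the rule several times. (I would caution against doing this, though; be extremely careful to rule out infinite loops.)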

SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        # Output of the final loop
        expand("loop3/{sample}.txt", sample = SAMPLES)

def looper_input(wildcards):
    # could be written more cleanly with a dictionary
    if (wildcards["prefix"] == "loop0"):
        input = "test/{}.txt".format(wildcards["sample"])
    elif (wildcards["prefix"] == "loop1"):
        input = "loop0/{}.txt".format(wildcards["sample"])
    ...
    return input


rule looper:
    input:
            looper_input
    output:
            "{prefix}/{sample}.txt"
    params:
            # the {prefix} wildcard is filled in for each job; a bare `prefix` is undefined here
            add="{prefix}"
    shell:
            "awk -v add='{params.add}' '{{print $0, add}}' {input} > {output}"
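The comment in looper_input suggests a dictionary would be cleaner. A minimal sketch of that idea follows, assuming the directory layout from the question (test, then loop1 through loop3); the PREVIOUS_DIR mapping is an illustrative assumption, and plain dictionaries stand in for Snakemake's wildcards object so the function can be tried outside a workflow.

```python
# Hypothetical mapping from each output directory to the directory it reads from.
PREVIOUS_DIR = {
    "loop1": "test",
    "loop2": "loop1",
    "loop3": "loop2",
}


def looper_input(wildcards):
    """Return the input path for the requested prefix/sample pair."""
    prefix = wildcards["prefix"]
    if prefix not in PREVIOUS_DIR:
        # Failing loudly keeps unknown prefixes from recursing without end.
        raise ValueError("unexpected prefix: %s" % prefix)
    return "%s/%s.txt" % (PREVIOUS_DIR[prefix], wildcards["sample"])


# Plain dictionaries stand in for Snakemake's wildcards object here.
print(looper_input({"prefix": "loop1", "sample": "SampleA"}))  # test/SampleA.txt
print(looper_input({"prefix": "loop3", "sample": "SampleB"}))  # loop2/SampleB.txt
```

Because the mapping is finite and has no entry for test, the chain of inputs always ends at the original data, which addresses the infinite-loop concern for a fixed number of iterations.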