Fab*_*n_G (tags: python, shell, snakemake)
I'm trying to use Snakemake rules within a loop so that each iteration's rule takes the output of the previous iteration as input. Is that possible, and if so, how can I do it?
Here is my example:
mkdir -p test
echo "SampleA" > test/SampleA.txt
echo "SampleB" > test/SampleB.txt
SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        # Output of the final loop
        expand("loop3/{sample}.txt", sample = SAMPLES)

#### LOOP ####
for i in list(range(1, 4)):
    # Setup prefix for input
    if i == 1:
        prefix = "test"
    else:
        prefix = "loop%s" % str(i-1)
    # Setup prefix for output
    opref = "loop%s" % str(i)
    # Rule
    rule loop_rule:
        input:
            prefix+"/{sample}.txt"
        output:
            prefix+"/{sample}.txt"
            #expand("loop{i}/{sample}.txt", i = i, sample = wildcards.sample)
        params:
            add=prefix
        shell:
            "awk '{{print $0, {params.add}}}' {input} > {output}"
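For reference, here is the chain of files the loop is meant to wire together, computed in plain Python with the same prefix/opref logic (a sketch; the helper name `loop_paths` is hypothetical). Note that in the rule above both `input` and `output` use `prefix`; presumably the output was meant to use the computed `opref` (as the commented-out `loop{i}` line suggests).

```python
SAMPLES = ["SampleA", "SampleB"]


def loop_paths(n_iterations, sample):
    """Return (input, output) paths for each iteration, mirroring the
    prefix/opref logic of the loop in the Snakefile (hypothetical helper)."""
    pairs = []
    for i in range(1, n_iterations + 1):
        # Iteration 1 reads the seed files in test/, later ones read loop{i-1}/
        prefix = "test" if i == 1 else "loop%d" % (i - 1)
        # Each iteration writes to loop{i}/
        opref = "loop%d" % i
        pairs.append(("%s/%s.txt" % (prefix, sample), "%s/%s.txt" % (opref, sample)))
    return pairs


for sample in SAMPLES:
    for src, dst in loop_paths(3, sample):
        print(src, "->", dst)
```

Each output path is the next iteration's input path, and the last one (`loop3/...`) is what `rule all` asks for.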
Running the example yields this error:
CreateRuleException in line 26 of /Users/fabiangrammes/Desktop/Projects/snake_loop/Snakefile:
The name loop_rule is already used by another rule.
If anyone sees a way to get this to work, that would be great!
Thanks!
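I think this is a good opportunity to use recursive programming. Rather than explicitly including a conditional for every iteration, write a single rule that makes the transition from iteration (n-1) to n. So, something along these lines: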
SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        expand("loop3/{sample}.txt", sample=SAMPLES)

def recurse_sample(wcs):
    n = int(wcs.n)
    if n == 1:
        return "test/%s.txt" % wcs.sample
    elif n > 1:
        return "loop%d/%s.txt" % (n-1, wcs.sample)
    else:
        raise ValueError("loop numbers must be 1 or greater: received %s" % wcs.n)

rule loop_n:
    input: recurse_sample
    output: "loop{n}/{sample}.txt"
    wildcard_constraints:
        sample="[^/]+",
        n="[0-9]+"
    shell:
        """
        awk -v loop='loop{wildcards.n}' '{{print $0, loop}}' {input} > {output}
        """
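To check that the recursion terminates, the input function can be exercised outside Snakemake. The sketch below copies `recurse_sample` and feeds it a `types.SimpleNamespace` standing in for Snakemake's wildcards object (an assumption: only the `.n` and `.sample` attributes are used). The hypothetical helper `resolve_chain` follows the inputs back from a target to the seed file, and the hypothetical `simulate` mimics the awk command, which appends the loop label to each line separated by a space.

```python
from types import SimpleNamespace


def recurse_sample(wcs):
    # Same logic as the input function in the Snakefile above
    n = int(wcs.n)
    if n == 1:
        return "test/%s.txt" % wcs.sample
    elif n > 1:
        return "loop%d/%s.txt" % (n - 1, wcs.sample)
    else:
        raise ValueError("loop numbers must be 1 or greater: received %s" % wcs.n)


def resolve_chain(n, sample):
    """Follow inputs from loop{n}/{sample}.txt back to the seed file (hypothetical helper)."""
    chain = ["loop%d/%s.txt" % (n, sample)]
    while n >= 1:
        # Wildcard values arrive as strings, so pass n as a string
        chain.append(recurse_sample(SimpleNamespace(n=str(n), sample=sample)))
        n -= 1
    return chain


def simulate(seed_line, n):
    """Mimic awk '{print $0, loop}' applied n times: each pass appends ' loopN'."""
    line = seed_line
    for i in range(1, n + 1):
        line = "%s loop%d" % (line, i)
    return line


print(resolve_chain(3, "SampleA"))
print(simulate("SampleA", 3))
```

For `loop3/SampleA.txt` the chain bottoms out at `test/SampleA.txt` after three steps, and the final file should contain the line `SampleA loop1 loop2 loop3`.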
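As @RussHyde says, you need to be proactive about making sure you don't trigger an infinite loop. To that end, we make sure every case is covered in recurse_sample and use wildcard_constraints to ensure the matching is precise.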
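To see why the constraints matter, here is a rough plain-Python approximation of how a wildcard pattern becomes a regular expression. It is not Snakemake's actual code; it assumes Snakemake's default wildcard regex `.+` when no constraint is given, and `pattern_to_regex` is a hypothetical helper. With the answer's constraints, `loop{n}/{sample}.txt` only matches `loopN/<name>.txt`, so the seed files in `test/` and deeper paths are never claimed by the rule.

```python
import re


def pattern_to_regex(pattern, constraints=None):
    """Approximate a Snakemake-style output pattern as a regex (illustration only)."""
    constraints = constraints or {}
    parts, last = [], 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        # Literal text between wildcards must match exactly
        parts.append(re.escape(pattern[last:m.start()]))
        name = m.group(1)
        # Unconstrained wildcards fall back to ".+", which can also match "/"
        parts.append("(?P<%s>%s)" % (name, constraints.get(name, ".+")))
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    return re.compile("".join(parts))


loose = pattern_to_regex("loop{n}/{sample}.txt")
strict = pattern_to_regex("loop{n}/{sample}.txt", {"sample": "[^/]+", "n": "[0-9]+"})

for path in ["loop3/SampleA.txt", "loop1/2/SampleA.txt", "test/SampleA.txt"]:
    m_loose, m_strict = loose.fullmatch(path), strict.fullmatch(path)
    print(path, m_loose and m_loose.groupdict(), m_strict and m_strict.groupdict())
```

Without constraints, `loop1/2/SampleA.txt` matches with `n = "1/2"`, which `int(wcs.n)` would then choke on; with them, it does not match at all.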
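My understanding is that your rules are converted to Python code before they run, and in the process all the plain Python code in the Snakefile is executed sequentially. Think of it as your Snakemake rules being evaluated as Python functions.
There is a constraint, however: each rule can only be evaluated into a function once.
You can use if/else expressions and evaluate a rule differently (once) depending on, for example, config values, but you can't evaluate the same rule multiple times.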
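A toy model of that constraint, in plain Python (not Snakemake's internals; the class `ToyWorkflow` is hypothetical): rules are registered by name as the Snakefile's code runs, so a `rule loop_rule:` block executed three times by a `for` loop tries to register the same name three times, which matches the "already used by another rule" error in the question.

```python
class ToyWorkflow:
    """Hypothetical stand-in for a workflow that stores rules by unique name."""

    def __init__(self):
        self.rules = {}

    def add_rule(self, name, output):
        # Mirrors the idea behind "The name ... is already used by another rule"
        if name in self.rules:
            raise ValueError("The name %s is already used by another rule." % name)
        self.rules[name] = output


wf = ToyWorkflow()
try:
    for i in range(1, 4):
        wf.add_rule("loop_rule", "loop%d/{sample}.txt" % i)
except ValueError as err:
    print("iteration", i, "failed:", err)

# One rule whose output pattern carries the iteration as a wildcard registers once
wf2 = ToyWorkflow()
wf2.add_rule("loop_n", "loop{n}/{sample}.txt")
```

The first pass succeeds and the second fails, whereas a single rule with an `{n}` wildcard only needs to be defined once.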
I'm not really sure how to rewrite your Snakefile to achieve what you want. Do you have a real example where a looping construct seems to be required?
--- Edit ---
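For a fixed number of iterations, it is possible to run a rule multiple times by using an input function. (I would caution against doing this, though, and be very careful to prevent infinite loops.)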
SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        # Output of the final loop
        expand("loop3/{sample}.txt", sample = SAMPLES)

def looper_input(wildcards):
    # could be written more cleanly with a dictionary
    if wildcards["prefix"] == "loop0":
        input = "test/{}.txt".format(wildcards["sample"])
    elif wildcards["prefix"] == "loop1":
        input = "loop0/{}.txt".format(wildcards["sample"])
    ...
    return input

rule looper:
    input:
        looper_input
    output:
        "{prefix}/{sample}.txt"
    wildcard_constraints:
        # only loopN/ directories are outputs, so test/ files never match this rule
        prefix="loop[0-9]+"
    params:
        # the {prefix} wildcard; there is no Python variable named prefix here
        add=lambda wildcards: wildcards.prefix
    shell:
        # pass the label as an awk variable so awk prints it as text
        "awk -v add='{params.add}' '{{print $0, add}}' {input} > {output}"
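The comment in `looper_input` notes that it could be written more cleanly with a dictionary. One way to do that, sketched in plain Python so it can be tested outside Snakemake, is to build the previous-directory mapping once and look it up. `N_LOOPS` and `PREVIOUS` are hypothetical names, and `N_LOOPS = 4` assumes the loop0 to loop3 layout implied by `rule all` and the `loop0` branch. A plain dict stands in for the wildcards object, keeping the `wildcards["prefix"]` access style used above. An unknown prefix raises an error instead of falling through, which is one guard against the runaway recursion warned about above.

```python
N_LOOPS = 4  # hypothetical: directories loop0 .. loop3

# Map each loop directory to the directory its input comes from
PREVIOUS = {"loop0": "test"}
PREVIOUS.update({"loop%d" % i: "loop%d" % (i - 1) for i in range(1, N_LOOPS)})


def looper_input(wildcards):
    """Dictionary-based input function; accepts anything indexable by name."""
    prefix = wildcards["prefix"]
    if prefix not in PREVIOUS:
        # e.g. "test": never treat the seed directory as an output of the loop
        raise ValueError("no input defined for prefix %r" % prefix)
    return "{}/{}.txt".format(PREVIOUS[prefix], wildcards["sample"])


print(looper_input({"prefix": "loop0", "sample": "SampleA"}))
print(looper_input({"prefix": "loop3", "sample": "SampleB"}))
```

Changing the number of iterations then only means changing `N_LOOPS` (and the target in `rule all`), rather than adding another `elif` branch.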