How do I do a partial expand in Snakemake?

bli*_*bli 3 variable-expansion python-3.x snakemake

I am trying to first generate 4 files, one for each LETTERS x NUMS combination, and then combine them over NUMS to obtain one file per element of LETTERS:

LETTERS = ["A", "B"]
NUMS = ["1", "2"]


rule all:
    input:
        expand("combined_{letter}.txt", letter=LETTERS)

rule generate_text:
    output:
        "text_{letter}_{num}.txt"
    shell:
        """
        echo "test" > {output}
        """

rule combine_text:
    input:
        expand("text_{letter}_{num}.txt", num=NUMS)
    output:
        "combined_{letter}.txt"
    shell:
        """
        cat {input} > {output}
        """

Executing this Snakefile results in the following error:

WildcardError in line 19 of /tmp/Snakefile:
No values given for wildcard 'letter'.
  File "/tmp/Snakefile", line 19, in <module>

It seems that a partial expand is not possible. Is this a limitation of expand? If so, how can I work around it?

Sch*_*lar 8

A partial expand is possible using allow_missing=True.

For example:

expand("text_{letter}_{num}.txt", num=[1, 2], allow_missing=True)

> ["text_{letter}_1.txt", "text_{letter}_2.txt"]
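To make it clearer what a partial expand amounts to, here is a minimal pure-Python sketch that mimics the result above without Snakemake. The names `KeepMissing` and `partial_expand` are hypothetical helpers made up for this illustration; they are not part of Snakemake, and the real `expand` may well be implemented differently.

```python
from itertools import product


class KeepMissing(dict):
    """Mapping that renders unknown fields back as '{name}' (hypothetical helper)."""

    def __missing__(self, key):
        return "{" + key + "}"


def partial_expand(pattern, **wildcards):
    """Fill in only the wildcards that were given and leave the others untouched."""
    names = list(wildcards)
    # One combination per element of the cartesian product of the given values.
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format_map(KeepMissing(zip(names, combo))) for combo in combos]


print(partial_expand("text_{letter}_{num}.txt", num=[1, 2]))
```

The `{letter}` field survives as a literal wildcard, so a rule can still match it later against its output pattern.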


bli*_*bli 6

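It turns out this is not a limitation of expand, but a limit of my familiarity with how string formatting works in Python. I needed to use double braces for the wildcard that should not be expanded: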

LETTERS = ["A", "B"]
NUMS = ["1", "2"]


rule all:
    input:
        expand("combined_{letter}.txt", letter=LETTERS)

rule generate_text:
    output:
        "text_{letter}_{num}.txt"
    shell:
        """
        echo "test" > {output}
        """

rule combine_text:
    input:
        expand("text_{{letter}}_{num}.txt", num=NUMS)
    output:
        "combined_{letter}.txt"
    shell:
        """
        cat {input} > {output}
        """

Executing this Snakefile now generates the following files, as expected:

text_A_2.txt
text_A_1.txt
text_B_2.txt
text_B_1.txt
combined_A.txt
combined_B.txt
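
The brace behavior can be checked in plain Python, independently of Snakemake. This is only a sketch, assuming expand relies on `str.format`-style substitution, which the error message and the double-brace fix both suggest; the variable names are chosen for illustration.

```python
pattern = "text_{letter}_{num}.txt"

# Single braces: format() wants a value for every field, so this fails
# because no value is supplied for 'letter'.
try:
    pattern.format(num="1")
except KeyError as err:
    missing = err.args[0]
    print("missing field:", missing)

# Double braces are an escape: they come out as literal single braces,
# so only 'num' is substituted and '{letter}' is left for later.
escaped = "text_{{letter}}_{num}.txt"
partially_filled = [escaped.format(num=n) for n in ["1", "2"]]
print(partially_filled)
```

After this one round of formatting, the strings contain `{letter}` with single braces again, which is the wildcard form that the rule's output pattern expects.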