Tag: snakemake

Is it possible to make yapf ignore certain parts of a file?

I am using a Python DSL called Snakemake, which looks like this:

import pandas as pd  # needed for pd.read_table in the run: block below
from bx.intervals.cluster import ClusterTree

from epipp.config import system_prefix, include_prefix, config, expression_matrix
config["name"] = "correlate_chip_regions_and_rna_seq"

bin_sizes = {"H3K4me3": 1000, "PolII": 200, "H3K27me3": 200}

rule all:
    input:
        expand("data/{bin_size}_{modification}.bed", zip,
               bin_size=bin_sizes.values(), modification=bin_sizes.keys())

rule get_gene_expression:
    input:
        expression_matrix
    output:
        "data/expression/series.csv"
    run:
        expression_matrix = pd.read_table(input[0])
        expression_series = expression_matrix.sum(1).sort_values(ascending=False)
        expression_series.to_csv(output[0], sep=" ")


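I would like to run yapf on the contents of the run: block.

Is it possible to have yapf ignore things that are not valid Python, such as the rule keyword, and apply it only to specific parts of the file?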
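
One possible way to approach this, sketched in plain Python under stated assumptions: pull the body of each run: block out of the Snakefile text, dedent it, and hand only that text to a formatter. The helper name extract_run_blocks is hypothetical, and the sketch assumes that run: sits alone on its line and that the block ends at the first non-blank line indented no deeper than run: itself.

```python
import textwrap


def extract_run_blocks(snakefile_text):
    """Return the dedented Python source of every `run:` block (hypothetical helper)."""
    blocks = []
    lines = snakefile_text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.strip() == "run:":
            indent = len(line) - len(line.lstrip())
            body = []
            i += 1
            # Collect lines that are blank or indented deeper than `run:`.
            while i < len(lines):
                current = lines[i]
                current_indent = len(current) - len(current.lstrip())
                if current.strip() and current_indent <= indent:
                    break
                body.append(current)
                i += 1
            blocks.append(textwrap.dedent("\n".join(body)).strip("\n") + "\n")
        else:
            i += 1
    return blocks


snippet = (
    "rule get_gene_expression:\n"
    "    output:\n"
    "        \"data/expression/series.csv\"\n"
    "    run:\n"
    "        expression_matrix = pd.read_table(input[0])\n"
    "        expression_matrix.to_csv(output[0], sep=\" \")\n"
)
for block in extract_run_blocks(snippet):
    print(block)
```

Each returned string is ordinary Python and could then be passed to yapf. Writing the formatted text back into the Snakefile at the original indentation is not covered by this sketch.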

python snakemake yapf

The*_*Cat

2016-10-26

Score: 6, Answers: 1, Views: 4329

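Name of the current rule in Snakemake

I am working with Snakemake and cannot find a way to get the name of the current rule.

For example, is there a way to access it like this: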

rule job1:
    input: check_inputs(rules.current.name)
    output: ...


This would be very useful when the check_inputs function is essentially the same for every rule.

Of course, I did the following, and it works:

rule job1:
    input: check_inputs("job1")
    output: ...


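However, I wonder whether there is a more "Snakemake way" to get the name of the current rule, to avoid writing or hard-coding the rule name every time.

Any help or suggestions would be greatly appreciated.

--- EDIT1 --- The rule name is accessible via {rules.myrule.name} only once Snakemake has parsed the input and output statements. So {rules.myrule.name} cannot be used in the input/output definitions.

The idea is to have quick access to the name of the current rule, for example via {rules.current}, because {rules.myrule.name} is also repetitive.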

python python-3.x snakemake

gli*_*ihm

2018-07-11

Score: 6, Answers: 1, Views: 533

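How do I get the basename of a wildcard value in the output of a Snakemake rule?

In the following example, the output file is created in the same location as the input file. Is there a way to get the basename of the wildcard value in the output section, so that I can name the output file after the input file's basename but write it to a different location?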

infile=['/home/user/folder1/file1','/home/user/folder2/file2/']

rule one:
 input: expand("{myfile}", myfile = infile)

 output: "{myfile}" + ".out"

 shell: "touch {wildcards.myfile}.out"

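
A minimal sketch of the path handling involved, in plain Python rather than Snakemake: map each basename back to its full input path, and build the output path from the basename alone. The directory name results and the helper names are hypothetical; in a Snakefile, a lookup like input_path could serve as an input function keyed on the wildcard, but that wiring is an assumption and is not shown here.

```python
import os

infile = ['/home/user/folder1/file1', '/home/user/folder2/file2/']

# Map basename -> original path; a trailing slash is stripped before taking the basename.
by_name = {os.path.basename(path.rstrip('/')): path for path in infile}


def input_path(name):
    """Look up the full input path for a basename (hypothetical helper)."""
    return by_name[name]


def output_path(name, outdir="results"):
    """Build an output path in a separate directory (outdir is an assumption)."""
    return os.path.join(outdir, name + ".out")


for name in by_name:
    print(input_path(name), "->", output_path(name))
```

This only works if the basenames are unique across the input folders; two inputs with the same file name would collide in by_name.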

wildcard snakemake

Vee*_*era

lucky-day

Score: 6, Answers: 1, Views: 1676

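What are Snakemake metadata files? When can I delete them?

I noticed that my rsync backup script spends a lot of time copying content with random-looking names from the .snakemake/metadata folder.

What are these files used for?

Can I safely delete them after a Snakemake run has finished, or does Snakemake need them to execute the next run correctly?

More generally, is there any documentation about the files that Snakemake creates in the .snakemake folder?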

snakemake

bli*_*bli

lucky-day

Score: 6, Answers: 1, Views: 411

Are Snakemake params functions evaluated before the input files exist?

Consider this Snakefile:

def rdf(fn):
    f = open(fn, "rt")
    t = f.readlines()
    f.close()
    return t

rule a:
    output: "test.txt"
    input: "test.dat"
    params: X=lambda wildcards, input, output, threads, resources: rdf(input[0])
    message: "X is {params.X}"
    shell: "cp {input} {output}"

rule b:
    output: "test.dat"
    shell: "echo 'hello world' >{output}"


When this is run and neither test.txt nor test.dat exists, it gives this error:

InputFunctionException in line 7 of /Users/tedtoal/Documents/BioinformaticsConsulting/Mars/Cacao/Pipeline/SnakeMake/t2:
FileNotFoundError: [Errno 2] No such file or directory: 'test.dat'


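However, if test.dat exists, it runs fine. Why?

I expected params not to be evaluated until Snakemake was ready to run rule 'a'. Instead, it must be calling the params function rdf() above during the DAG phase, before rule 'a' is run. Yet the following works even when test.dat does not initially exist: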

import os

def rdf(fn):
    if not os.path.exists(fn): return ""
    f = open(fn, "rt")
    t = f.readlines()
    f.close() …


parameters snakemake

ted*_*oal

lucky-day

Score: 6, Answers: 1, Views: 291

Snakemake: using a wildcard together with expand

Is it possible in Snakemake to use a wildcard together with expand:

rule a:
    input:
        "input/{first}.txt",
        expand("data/{second}.txt", second=A_LIST)
    output:
        expand("output/{first}_{second}", second=A_LIST)

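
The usual way to keep one placeholder untouched while another is filled in is to double its braces. The sketch below shows the escaping with plain str.format rather than Snakemake's expand; that expand treats doubled braces the same way should be checked against the Snakemake documentation for the version in use. The values in A_LIST are hypothetical, since the question does not define them.

```python
A_LIST = ["x", "y"]  # hypothetical values; not given in the question

# {{first}} is escaped and survives formatting as {first};
# {second} is substituted for every element of A_LIST.
pattern = "output/{{first}}_{second}"
outputs = [pattern.format(second=second) for second in A_LIST]

print(outputs)
```

After formatting, each string still contains the literal {first}, which is what a rule would need in order to treat first as a wildcard.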

snakemake

par*_*par

2017-12-19

Score: 6, Answers: 1, Views: 1990

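Snakemake: how to set memory resources dynamically based on input file size

I am trying to base the cluster memory allocation for a given rule on the size of its input file. Is this possible in Snakemake, and if so, how?

So far I have tried specifying it in the resources: section, as follows: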

rule compute2:
    input: "input1.txt"
    output: "input2.txt"
    resources:
        mem_mb=lambda wildcards, input, attempt: int(os.path.getsize(str(input))/(1024*1024))
    shell: "touch input2.txt"


But it seems that Snakemake tries to compute this value up front, before the file has been created, because I get this error:

InputFunctionException in line 35 of test_snakemake/Snakefile:
FileNotFoundError: [Errno 2] No such file or directory: 'input1.txt'


I run Snakemake with the following command:

snakemake --verbose -j 10 --cluster-config cluster.json --cluster "sbatch -n {cluster.n} -t {cluster.time} --mem {resources.mem_mb}"

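
One way to avoid the FileNotFoundError, sketched here as a plain Python helper: fall back to a default when the file does not exist yet. The name mem_mb_from_size and the default_mb and min_mb values are hypothetical. Whether Snakemake calls the resource function again once the input exists is an assumption to verify for the version in use; if it does not, the job would be submitted with the default.

```python
import math
import os


def mem_mb_from_size(path, default_mb=1000, min_mb=100):
    """Memory request in MB derived from a file's size.

    Returns default_mb if the file cannot be read (for example because
    it has not been created yet). Both defaults are illustrative only.
    """
    try:
        size_bytes = os.path.getsize(path)
    except OSError:
        return default_mb
    # Round up to whole megabytes and never go below min_mb.
    return max(min_mb, math.ceil(size_bytes / (1024 * 1024)))


print(mem_mb_from_size("input1.txt"))
```

In the rule from the question, this could be used as mem_mb=lambda wildcards, input, attempt: mem_mb_from_size(str(input)).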

snakemake

KBo*_*hme

lucky-day

Score: 6, Answers: 1, Views: 1334

Snakemake + Docker example: how to use volumes

Let's take a simple Snakefile:

rule targets:
    input:
        "plots/dataset1.pdf",
        "plots/dataset2.pdf"

rule plot:
    input:
        "raw/{dataset}.csv"
    output:
        "plots/{dataset}.pdf"
    shell:
        "somecommand {input} {output}"


I would like to generalize the plot rule so that it can run inside a Docker container:

rule targets:
    input:
        "plots/dataset1.pdf",
        "plots/dataset2.pdf"

rule plot:
    input:
        "raw/{dataset}.csv"
    output:
        "plots/{dataset}.pdf"
    singularity:
        "docker://joseespinosa/docker-r-ggplot2"
    shell:
        "somecommand {input} {output}"


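If I understand correctly, when I run snakemake --use-singularity, somecommand runs inside the Docker container, where it cannot find the input CSV file unless some volume configuration is done for the container.

Could you provide a small working example of how to configure volumes in the Snakefile or in other Snakemake files?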

docker snakemake singularity-container

mox*_*mox

lucky-day

Score: 6, Answers: 1, Views: 663

Snakemake using a rule in a loop

I'm trying to use Snakemake rules within a loop so that the rule takes the output of the previous iteration as its input. Is that possible, and if so, how can I do it?

Here is my example:

Set up the test data

mkdir -p test
echo "SampleA" > test/SampleA.txt
echo "SampleB" > test/SampleB.txt


Snakemake

SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        # Output of the final loop
        expand("loop3/{sample}.txt", sample = SAMPLES)


#### LOOP ####
for i in list(range(1, 4)):
    # Setup …

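
The chaining itself can be sketched in plain Python, independent of how the rules are declared: each iteration reads what the previous one wrote. The helper loop_paths is hypothetical. The test/{sample}.txt starting point and the loop{i}/ directory naming are taken from the setup commands and from rule all above, and assuming that the intermediate directories follow the same pattern is a guess, since the loop body is truncated.

```python
def loop_paths(n_iterations,
               first_input="test/{sample}.txt",
               pattern="loop{i}/{{sample}}.txt"):
    """Return (input, output) path patterns for each iteration (hypothetical helper).

    The doubled braces keep {sample} as a literal placeholder while
    {i} is replaced by the iteration number.
    """
    steps = []
    previous = first_input
    for i in range(1, n_iterations + 1):
        current = pattern.format(i=i)
        steps.append((previous, current))
        previous = current
    return steps


for source, target in loop_paths(3):
    print(source, "->", target)
```

Inside the for loop of a Snakefile, each pair could supply the input and output of one generated rule, provided every generated rule gets a distinct name.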

python shell snakemake

Fab*_*n_G

2019-05-23

Score: 6, Answers: 2, Views: 567

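Symlinking (automatically generated) directories via Snakemake

I am trying to create a symlinked directory structure that aliases output directories in a Snakemake workflow.

Consider the following example:

A long time ago, in a galaxy far, far away, someone wanted to find the best ice cream flavour in the universe and ran a survey. Our example workflow aims to represent the votes through a directory structure. The survey was conducted in English (because everyone in that foreign galaxy speaks English), but the results should also be understandable to non-English speakers. Symbolic links come to the rescue.

To make the inputs parsable by both us humans and Snakemake, we put them in a YAML file: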

cat config.yaml


flavours:
  chocolate:
    - vader
    - luke
    - han
  vanilla:
    - yoda
    - leia
  berry:
    - windu
translations:
  french:
    chocolat: chocolate
    vanille: vanilla
    baie: berry
  german:
    schokolade: chocolate
    vanille: vanilla
    beere: berry


To create the corresponding directory tree, I started with this simple Snakefile:

[Snakefile not preserved in this excerpt]

I am sure there are more 'pythonic' ways to achieve what I want, but this is just a simple example to illustrate my problem. …
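
To make the intended result concrete, here is a minimal sketch in plain Python (not the author's Snakefile) that builds such a tree from the configuration above. The layout is an assumption: real directories under english/<flavour>/ with one empty file per voter, and one relative symlink per translated name under <language>/. The function name build_tree is hypothetical, and the config dictionary is written out inline instead of being loaded from config.yaml.

```python
import os
import tempfile

config = {
    "flavours": {
        "chocolate": ["vader", "luke", "han"],
        "vanilla": ["yoda", "leia"],
        "berry": ["windu"],
    },
    "translations": {
        "french": {"chocolat": "chocolate", "vanille": "vanilla", "baie": "berry"},
        "german": {"schokolade": "chocolate", "vanille": "vanilla", "beere": "berry"},
    },
}


def build_tree(root, config):
    """Create flavour directories and translated symlink aliases (hypothetical layout)."""
    # Real directories: <root>/english/<flavour>/<voter>
    for flavour, voters in config["flavours"].items():
        flavour_dir = os.path.join(root, "english", flavour)
        os.makedirs(flavour_dir, exist_ok=True)
        for voter in voters:
            open(os.path.join(flavour_dir, voter), "w").close()
    # Aliases: <root>/<language>/<translated> -> ../english/<flavour>
    for language, names in config["translations"].items():
        language_dir = os.path.join(root, language)
        os.makedirs(language_dir, exist_ok=True)
        for translated, flavour in names.items():
            link = os.path.join(language_dir, translated)
            if not os.path.islink(link):
                os.symlink(os.path.join("..", "english", flavour), link,
                           target_is_directory=True)


root = tempfile.mkdtemp()
build_tree(root, config)
print(sorted(os.listdir(os.path.join(root, "french", "chocolat"))))
```

The links are relative, so the tree stays valid if the root directory is moved. Creating symlinks may require extra privileges on Windows.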

python directory symlink makefile snakemake

msc*_*lli

2020-07-21

Score: 6, Answers: 1, Views: 395