Tri*_*tra 5 python pipeline python-3.x snakemake
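I'm very new to snakemake and not that fluent in python either (so apologies, this may be a very basic question):
I'm currently building a pipeline to analyze a set of bam files with atlas. These bam files are located in different folders and should not be moved to a common one. I therefore decided to provide a sample list that looks like this (this is just an example; in reality the samples might be on entirely different drives):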
Sample Path
Sample1 /some/path/to/my/sample/
Sample2 /some/different/path/
and to load it in my config.yaml:
sample_file: /path/to/samplelist/samplslist.txt
Now to my Snakefile:
import pandas as pd

#define configfile with paths etc.
configfile: "config.yaml"

#read-in dataframe and define Sample and Path
SAMPLES = pd.read_table(config["sample_file"])
BAMFILE = SAMPLES["Sample"]
PATH = SAMPLES["Path"]

rule all:
    input:
        expand("{path}{sample}.summary.txt", zip, path=PATH, sample=BAMFILE)

#this works like a charm as long as I give the zip-function in the rules 'all' and 'summary':
rule indexBam:
    input:
        "{path}{sample}.bam"
    output:
        "{path}{sample}.bam.bai"
    shell:
        "samtools index {input}"

#this following command works as long as I give the specific folder for a sample instead of {path}.
rule bamdiagnostics:
    input:
        bam="{path}{sample}.bam",
        bai=expand("{path}{sample}.bam.bai", zip, path=PATH, sample=BAMFILE)
    params:
        prefix="analysis/BAMDiagnostics/{sample}"
    output:
        "analysis/BAMDiagnostics/{sample}_approximateDepth.txt",
        "analysis/BAMDiagnostics/{sample}_fragmentStats.txt",
        "analysis/BAMDiagnostics/{sample}_MQ.txt",
        "analysis/BAMDiagnostics/{sample}_readLength.txt",
        "analysis/BAMDiagnostics/{sample}_BamDiagnostics.log"
    message:
        "running BamDiagnostics...{wildcards.sample}"
    shell:
        "{config[atlas]} task=BAMDiagnostics bam={input.bam} out={params.prefix} logFile={params.prefix}_BamDiagnostics.log verbose"

rule summary:
    input:
        index=expand("{path}{sample}.bam.bai", zip, path=PATH, sample=BAMFILE),
        bamd=expand("analysis/BAMDiagnostics/{sample}_approximateDepth.txt", sample=BAMFILE)
    output:
        "{path}{sample}.summary.txt"
    shell:
        "echo -e '{input.index} {input.bamd}"
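I get the error
WildcardError in line 28 of path/to/my/Snakefile: Wildcards in input files cannot be determined from output files: 'path'
Can anyone help me?
- I tried to solve this with join or by creating input functions, but I think I'm not skilled enough to see my mistake...
- I guess the problem is that my summary rule does not contain the tuple with the {path} for the bamdiagnostics output (since that output is somewhere else), so it cannot make the connection to the input file...
- Expanding my input in the bamdiagnostics rule makes the code work, but of course it feeds every sample's input into every sample's output and creates a big mess:
In this case, both bam files are used to create each output file. This is wrong, because the samples, and therefore their outputs, are processed independently.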
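The pairing problem described above can be illustrated in plain Python. This is only a rough sketch of how combining wildcard values behaves with and without zip; `expand_like` is a hypothetical helper written for this illustration and is not Snakemake's own `expand` implementation.

```python
from itertools import product


def expand_like(pattern, combinator=product, **wildcards):
    """Hypothetical stand-in for expand(); fills a pattern with wildcard values."""
    names = list(wildcards)
    combos = combinator(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


samples = ["Sample1", "Sample2"]
paths = ["/some/path/to/my/sample/", "/some/different/path/"]
pattern = "{path}{sample}.bam.bai"

# Without zip: every path is combined with every sample (4 files, 2 of them bogus).
all_combinations = expand_like(pattern, path=paths, sample=samples)

# With zip: the i-th path is paired only with the i-th sample (2 files).
paired = expand_like(pattern, zip, path=paths, sample=samples)

for name in paired:
    print(name)
```

Even the zipped list still names the files of all samples at once, so using it as the input of a per-sample rule makes every job depend on every sample.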
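Based on the atlas documentation, it seems you need to run each rule separately for each sample; the complication here is that each sample lives in a separate path.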
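To see why the original bamdiagnostics rule triggers the WildcardError, it may help to compare the placeholders in its input and output patterns. The snippet below is a simplified sketch, not Snakemake's actual check; `wildcard_names` and `undetermined` are hypothetical helpers made up for this illustration.

```python
from string import Formatter


def wildcard_names(pattern):
    """Return the set of {name} placeholders found in a pattern."""
    return {field for _, field, _, _ in Formatter().parse(pattern) if field}


def undetermined(inputs, outputs):
    """Wildcards used in the inputs that no output pattern defines."""
    in_names = set().union(*(wildcard_names(p) for p in inputs))
    out_names = set().union(*(wildcard_names(p) for p in outputs))
    return in_names - out_names


# bamdiagnostics as posted in the question: the outputs only know {sample}.
missing = undetermined(
    inputs=["{path}{sample}.bam"],
    outputs=["analysis/BAMDiagnostics/{sample}_approximateDepth.txt"],
)
print(missing)  # {'path'}

# indexBam is fine: both wildcards appear in its output.
ok = undetermined(["{path}{sample}.bam"], ["{path}{sample}.bam.bai"])
print(ok)  # set()
```

Since the output of bamdiagnostics carries no `{path}`, the folder has to come from somewhere else, for example from a lookup keyed by the sample name.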
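I modified your script to work for the case above (see DAG). The variables at the beginning of the script were renamed to make more sense. config was removed for demonstration purposes, and the pathlib library is used (instead of os.path.join). pathlib is not necessary, but it helps me keep my sanity. The shell command was modified to avoid config.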
import pandas as pd
from pathlib import Path

# Sample sheet: tab-separated, indexed by sample name
df = pd.read_csv('sample.tsv', sep='\t', index_col='Sample')
SAMPLES = df.index
BAM_PATH = df["Path"]
# print (BAM_PATH['sample1'])

rule all:
    input:
        expand("{path}{sample}.summary.txt", zip, path=BAM_PATH, sample=SAMPLES)

rule indexBam:
    input:
        str( Path("{path}") / "{sample}.bam")
    output:
        str( Path("{path}") / "{sample}.bam.bai")
    shell:
        "samtools index {input}"

# The output has no {path} wildcard, so the input functions look up
# each sample's folder in BAM_PATH using the sample name.
rule bamdiagnostics:
    input:
        bam = lambda wildcards: str( Path(BAM_PATH[wildcards.sample]) / f"{wildcards.sample}.bam"),
        bai = lambda wildcards: str( Path(BAM_PATH[wildcards.sample]) / f"{wildcards.sample}.bam.bai"),
    params:
        prefix="analysis/BAMDiagnostics/{sample}"
    output:
        "analysis/BAMDiagnostics/{sample}_approximateDepth.txt",
        "analysis/BAMDiagnostics/{sample}_fragmentStats.txt",
        "analysis/BAMDiagnostics/{sample}_MQ.txt",
        "analysis/BAMDiagnostics/{sample}_readLength.txt",
        "analysis/BAMDiagnostics/{sample}_BamDiagnostics.log"
    message:
        "running BamDiagnostics...{wildcards.sample}"
    shell:
        ".atlas task=BAMDiagnostics bam={input.bam} out={params.prefix} logFile={params.prefix}_BamDiagnostics.log verbose"

rule summary:
    input:
        bamd = "analysis/BAMDiagnostics/{sample}_approximateDepth.txt",
        index = lambda wildcards: str( Path(BAM_PATH[wildcards.sample]) / f"{wildcards.sample}.bam.bai"),
    output:
        str( Path("{path}") / "{sample}.summary.txt")
    shell:
        # closing quote added and the result redirected so the declared output file is created
        "echo -e '{input.index} {input.bamd}' > {output}"
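The lambda inputs in the script above boil down to a lookup keyed by the sample name. Here is a minimal sketch of that idea outside Snakemake, assuming a plain dict in place of the pandas Series and a `SimpleNamespace` in place of the wildcards object that Snakemake would pass in; `bam_for` is a name made up for this example, and `PurePosixPath` is used instead of `Path` only so the result looks the same on any operating system.

```python
from pathlib import PurePosixPath
from types import SimpleNamespace

# Stand-in for the Sample -> Path column of the sample sheet.
BAM_PATH = {
    "Sample1": "/some/path/to/my/sample/",
    "Sample2": "/some/different/path/",
}


def bam_for(wildcards):
    """Input function: build the BAM location from the sample name alone."""
    return str(PurePosixPath(BAM_PATH[wildcards.sample]) / f"{wildcards.sample}.bam")


# Imitation of the wildcards object for one job of a per-sample rule.
wc = SimpleNamespace(sample="Sample2")
print(bam_for(wc))  # /some/different/path/Sample2.bam
```

Joining with pathlib also takes care of the trailing slash, so it does not matter whether the paths in the sample sheet end with one.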