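How do I write a bash script that iterates over every directory in parent_directory and runs a command on a specific file in each one?
The directory structure is as follows: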
Parent_dir/
    dir1/
        acc.bam
    dir2/
        acc.bam
    dir3/
        acc.bam
    ... around 30 directories
This is the command I want to run:
java8 -jar /picard.jar CollectRnaSeqMetrics REF_FLAT=/refFlathuman.refflat STRAND_SPECIFICITY=NONE I=acc.bam O=output
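
One possible approach, as a minimal sketch rather than a definitive answer: loop over the sub-directories with a glob and run the command on each acc.bam. The function name collect_metrics and the RUNNER variable are hypothetical helpers introduced here. Writing the result to an "output" file inside each directory is also an assumption, made so that the roughly 30 runs do not overwrite one another.

```shell
#!/usr/bin/env bash
# Sketch: visit every sub-directory of a parent directory and run
# CollectRnaSeqMetrics on the acc.bam found there.
# RUNNER is a hypothetical hook: set RUNNER=echo to preview the commands
# without running java8.

collect_metrics() {
    local parent=$1 dir bam
    # The trailing slash in the glob restricts matches to directories.
    for dir in "$parent"/*/; do
        bam=${dir}acc.bam
        # Skip directories that do not contain the expected input file.
        [ -f "$bam" ] || continue
        # Assumption: write the result next to the input, one file per directory.
        ${RUNNER:-} java8 -jar /picard.jar CollectRnaSeqMetrics \
            REF_FLAT=/refFlathuman.refflat \
            STRAND_SPECIFICITY=NONE \
            I="$bam" \
            O="${dir}output"
    done
}

# Usage: collect_metrics /path/to/Parent_dir
```

Running it first as RUNNER=echo collect_metrics Parent_dir prints each command line, which makes it easy to check the paths before starting the real jobs.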

I have a file like this:
sample chr start end ref alt gene effect
AADA-01 chr1 12336579 12336579 C T VPS13D Silent
AADA-02 chr1 20009838 20009838 - CCA TMCO4 Missense
AADA-03 chr1 76397825 76397825 GTCA T ASB17 Missense
AADA-03 chr1 94548954 94548954 C A ABCA4 Missense
AADA-04 chr1 176762782 176762782 TCG C PAPPA2 Missense
AADA-04 chr1 183942764 183942764 - T COLGAL Missense
AADA-05 chr1 186076063 186076063 A TGC HMCN1 Silent
AADA-05 chr1 186076063 186076063 A T HM1 Silent
I need all rows in which columns 5 and 6 each contain only a single character.
The result should look like this:
sample …
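
A minimal sketch of one way to do this with awk, assuming the file is whitespace-delimited and the first line is a header that should be kept. The function name filter_single_char is hypothetical. The test is read literally here: a field counts as "one character" when its length is 1.

```shell
#!/usr/bin/env bash
# Sketch: keep the header plus every row whose 5th (ref) and 6th (alt)
# fields are exactly one character long.
# With no file arguments, awk reads standard input.

filter_single_char() {
    awk 'NR == 1 || (length($5) == 1 && length($6) == 1)' "$@"
}

# Usage: filter_single_char input.txt > filtered.txt
```

Note that under this literal reading a "-" placeholder also counts as one character, so a row such as "- T" is kept. If only single nucleotides are wanted, the condition could be swapped for $5 ~ /^[ACGT]$/ && $6 ~ /^[ACGT]$/.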

I have a text file containing the following information:
Hugo_Symbol Tumor_Sample_Barcode Entrez_Gene_Id Center NCBI_Build
MTHFR TCGA-BD-A2L6-01A-11D-A20W-10 4524 BCM GRCh38
SLC30A1 TCGA-BD-A2L6-01A-11D-A20W-10 7779 BCM GRCh38
USH2A TCGA-BD-A2L6-01A-11D-A20W-10 7399 BCM GRCh38
SOS1 TCGA-BD-A2L6-01A-11D-A20W-10 6654 BCM GRCh38
TMEM51 TCGA-O8-A75V-01A-11D-A32G-10 55092 BCM GRCh38
FLG TCGA-O8-A75V-01A-11D-A32G-10 2312 BCM GRCh38
FLG TCGA-O8-A75V-01A-11D-A32G-10 2312 BCM GRCh38
PRDM16 TCGA-G3-A7M5-01A-11D-A33Q-10 63976 BCM GRCh38
DNAJC11 TCGA-G3-A7M5-01A-11D-A33Q-10 55735 BCM GRCh38
HNRNPCL2 TCGA-G3-A7M5-01A-11D-A33Q-10 440563 BCM GRCh38
C1orf94 TCGA-G3-A7M5-01A-11D-A33Q-10 84970 BCM GRCh38
NFYC TCGA-G3-A7M5-01A-11D-A33Q-10 4802 BCM GRCh38
IPP TCGA-G3-A7M5-01A-11D-A33Q-10 3652 BCM GRCh38
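As you can see, there are multiple samples, and I want to split the file into multiple files based on the "Tumor_Sample_Barcode" column. Each output file should be named samplename.txt.
First output - TCGA-BD-A2L6-01A-11D-A20W-10.txt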
Hugo_Symbol Tumor_Sample_Barcode Entrez_Gene_Id Center …
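
A minimal sketch of one way to split the file with awk, assuming whitespace-delimited fields, a header on the first line, and that the output files should be written to the current directory. The function name split_by_sample is hypothetical. Each output file is closed after every write, which is slower but avoids hitting the open-file limit when there are many samples.

```shell
#!/usr/bin/env bash
# Sketch: write one <Tumor_Sample_Barcode>.txt file per sample, each
# starting with a copy of the header line.

split_by_sample() {
    awk '
        NR == 1 { header = $0; next }   # remember the header, do not route it
        {
            out = $2 ".txt"             # column 2 is Tumor_Sample_Barcode
            if (!(out in seen)) {
                # First row for this sample: create the file with the header.
                print header > out
                close(out)
                seen[out] = 1
            }
            print >> out                # append the data row
            close(out)
        }
    ' "$@"
}

# Usage: split_by_sample input.txt
```

Because the first write to each file uses ">", rerunning the function replaces earlier output files rather than appending to them.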