I have this file:
m64071_220512_054244/12584899/ccs rev pet047-10055 ACGTGCGACCTTGTGA TTGAGGGTTCAAACGTGCGACCTTGTGA
m64071_220512_054244/128321000/ccs rev pet047-10055 ACGTGCGACCTTGTGA TTGAGGGTTCAAACGTGCGACCTTGTGA
m64071_220512_054244/132186699/ccs fwd pet047-10055 TCACAAGGTCGCACGT TCACAAGGTCGCACGTTTGAACCCTCAA
m64071_220512_054244/134874748/ccs fwd pet047-10055 TCACAAGGTCGCACGT TCACAAGGTCGCACGTTTGAACCCTCAA
I need to tr and reverse fields $4 and $5 (that is, reverse-complement them), but only when $2 == rev.
Expected:
m64071_220512_054244/12584899/ccs rev pet047-10055 TCACAAGGTCGCACGT TCACAAGGTCGCACGTTTGAACCCTCAA
m64071_220512_054244/128321000/ccs rev pet047-10055 TCACAAGGTCGCACGT TCACAAGGTCGCACGTTTGAACCCTCAA
m64071_220512_054244/132186699/ccs fwd pet047-10055 TCACAAGGTCGCACGT TCACAAGGTCGCACGTTTGAACCCTCAA
m64071_220512_054244/134874748/ccs fwd pet047-10055 TCACAAGGTCGCACGT TCACAAGGTCGCACGTTTGAACCCTCAA
I tried:
perl -lpe 'if(/rev/) {$rev=/rev/;next}; if ($rev) {$F[4,5]=~tr/ATGC/TACG/; $F[4,5]=reverse $F[4,5]; print "@F"}' file
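For comparison, here is a minimal sketch of the same idea done entirely inside awk, without calling out to tr or rev. The function name revcomp_rev_rows is hypothetical, and the sketch assumes whitespace-separated fields and uppercase A/C/G/T bases (any other character is passed through unchanged). Note that assigning to $4 or $5 makes awk rebuild the line with single spaces between fields.

```shell
# Reverse-complement fields 4 and 5 of every row whose second field is "rev".
# revcomp_rev_rows is a hypothetical helper name; it reads rows from stdin.
revcomp_rev_rows() {
  awk '
    # Walk the sequence backwards and emit the complement of each base.
    function revcomp(seq,    i, out, base) {
      out = ""
      for (i = length(seq); i >= 1; i--) {
        base = substr(seq, i, 1)
        out = out ((base in comp) ? comp[base] : base)
      }
      return out
    }
    BEGIN { comp["A"] = "T"; comp["T"] = "A"; comp["C"] = "G"; comp["G"] = "C" }
    $2 == "rev" { $4 = revcomp($4); $5 = revcomp($5) }
    { print }
  '
}
```

It would be called as: revcomp_rev_rows < file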
I also tried using Awk (running a bash command inside awk and printing the command's output):
awk '{
if($2==rev)
{
cmd1="echo \047" $4 "\047 …
I have this test.txt file:
gene 1:362273700-362275735
exon 1:362275166-362275246
exon 1:362274811-362275058
exon 1:362274230-362274685
gene 1:362279796-362287281
exon 1:362279796-362280179
exon 1:362280576-362280662
exon 1:362280858-362280958
exon 1:362281056-362281106
I need to get this output:
gene-1 1:362275166-362275246
gene-1 1:362274811-362275058
gene-1 1:362274230-362274685
gene-2 1:362279796-362280179
gene-2 1:362280576-362280662
gene-2 1:362280858-362280958
gene-2 1:362281056-362281106
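-> In other words, I need to remove the "gene" lines and replace the "exon" label on each exon line with "gene-X", where X starts at 1 and goes up by one at each "gene" line.
I am struggling with this.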
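The transformation described above can be sketched as follows: bump a counter on every "gene" line and skip it, then print each "exon" line with the current counter as its label. The function name label_exons_by_gene is hypothetical, and the sketch assumes every exon line is preceded by at least one gene line.

```shell
# Replace the "exon" label with "gene-N", where N is the number of "gene"
# lines seen so far. label_exons_by_gene is a hypothetical helper name.
label_exons_by_gene() {
  awk '
    $1 == "gene" { n++; next }            # new gene: bump counter, drop line
    $1 == "exon" { print "gene-" n, $2 }  # exon: relabel with current gene
  '
}
```

It would be called as: label_exons_by_gene < test.txt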
I tried the following (each command is followed by its output):
awk '$1~/exon/ {print $0 (/^exon/ ? "-" (++c) : "")}' test.txt
exon 1:362275166-362275246-1
exon 1:362274811-362275058-2
exon 1:362274230-362274685-3
exon 1:362279796-362280179-4
exon 1:362280576-362280662-5
exon 1:362280858-362280958-6
exon 1:362281056-362281106-7
awk '$1~/exon/ {$1=$1 "-" (++count[$1])}1' test.txt
gene 1:362273700-362275735
exon-1 1:362275166-362275246
exon-2 1:362274811-362275058
exon-3 1:362274230-362274685
gene 1:362279796-362287281
exon-4 …
I want to pass arguments on the sbatch command line.
#!/bin/bash
#SBATCH -o job-%A_task.out
#SBATCH --job-name=paral_cor
#SBATCH --partition=normal
#SBATCH --time=1-00:00:00
#SBATCH --mem=200G
#SBATCH --cpus-per-task=16
#SBATCH --array=1-10
#Set up whatever package we need to run with
module load gcc/8.1.0 openblas/0.3.3 R
# SET UP DIRECTORIES
OUTPUT="$HOME"/PROJET_M2/data/$(date +"%Y%m%d")_parallel_nodes_test
mkdir -p "$OUTPUT"
export FILENAME="$HOME"/vipailler/PROJET_M2/bin/RHO_COR.R
subset=$((SLURM_ARRAY_TASK_ID))
file="$HOME"/PROJET_M2/raw/truelength2.prok2.uniref2.rares.tsv
#Run the program
echo "Start job :"`date` >> "$OUTPUT"/"$SLURM_ARRAY_TASK_ID".txt
echo "Start job :"`date`
echo PWD $PWD
Rscript $FILENAME --file $file --subset $subset > "$OUTPUT"/"$SLURM_ARRAY_TASK_ID"
wait
echo "Stop job : "`date` >> "$OUTPUT"/"$SLURM_ARRAY_TASK_ID".txt
echo …
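One possible approach, sketched under the assumption that the goal is to choose the input file at submission time: sbatch hands anything written after the script name to the script as positional parameters ($1, $2, ...), so the hard-coded path can become a default. The helper name parse_job_args and the submission line in the comment are hypothetical; the default path is copied from the script above.

```shell
#!/bin/bash
# Hypothetical submission: sbatch job.sh /path/to/input.tsv
# Arguments after the script name arrive in the script as $1, $2, ...

# parse_job_args is a hypothetical helper: it takes the input file from the
# first positional argument and falls back to the path used in the question.
parse_job_args() {
  file="${1:-$HOME/PROJET_M2/raw/truelength2.prok2.uniref2.rares.tsv}"
  # Outside an array job SLURM_ARRAY_TASK_ID is unset, so default to 1.
  subset="${SLURM_ARRAY_TASK_ID:-1}"
}

parse_job_args "$@"
echo "file=$file subset=$subset"
```

The Rscript call would then use "$file" and "$subset" as before.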