Removing temporary files from a shell script that extracts patterns from a file

pmr*_*pmr 2 shell

I have an input text file:

EL.EEX.FRANCE.DELMONTHS.JAN2016.SPOT.VOL      15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.L       15JAN2016 
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.H       15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.S       15JAN2016 
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD            15JAN2016 
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD            15JAN2016

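For each unique type, where the type is the series name taken down to its deepest dot (.) level, we need one representative sample (the complete series name, without the date). So the output would be: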

EL.EEX.FRANCE.DELMONTHS.JAN2016.SPOT.VOL
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.L
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD

(The order of the lines in the output does not matter.)

The program below works fine, but it generates many intermediate temporary files. How can we do this in shell without them?

#input file name and path assumed in current directory
file="./osc.txt"
resultfilepath="./OSCoutput.txt"
tmpfilepath="./OSCtempoutput.txt"
tmp1filepath="./OSCtemp1output.txt"
tmp2filepath="./OSCtemp2output.txt"


rm $resultfilepath
rm $tmpfilepath
#using awk to filter only series data without dates
awk -F' ' '{print $1}' $file >> $tmpfilepath

#getting all the unique dataclass_names at column 1
DATACLASSNAME=(`cut -f 1 -d'.' $tmpfilepath | sort | uniq`)
for i in "${DATACLASSNAME[@]}"; do
rm $tmp1filepath
#we are filtering the file with that dataclass
awk -F'.' -v awk_dataclassname="$i" '$1==awk_dataclassname' $tmpfilepath >> $tmp1filepath
#also we are calculating the number of dot-delimited fields in each filtered record and sorting it
COLCOUNT=(`awk -F'.' '{print NF}' $tmp1filepath | uniq | sort`)
for j in "${COLCOUNT[@]}"; do
rm $tmp2filepath
#in the filtered data we are taking series of a particular dimension length and dumping data
awk -F '.' -v awk_colcount="$j" '(NF==awk_colcount){print}' $tmp1filepath >> $tmp2filepath
#reducing column no by 1
newj=$(echo $((j - 1)))
#removing last column(generally observation dimension) by cut column
GREPSAMPLE=(`cut -f -$newj -d'.' $tmp2filepath | uniq`)
SAMPLELENGTH=(`wc -l $tmp2filepath`)
#we are now taking unique series sample
for k in "${GREPSAMPLE[@]}"; do
#doing grep of unique sample but taking the whole line
echo `grep $k $tmp1filepath | head -1` >> $resultfilepath

done
done
done
cat $resultfilepath
echo "processing finish"

200*_*ess 7

This single awk invocation is all you need to do the whole job.

awk '{
    key = $0;
    sub("\\.[^.]*$", "", key);      # Let key be everything up to the last dot

    if (!seen[key]) { print $1 }    # If key has not been seen, print 1st col
    seen[key] = 1;                  # Mark the key as seen
}' "$file" > "$resultfilepath"

In general, when you have a script that involves a lot of awking and grepping, chances are you could just write a single awk script.
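As an illustration of that advice, here is a minimal sketch of a slightly more compact variant of the awk approach above. It is only one way to write it, under a few assumptions: the function name `unique_series` is hypothetical, the key is built from the first whitespace-separated field rather than the whole line (so a dot in the date column could not affect it), and blank lines are skipped. It needs no temporary files because awk keeps the seen keys in an in-memory array.

```shell
#!/bin/sh
# unique_series is a hypothetical helper name, not part of the original answer.
# It reads series lines from the files given as arguments (or from stdin) and
# prints the first series name seen for each prefix, where the prefix is
# everything before the last dot of the series name.
unique_series() {
    awk 'NF {                        # skip blank lines
        key = $1                     # series name only; the date in $2 is ignored
        sub(/\.[^.]*$/, "", key)     # drop the last dot-separated component
        if (!seen[key]++) print $1   # print only the first series per prefix
    }' "$@"
}

# Example with a few lines from the sample input, fed via a here-document.
unique_series <<'EOF'
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.L       15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.H       15JAN2016
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD            15JAN2016
EOF
```

With a file on disk, the same function could be called as `unique_series "$file" > "$resultfilepath"`, reusing the variable names from the question.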