Dra*_*ana -2 scripting perl shell-script
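I have 4 files. I need to check whether all of them have the same number of lines.
If the line counts differ, I need to detect that and print output such as: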
#file1 - 10 lines, file2 - 9 lines, file3 - 10 lines, file4 - 10 lines
Line are miss matched
Number of lines 10 = 9 = 10 = 10
If they are equal, I want to merge the files line by line, like this:
Files:
#file1
10
12
11
#file2
Arun
kamal
babu
#file3
300
200
400
#file4
spot1
spot4
spot5
Output:
Set1
10
Arun
300
spot1
Set2
12
kamal
200
spot4
Set3
11
babu
400
spot5
My code:
#
id_name=`cat file2`
echo $id_name
id_list=`cat file1`
echo $id_list
#
id_count=`cat file3`
echo $id_count
id_spot=`cat spot_list`
echo $id_spot
SS=`cat id_list | wc -l`
DS=`cat id_name | wc -l`
SF=`cat id_count | wc -l`
DF=`cat id_spot | wc -l`
if [ $SS == $DS == $SF == $DF ] then
echo " Line are matched"
echo " Total line $SS"
for i j in $id_list $id_name
do
for a b in $id_count $id_spot
do
k = 1
echo " Set$k"
$i
$j
$a
$b
done
done
else
echo " Line are Miss matched"
echo " Total line $SS = $DS = $SF = $DF"
fi
With a really straightforward approach:
#!/usr/bin/env bash
SS=$(wc -l < file1)
DS=$(wc -l < file2)
SF=$(wc -l < file3)
DF=$(wc -l < file4)
if [[ $SS -eq $DS && $DS -eq $SF && $SF -eq $DF ]]; then
echo "Lines are matched"
echo "Total number of lines: $SS"
num=1
while (( num <= SS )); do
echo "Set$num"
tail -n +$num file1 | head -n 1
tail -n +$num file2 | head -n 1
tail -n +$num file3 | head -n 1
tail -n +$num file4 | head -n 1
((num++))
echo
done
else
echo "Line are miss matched"
echo "Number of lines $SS = $DS = $SF = $DF"
fi
It is not very efficient, as it calls tail and head 4*number_of_lines times each, but it is straightforward.
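If you want to stay in pure bash without re-reading the files on every iteration, one option is to open each file once on its own file descriptor and read one line from each per pass. This is only a sketch: merge_sets is a hypothetical helper name, and it assumes the line counts were already checked, since the loop simply stops at the first file that runs out.

```shell
#!/usr/bin/env bash
# merge_sets FILE1 FILE2 FILE3 FILE4
# Hypothetical helper (not from the answer above): prints one "SetN"
# block per line number, reading the four files in lockstep.
merge_sets() {
    local n=1 a b c d
    # Descriptors 3-6 are each opened once for the whole loop, so every
    # read continues where the previous one stopped in that file.
    while IFS= read -r a <&3 && IFS= read -r b <&4 &&
          IFS= read -r c <&5 && IFS= read -r d <&6; do
        printf 'Set%d\n%s\n%s\n%s\n%s\n\n' "$n" "$a" "$b" "$c" "$d"
        n=$((n + 1))
    done 3<"$1" 4<"$2" 5<"$3" 6<"$4"
}
```

It would be called as merge_sets file1 file2 file3 file4 in place of the while loop.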
Another approach is to replace the while loop with awk:
awk '{
printf("\nSet%s\n", NR)
print;
if( getline < "file2" )
print
if( getline < "file3" )
print
if ( getline < "file4" )
print
}' file1
To join files line by line, the paste command is very useful. You can use this instead of the while loop:
paste -d$'\n' file1 file2 file3 file4
Or maybe a little less obvious (the -s flag keeps the sort stable, so lines with the same number stay in file order instead of being reordered by content):
{ cat -n file1 ; cat -n file2 ; cat -n file3; cat -n file4; } | sort -n -s | cut -f2-
That will output the lines but with no formatting (no Set1, Set2, newlines, ...), so you have to format it afterwards with awk, for example:
awk '{
if ((NR-1)%4 == 0)
printf("\nSet%s\n", (NR+3)/4)
print
}' < <(paste -d$'\n' file1 file2 file3 file4)
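The (NR-1)%4 and (NR+3)/4 arithmetic above is tied to exactly four files. As a sketch of the same idea for any number of files, the file count can be passed into awk as a variable. The function name format_sets and the awk variable n are assumptions made for this illustration.

```shell
#!/usr/bin/env bash
# format_sets FILE...
# Hypothetical helper: interleaves the files with paste and prints a
# "SetN" header before every group of n lines, n being the file count.
format_sets() {
    [ "$#" -gt 0 ] || return 1
    # paste itself understands the \n escape in its delimiter list.
    paste -d '\n' "$@" | awk -v n="$#" '
        # Lines 1, n+1, 2n+1, ... start a new set.
        (NR - 1) % n == 0 { printf "\nSet%d\n", (NR - 1) / n + 1 }
        { print }'
}
```

With four files, (NR - 1) / n + 1 gives the same set number as (NR+3)/4 on the lines where a header is printed.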
Some final notes:
- Avoid echo "$var" | cmd or cat file | cmd when you can redirect input: cmd <<< "$var" or cmd < file
- Only one variable name is allowed in a for loop: for i in ... is valid, whereas for i j in ... is not
- Use [[ ]] instead of [ ] for testing, see this answer
Results of time, tested on files with 10000 lines:
#first approach
real 0m45.387s
user 0m5.904s
sys 0m3.836s
#second approach - significantly faster
real 0m0.086s
user 0m0.024s
sys 0m0.040s
#third approach - very close to second approach
real 0m0.074s
user 0m0.016s
sys 0m0.036s
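To illustrate the final notes (redirecting input instead of piping, one variable per for loop, and [[ ]] tests), here is a small bash sketch. The function names and sample values are made up for the illustration; indexing two arrays with a single loop variable is just one possible way around the one-variable limit.

```shell
#!/usr/bin/env bash
# count_words TEXT: hypothetical helper showing a here-string.
count_words() {
    local text=$1 n
    # Here-string instead of: echo "$text" | wc -w
    n=$(wc -w <<< "$text")
    echo $((n))   # arithmetic expansion strips any padding wc adds
}

# pair_up: walks two lists in step with a single loop variable.
pair_up() {
    local -a ids=(10 12 11) names=(Arun kamal babu)
    local i
    # "for i j in ..." is a syntax error; loop over the indices instead.
    for i in "${!ids[@]}"; do
        # Inside [[ ]] the expansion is not word-split, so no quoting
        # is needed to handle empty values safely.
        if [[ -n ${names[i]} ]]; then
            echo "${ids[i]} ${names[i]}"
        fi
    done
}
```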
Can you figure out how to check the number of lines in each file? (Hint: wc)
To get the output in sets:
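Following the wc hint, one way the check might look is a small function that compares every file's line count against the first one. This is a sketch only: same_line_count is a hypothetical name, and the return codes (1 for a mismatch, 2 for an unreadable file) are a choice made here. Note that wc -l counts newline characters, so a final line without a trailing newline is not counted.

```shell
#!/usr/bin/env bash
# same_line_count FILE...
# Hypothetical helper: succeeds only if all files have the same
# number of lines according to wc -l.
same_line_count() {
    local first="" n f
    for f in "$@"; do
        n=$(wc -l < "$f") || return 2   # file missing or unreadable
        n=$((n))                        # strip padding some wc versions add
        if [ -z "$first" ]; then
            first=$n
        elif [ "$n" -ne "$first" ]; then
            return 1
        fi
    done
    return 0
}
```

It could then guard the merge, for example: if same_line_count file1 file2 file3 file4; then paste ...; else echo "Line are miss matched"; fi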
paste File{1,2,3,4} | awk -F'\t' -v OFS='\n' '{$1=$1; print "Set"NR, $0, ""}'
$1=$1 forces awk to rebuild the record, which converts the input field separators into the output field separator.
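The effect of the $1=$1 assignment can be seen in isolation with a tiny wrapper. This is a sketch; tabs_to is a hypothetical helper name, and the separator is passed through awk's -v option, which interprets escape sequences such as \n.

```shell
#!/usr/bin/env bash
# tabs_to SEP
# Hypothetical helper: reads tab-separated lines on stdin and prints
# them with SEP between the fields.
tabs_to() {
    # Assigning to any field makes awk rebuild $0 using OFS; without
    # the assignment, print would emit the line with its tabs intact.
    awk -F'\t' -v OFS="$1" '{ $1 = $1; print }'
}
```

For example, printf 'a\tb\tc\n' | tabs_to '-' prints a-b-c, and passing '\n' as the separator puts each field on its own line, which is what the one-liner above relies on.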