Dra*_*ana -2 scripting perl shell-script
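I have 4 files. I need to check whether all of them have the same number of lines.
If the line counts differ, I need to detect that and print output such as: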
#file1 - 10 lines, file2 - 9 lines, file3 - 10 lines, file4 - 10 lines
Line are miss matched
Number of lines 10 = 9 = 10 = 10
If they are equal, I want to merge the files line by line, as follows:
Files:
#file1
10
12
11
#file2
Arun
kamal
babu
#file3
300
200
400
#file4
spot1
spot4
spot5
Output:
Set1
10
Arun
300
spot1
Set2
12
kamal
200
spot4
Set3
11
babu
400
spot5
My code:
#
id_name=`cat file2`
echo $id_name
id_list=`cat file1`
echo $id_list
#
id_count=`cat file3`
echo $id_count
id_spot=`cat spot_list`
echo $id_spot
SS=`cat id_list | wc -l`
DS=`cat id_name | wc -l`
SF=`cat id_count | wc -l`
DF=`cat id_spot | wc -l`
if [ $SS == $DS == $SF == $DF ] then
    echo " Line are matched"
    echo " Total line $SS"
    for i j in $id_list $id_name
    do
        for a b in $id_count $id_spot
        do
            k = 1
            echo " Set$k"
            $i
            $j
            $a
            $b
        done
    done
else
    echo " Line are Miss matched"
    echo " Total line $SS = $DS = $SF = $DF"
fi
With a really straightforward approach:
#!/usr/bin/env bash
SS=$(wc -l < file1)
DS=$(wc -l < file2)
SF=$(wc -l < file3)
DF=$(wc -l < file4)
if [[ $SS -eq $DS && $DS -eq $SF && $SF -eq $DF ]]; then
    echo "Lines are matched"
    echo "Total number of lines: $SS"
    num=1
    while (( num <= SS )); do
        echo "Set$num"
        tail -n +$num file1 | head -n 1
        tail -n +$num file2 | head -n 1
        tail -n +$num file3 | head -n 1
        tail -n +$num file4 | head -n 1
        ((num++))
        echo
    done
else
    echo "Line are miss matched"
    echo "Number of lines $SS = $DS = $SF = $DF"
fi
It is not very efficient, since it spawns tail and head 4*number_of_lines times each, but it is straightforward.
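One way to avoid starting tail and head for every line is to open each file once and read all four in lock-step. What follows is a minimal sketch, not part of the original answer: print_sets is a hypothetical helper name, and it assumes the line counts have already been checked and found equal.

```shell
#!/usr/bin/env bash
# print_sets is a hypothetical helper (not from the original answer).
# It reads four files in lock-step and prints one "SetN" block per line position.
print_sets() {
    local num=1 a b c d
    # Each file is opened once on its own descriptor (3 to 6); every iteration
    # reads one line from each. The loop ends as soon as any file runs out.
    while IFS= read -r a <&3 && IFS= read -r b <&4 &&
          IFS= read -r c <&5 && IFS= read -r d <&6; do
        printf 'Set%d\n%s\n%s\n%s\n%s\n\n' "$num" "$a" "$b" "$c" "$d"
        num=$((num + 1))
    done 3< "$1" 4< "$2" 5< "$3" 6< "$4"
}
```

Called as print_sets file1 file2 file3 file4, it produces the same SetN blocks as the loop above without starting an external process per line.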
Another approach is to replace the while loop with awk:
awk '{
    printf("\nSet%s\n", NR)
    print
    # getline returns -1 on a read error, so test explicitly for > 0
    if ((getline < "file2") > 0)
        print
    if ((getline < "file3") > 0)
        print
    if ((getline < "file4") > 0)
        print
}' file1
To join files line by line, the paste command is very useful. You can use this instead of the while loop:
paste -d$'\n' file1 file2 file3 file4
Or, maybe a little less obvious (the -s flag keeps the sort stable, so lines with the same number stay in file order):
{ cat -n file1 ; cat -n file2 ; cat -n file3; cat -n file4; } | sort -s -n | cut -f2-
That will output the lines but with no formatting (no Set1, Set2, newlines, ...), so you have to format it afterwards with awk, for example:
awk '{
    if ((NR-1)%4 == 0)
        printf("\nSet%s\n", (NR+3)/4)
    print
}' < <(paste -d$'\n' file1 file2 file3 file4)
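The 4 in the awk program above is simply the number of files, so the same idea can be generalized. The sketch below works under that assumption; merge_sets is a made-up function name, and it expects the line counts to have been checked beforehand, because paste fills in empty lines for files that end early.

```shell
#!/usr/bin/env bash
# merge_sets is a hypothetical helper: it interleaves any number of
# equally long files and labels each group of lines as Set1, Set2, ...
merge_sets() {
    # n is the number of files, so every n-th line of the paste output
    # starts a new set. A blank line separates sets (none before Set1).
    paste -d '\n' "$@" | awk -v n="$#" '
        (NR - 1) % n == 0 { printf "%sSet%d\n", (NR > 1 ? "\n" : ""), (NR - 1) / n + 1 }
        { print }'
}
```

For the four files from the question this would be called as merge_sets file1 file2 file3 file4.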
Some final notes:
- Avoid echo "$var" | cmd or cat file | cmd when you can redirect input: cmd <<< "$var" or cmd < file
- A for loop takes a single variable name: for i in ... is valid, whereas for i j in ... is not
- Use [[ ]] instead of [ ] for testing, see this answer
Results of time, tested on files with 10000 lines:
#first approach
real 0m45.387s
user 0m5.904s
sys 0m3.836s
#second approach - significantly faster
real 0m0.086s
user 0m0.024s
sys 0m0.040s
#third approach - very close to second approach
real 0m0.074s
user 0m0.016s
sys 0m0.036s
You can figure out how to check the number of lines in each file (hint: wc).
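As one possible way to follow that hint, here is a small sketch that counts the lines of every file with wc and reports whether they agree. The function name check_line_counts and its return codes are assumptions made for illustration, not something from the answers.

```shell
#!/usr/bin/env bash
# check_line_counts is a hypothetical helper. It prints the line count of
# every file given as an argument and returns 0 only if all counts match
# (1 on a mismatch, 2 if a file cannot be read).
check_line_counts() {
    local first count f status=0 summary=""
    first=$(wc -l < "$1") || return 2
    for f in "$@"; do
        count=$(wc -l < "$f") || return 2
        # $(( )) also drops the padding that some wc implementations add.
        summary+="${summary:+ = }$((count))"
        [[ $count -ne $first ]] && status=1
    done
    echo "Number of lines $summary"
    return "$status"
}
```

A caller could then write something like: if check_line_counts file1 file2 file3 file4; then merge the files; else echo "Line are miss matched"; fi.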
To get the output in sets:
paste File{1,2,3,4} | awk -F'\t' -v OFS='\n' '{$1=$1; print "Set"NR, $0, ""}'
$1=$1 makes awk rebuild the record, which converts the input field separators into the output field separator.
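The effect of that assignment is easiest to see side by side. The snippet below is only an illustration on a single made-up input line (the variable names line, untouched and rebuilt are not from the answer):

```shell
#!/usr/bin/env bash
# One tab-separated sample line, as paste would produce it.
line=$(printf '10\tArun\t300\tspot1')

# Without the assignment, awk prints $0 untouched, so the tabs remain.
untouched=$(printf '%s\n' "$line" | awk -F'\t' -v OFS='\n' '{ print }')

# With $1 = $1, awk rebuilds $0 and joins the fields with OFS (a newline).
rebuilt=$(printf '%s\n' "$line" | awk -F'\t' -v OFS='\n' '{ $1 = $1; print }')

printf 'untouched:\n%s\n\nrebuilt:\n%s\n' "$untouched" "$rebuilt"
```

Assigning to any field triggers the rebuild; $1=$1 is simply the conventional no-op way to do it.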