Lev*_*ike 27 performance pipe shell-script cut
I'm trying to find the most efficient way to iterate through certain values that are a consistent number of positions apart in a space-separated list of words (I don't want to use an array). For example,
list="1 ant bat 5 cat dingo 6 emu fish 9 gecko hare 15 i j"
So I want to be able to just iterate through list and only access 1,5,6,9 and 15.
EDIT: I should have made it clear that the values I'm trying to get from the list don't have to be different in format from the rest of the list. What makes them special is solely their position in the list (in this case, positions 1, 4, 7, ...). So the list could be
1 2 3 5 9 8 6 90 84 9 3 2 15 75 55
but I'd still want the same numbers. I also want to be able to do it without knowing the length of the list.
The methods I've thought of so far are:
Method 1
set $list
found=false
find=9
count=1
while [ $count -le $# ]; do
if [ "${@:count:1}" -eq $find ]; then
found=true
break
fi
count=`expr $count + 3`
done
Method 2
set $list
found=false
find=9
while [ $# -ne 0 ]; do
if [ $1 -eq $find ]; then
found=true
break
fi
shift 3
done
Method 3
I'm pretty sure piping makes this the worst option, but I was trying to find a method that doesn't use set, out of curiosity.
found=false
find=9
count=1
num=`echo $list | cut -d ' ' -f$count`
while [ -n "$num" ]; do
if [ $num -eq $find ]; then
found=true
break
fi
count=`expr $count + 3`
num=`echo $list | cut -d ' ' -f$count`
done
So what would be most efficient, or am I missing a simpler method?
ilk*_*chu 36
First rule of software optimization: Don't.
Until you know the speed of the program is an issue, there's no need to think
about how fast it is. If your list is about that length or just ~100-1000 items
long, you probably won't even notice how long it takes. There's a chance you're spending more time thinking about the optimization than what the difference would be.
Second rule: Measure.
That's the sure way to find out, and the one that gives answers for your system. Especially with shells, there are so many, and they aren't all identical. An answer for one shell might not apply for yours.
In larger programs, profiling goes here too. The slowest part might not be the one you think it is.
Third, the first rule of shell script optimization: Don't use the shell.
Yeah, really. Many shells aren't made to be fast (since launching external programs doesn't have to be), and they might even parse the lines of the source code again each time.
Use something like awk or Perl instead. In a trivial micro-benchmark I did, awk
was dozens of times faster than any common shell in running a simple loop (without I/O).
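But if you do use the shell, use the shell's builtin functions instead of external commands. Here, you're using expr, which isn't builtin in any of the shells I found on my system, but which can be replaced with standard arithmetic expansion: for example, i=$((i+1)) instead of i=$(expr $i + 1) to increment i. Your use of cut in the last example might also be replaceable with standard parameter expansions.
Steps #1 and #2 should apply to your question.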
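As a rough illustration of those two replacements, here is a minimal sketch in POSIX sh. It assumes the list is separated by single spaces, as in the question, and only shows the first field and one counter step rather than the whole loop.

```shell
list="1 ant bat 5 cat dingo 6 emu fish 9 gecko hare 15 i j"

# Instead of: num=`echo $list | cut -d ' ' -f1`
# Remove the longest suffix that starts at a space, leaving the first word.
num=${list%% *}

# Instead of: count=`expr $count + 3`
# Arithmetic expansion is evaluated by the shell itself, with no child process.
count=1
count=$((count + 3))

echo "$num $count"
```

Neither line starts an external program, which is where most of the cost in Method 3 comes from.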
Dop*_*oti 18
Pretty simple with awk. This will get you the value of every third field, starting from the first, for input of any length:
$ awk -F' ' '{for( i=1;i<=NF;i+=3) { printf( "%s%s", $i, OFS ) }; printf( "\n" ) }' <<< $list
1 5 6 9 15
This works by leveraging built-in awk variables such as NF (the number of fields in the record), and a simple for loop that steps along the fields to give you the ones you want without needing to know ahead of time how many there will be.
Or, if you do indeed just want those specific fields as specified in your example:
$ awk -F' ' '{ print $1, $4, $7, $10, $13 }' <<< $list
1 5 6 9 15
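The same field loop can probably also cover the found/not-found search from the question. The following is only a sketch: the awk variable names want and hit are made up for illustration, and the result is passed back to the shell through awk's exit status.

```shell
list="1 ant bat 5 cat dingo 6 emu fish 9 gecko hare 15 i j"
find=9

# awk exits 0 if the value appears at field 1, 4, 7, ... and 1 otherwise.
if printf '%s\n' "$list" |
   awk -v want="$find" '
     { for (i = 1; i <= NF; i += 3) if ($i == want) { hit = 1; exit } }
     END { exit !hit }'
then
  found=true
else
  found=false
fi
echo "$found"
```

With the sample list this prints true, because 9 sits in field 10. It still costs one external process per search, so it mainly pays off when the list is long.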
As for the question about efficiency, the simplest route would be to test this and each of your other methods with time to show how long each takes; you could also use tools like strace to see how the system calls flow. Usage of time looks like:
$ time ./script.sh
real 0m0.025s
user 0m0.004s
sys 0m0.008s
You can compare that output between varying methods to see which is the most efficient in terms of time; other tools can be used for other efficiency metrics.
Gil*_*il' 14
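I'll only give some general advice in this answer, not benchmarks. Benchmarks are the only way to reliably answer questions about performance. But since you don't say how much data you're manipulating or how often you perform this operation, there's no way to do a useful benchmark. What's more efficient for 10 items and what's more efficient for 1000000 items is often not the same.
As a general rule of thumb, invoking external commands is more expensive than doing something with pure shell constructs, as long as the pure shell code doesn't involve a loop. On the other hand, a shell loop that iterates over a large string or a large number of strings is likely to be slower than one invocation of a special-purpose tool. For example, your loop invoking cut could well be noticeably slow in practice, but if you find a way to do the whole thing with a single cut invocation, that's likely to be faster than doing the same thing with string manipulation in the shell.
Do note that the cutoff point can vary a lot between systems. It can depend on the kernel, on how the kernel's scheduler is configured, on the filesystem containing the external executables, on how much CPU versus memory pressure there is at the moment, and on many other factors.
Don't call expr to perform arithmetic if you're at all concerned about performance. In fact, don't call expr to perform arithmetic at all. Shells have built-in arithmetic, which is clearer and faster than invoking expr.
You seem to be using bash, since you're using bash constructs that don't exist in sh. So why on earth would you not use an array? An array is the most natural solution, and it's likely to be the fastest, too. Note that array indices start at 0.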
list=(1 2 3 5 9 8 6 90 84 9 3 2 15 75 55)
for ((count = 0; count < ${#list[@]}; count += 3)); do
echo "${list[$count]}"
done
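Your script may well be faster if you use sh, if on your system sh is dash or ksh rather than bash. If you use sh, you don't get named arrays, but you still get one array, the positional parameters, which you can set with set. To access an element at a position that isn't known until runtime, you need to use eval (take care to quote things properly!).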
# List elements must not contain whitespace or ?*\[
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
set $list
count=1
while [ $count -le $# ]; do
eval "value=\${$count}"
echo "$value"
count=$((count+1))
done
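If you only want to access the array once and are going from left to right (skipping some values), you can use shift instead of variable indices.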
# List elements must not contain whitespace or ?*\[
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
set $list
while [ $# -ge 1 ]; do
echo "$1"
shift && shift && shift
done
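Adapted to the search in the question, the shift approach might look like the sketch below. It reuses the question's variable names (find, found). It assumes a string comparison is acceptable, and it adds a guard so that shift is never asked to move past the end of a list whose length isn't a multiple of 3.

```shell
# List elements must not contain whitespace or ?*\[
list="1 ant bat 5 cat dingo 6 emu fish 9 gecko hare 15 i j"
find=9
found=false

set -- $list    # relies on word splitting to fill the positional parameters
while [ "$#" -gt 0 ]; do
  if [ "$1" = "$find" ]; then
    found=true
    break
  fi
  # Stop instead of shifting past the end when fewer than 3 words remain.
  [ "$#" -ge 3 ] || break
  shift 3
done
echo "$found"
```

For the sample list this prints true. Nothing here runs an external command, so the cost per iteration is limited to shell builtins.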
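Which approach is faster depends on the shell and on the number of elements.
Another possibility is to use string processing. It has the advantage of not using the positional parameters, so you can use them for something else. It'll be slower for large amounts of data, but that's unlikely to make a noticeable difference for small amounts of data.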
# List elements must be separated by a single space (not arbitrary whitespace)
list='1 2 3 5 9 8 6 90 84 9 3 2 15 75 55'
while [ -n "$list" ]; do
echo "${list%% *}"
case "$list" in *\ *\ *\ *) :;; *) break;; esac
list="${list#* * * }"
done