Split a string into an array in Bash

Lgn*_*Lgn 577 arrays bash split

In a Bash script, I would like to split a line into pieces and store them in an array.

Given this line:

Paris, France, Europe

I would like the resulting array to look like this:

array[0] = Paris
array[1] = France
array[2] = Europe

A simple implementation is preferable; speed does not matter. How can I do it?

Pau*_*ce. 1002

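IFS=', ' read -r -a array <<< "$string"

Note that the characters in $IFS are treated individually as separators, so in this case fields may be separated by either a comma or a space rather than by the sequence of the two characters. Interestingly, though, empty fields aren't created when comma-space appears in the input, because the space is treated specially.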
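As a quick check of the command above, the following sketch runs it on the sample line from the question and dumps the result. The names string and array come from the answer; the input value is simply the question's example.

```bash
#!/usr/bin/env bash
string='Paris, France, Europe'

# -r keeps backslashes literal, -a stores the fields in an indexed array.
# The IFS assignment applies only to this one read command.
IFS=', ' read -r -a array <<< "$string"

declare -p array                     # shows every index together with its value
echo "elements: ${#array[@]}"
```

The array ends up with the three elements Paris, France and Europe; the exact quoting that declare -p prints varies slightly between Bash versions.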

To access an individual element:

echo "${array[0]}"

To iterate over the elements:

for element in "${array[@]}"
do
    echo "$element"
done

To get both the index and the value:

for index in "${!array[@]}"
do
    echo "$index ${array[index]}"
done

The last example is useful because Bash arrays are sparse. In other words, you can delete an element or add an element, and then the indices are not contiguous.

unset "array[1]"
array[42]=Earth

To get the number of elements in an array:

echo "${#array[@]}"

As mentioned above, arrays can be sparse, so you shouldn't use the length to get the last element. Here's how you can in Bash 4.2 and later:

echo "${array[-1]}"

In any version of Bash (from somewhere after 2.05b):

echo "${array[@]: -1:1}"

Larger negative offsets select farther from the end of the array. Note the space before the minus sign in the older form. It is required.
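To see why the element count is the wrong tool on a sparse array, here is a small sketch that reuses the unset and array[42] steps from above. The variable last is only an illustrative name.

```bash
#!/usr/bin/env bash
array=(Paris France Europe)
unset "array[1]"        # leaves a hole at index 1
array[42]=Earth         # jumps far past the previous end

echo "indices: ${!array[*]}"    # the indices that actually exist
echo "count:   ${#array[@]}"    # number of elements, not highest index + 1

# Index (count - 1) is 2 here, which is not the last element.
echo "by length:  ${array[${#array[@]}-1]}"

# Slice form; the space before -1 is required.
last=${array[@]: -1:1}
echo "by slicing: $last"
```

The length-based lookup returns Europe, while the slice returns Earth, the element stored at the highest index.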

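  • Just use `IFS=', '`, then you don't have to remove the spaces separately. Test: `IFS=', ' read -a array <<< "Paris, France, Europe"; echo "${array[@]}"` (14 upvotes)
  • @l0b0: Thanks. I don't know what I was thinking. By the way, I like to use `declare -p array` for test output. (4 upvotes)
  • As a note, `str="Paris, France, Europe, Los Angeles"; IFS=', ' read -r -a array <<< "$str"` will split into `array=([0]="Paris" [1]="France" [2]="Europe" [3]="Los" [4]="Angeles")`. So this only works for fields without spaces, since `IFS=', '` is a set of individual characters, not a string delimiter. (3 upvotes)
  • @YisraelDov: Bash has no way to deal with CSV by itself. It can't tell the difference between commas inside quotes and those outside them. You will need to use a tool that understands CSV, such as a library in a higher-level language, for example the [csv](https://docs.python.org/2/library/csv.html) module in Python. (2 upvotes)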

bgo*_*dst 273

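All of the answers to this question are wrong in one way or another.


Wrong answer #1

IFS=', ' read -r -a array <<< "$string"

1: This is a misuse of $IFS. The value of the $IFS variable is not taken as a single variable-length string separator; rather, it is taken as a set of single-character string separators, where each field that read splits off from the input line can be terminated by any character in the set (comma or space, in this example).

Actually, for the real sticklers out there, the full meaning of $IFS is slightly more involved. From the bash manual:

The shell treats each character of IFS as a delimiter, and splits the results of the other expansions into words using these characters as field terminators. If IFS is unset, or its value is exactly <space><tab><newline>, the default, then sequences of <space>, <tab>, and <newline> at the beginning and end of the results of the previous expansions are ignored, and any sequence of IFS characters not at the beginning or end serves to delimit words. If IFS has a value other than the default, then sequences of the whitespace characters <space>, <tab>, and <newline> are ignored at the beginning and end of the word, as long as the whitespace character is in the value of IFS (an IFS whitespace character). Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter. If the value of IFS is null, no word splitting occurs.

Basically, for non-default non-null values of $IFS, fields can be separated with either (1) a sequence of one or more characters that are all from the set of "IFS whitespace characters" (that is, whichever of <space>, <tab>, and <newline> ("newline" meaning line feed (LF)) are present anywhere in $IFS), or (2) any non-"IFS whitespace character" that's present in $IFS along with whatever "IFS whitespace characters" surround it in the input line.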
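The two separation modes can be observed side by side with a small sketch. The input string and the array name fields are made up for illustration; IFS contains one non-whitespace character (the comma) and one IFS whitespace character (the space).

```bash
#!/usr/bin/env bash
input='a, b,,c  d'
IFS=', ' read -r -a fields <<< "$input"

# Mode 2: "," plus any adjacent spaces ends a field, so ", " is one delimiter
#         and ",," produces an empty field between the two commas.
# Mode 1: the run of two spaces between c and d counts as a single delimiter.
for i in "${!fields[@]}"; do
    printf '%d=[%s]\n' "$i" "${fields[i]}"
done
```

This yields five fields: a, b, an empty field, c and d.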

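For the OP, it's possible that the second separation mode I described in the previous paragraph is exactly what he wants for his input string, but we can be pretty confident that the first separation mode I described is not correct at all. For example, what if his input string was 'Los Angeles, United States, North America'?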

IFS=', ' read -ra a <<<'Los Angeles, United States, North America'; declare -p a;
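## declare -a a=([0]="Los" [1]="Angeles" [2]="United" [3]="States" [4]="North" [5]="America")

2: Even if you were to use this solution with a single-character separator (such as a comma by itself, that is, with no following space or other baggage), if the value of the $string variable happens to contain any LFs, then read will stop processing once it encounters the first LF. The read builtin only processes one line per invocation. This is true even if you are piping or redirecting input only to the read statement, as we are doing in this example with the here-string mechanism, and thus unprocessed input is guaranteed to be lost. The code that powers the read builtin has no knowledge of the data flow within its containing command structure.

You could argue that this is unlikely to cause a problem, but still, it's a subtle hazard that should be avoided if possible. It is caused by the fact that the read builtin actually does two levels of input splitting: first into lines, then into fields. Since the OP only wants one level of splitting, this usage of the read builtin is not appropriate, and we should avoid it.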
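A minimal sketch of this hazard, using a made-up two-line input and a lone comma as the separator:

```bash
#!/usr/bin/env bash
string=$'a,b\nc,d'      # two lines, four logical fields

# read consumes input only up to the first LF, then splits that line into fields.
IFS=',' read -r -a arr <<< "$string"

declare -p arr
echo "fields read: ${#arr[@]}"
```

Only a and b end up in the array; the second line is never read.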

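3: A non-obvious potential issue with this solution is that read always drops the trailing field if it is empty, although it preserves empty fields otherwise. Here's a demo: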

string=', , a, , b, c, , , '; IFS=', ' read -ra a <<<"$string"; declare -p a;
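## declare -a a=([0]="" [1]="" [2]="a" [3]="" [4]="b" [5]="c" [6]="" [7]="")

Maybe the OP wouldn't care about this, but it's still a limitation worth knowing about. It reduces the robustness and generality of the solution.

This problem can be solved by appending a dummy trailing delimiter to the input string just prior to feeding it to read, as I will demonstrate later.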
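The effect of the dummy trailing delimiter can be sketched as follows, assuming a single-character comma delimiter. The array names without and with are illustrative only.

```bash
#!/usr/bin/env bash
string='a,b,,'      # logically four fields: a, b and two empty ones

IFS=',' read -r -a without <<< "$string"
IFS=',' read -r -a with    <<< "$string,"   # dummy trailing delimiter appended

echo "without: ${#without[@]} fields"
echo "with:    ${#with[@]} fields"
```

Without the extra comma the final empty field is dropped and three fields remain; with it, all four are preserved.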


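Wrong answer #2

string="1:2:3:4:5"
set -f                     # avoid globbing (expansion of *).
array=(${string//:/ })

Similar idea:

t="one,two,three"
a=($(echo $t | tr ',' "\n"))

(Note: I added the missing parentheses around the command substitution, which the answerer seems to have omitted.)

Similar idea:

string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)

These solutions leverage word splitting in an array assignment to split the string into fields. Funnily enough, just like read, general word splitting also uses the $IFS special variable, although in this case it is implied that it is set to its default value of <space><tab><newline>, and therefore any sequence of one or more IFS characters (which are all whitespace characters now) is considered to be a field delimiter.

This solves the problem of two levels of splitting committed by read, since word splitting by itself constitutes only one level of splitting. But just as before, the problem here is that the individual fields in the input string can already contain $IFS characters, and thus they would be improperly split during the word splitting operation. This happens to not be the case for any of the sample input strings provided by these answerers (how convenient...), but of course that doesn't change the fact that any code base that used this idiom would run the risk of blowing up if this assumption were ever violated at some point down the line. Once again, consider my counterexample of 'Los Angeles, United States, North America' (or 'Los Angeles:United States:North America').

Also, word splitting is normally followed by filename expansion (aka pathname expansion aka globbing), which, if done, would potentially corrupt words containing the characters *, ?, or [ followed by ] (and, if extglob is set, parenthesized fragments preceded by ?, *, +, @, or !) by matching them against file system objects and expanding the words ("globs") accordingly. The first of these three answerers has cleverly undercut this problem by running set -f beforehand to disable globbing. Technically this works (although you should probably add set +f afterward to reenable globbing for subsequent code which may depend on it), but it's undesirable to have to mess with global shell settings in order to hack a basic string-to-array parsing operation in local code.

Another issue with this answer is that all empty fields will be lost. This may or may not be a problem, depending on the application.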

Note: If you're going to use this solution, it's better to use the ${string//:/ } "pattern substitution" form of parameter expansion, rather than going to the trouble of invoking a command substitution (which forks the shell), starting up a pipeline, and running an external executable (tr or sed), since parameter expansion is purely a shell-internal operation. (Also, for the tr and sed solutions, the input variable should be double-quoted inside the command substitution; otherwise word splitting would take effect in the echo command and potentially mess with the field values. Also, the $(...) form of command substitution is preferable to the old `...` form since it simplifies nesting of command substitutions and allows for better syntax highlighting by text editors.)
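The globbing hazard is easy to reproduce. The sketch below works in a throwaway directory created with mktemp (an assumption made purely for the demonstration) so that the * has something to match; the file names and array names are hypothetical.

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch file1 file2

string='1:2:*'
unsafe=( ${string//:/ } )   # unquoted: the * is expanded against the directory
set -f                      # disable globbing
safe=( ${string//:/ } )
set +f                      # restore globbing for later code

declare -p unsafe safe
cd / && rm -rf "$tmp"       # clean up the scratch directory
```

The unsafe array picks up file1 and file2 in place of the *, while the safe array keeps the literal *.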


Wrong answer #3

str="a, b, c, d"  # assuming there is a space after ',' as in Q
arr=(${str//,/})  # delete all occurrences of ','

This answer is almost the same as #2. The difference is that the answerer has made the assumption that the fields are delimited by two characters, one of which being represented in the default $IFS, and the other not. He has solved this rather specific case by removing the non-IFS-represented character using a pattern substitution expansion and then using word splitting to split the fields on the surviving IFS-represented delimiter character.

This is not a very generic solution. Furthermore, it can be argued that the comma is really the "primary" delimiter character here, and that stripping it and then depending on the space character for field splitting is simply wrong. Once again, consider my counterexample: 'Los Angeles, United States, North America'.

Also, again, filename expansion could corrupt the expanded words, but this can be prevented by temporarily disabling globbing for the assignment with set -f and then set +f.

Also, again, all empty fields will be lost, which may or may not be a problem depending on the application.


Wrong answer #4

string='first line
second line
third line'

oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"

This is similar to #2 and #3 in that it uses word splitting to get the job done, only now the code explicitly sets $IFS to contain only the single-character field delimiter present in the input string. It should be repeated that this cannot work for multicharacter field delimiters such as the OP's comma-space delimiter. But for a single-character delimiter like the LF used in this example, it actually comes close to being perfect. The fields cannot be unintentionally split in the middle as we saw with previous wrong answers, and there is only one level of splitting, as required.

One problem is that filename expansion will corrupt affected words as described earlier, although once again this can be solved by wrapping the critical statement in set -f and set +f.

Another potential problem is that, since LF qualifies as an "IFS whitespace character" as defined earlier, all empty fields will be lost, just as in #2 and #3. This would of course not be a problem if the delimiter happens to be a non-"IFS whitespace character", and depending on the application it may not matter anyway, but it does vitiate the generality of the solution.

So, to sum up, assuming you have a one-character delimiter, and it is either a non-"IFS whitespace character" or you don't care about empty fields, and you wrap the critical statement in set -f and set +f, then this solution works, but otherwise not.

(Also, for information's sake, assigning a LF to a variable in bash can be done more easily with the $'...' syntax, e.g. IFS=$'\n';.)
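Putting those conditions together, a minimal sketch might look like the following. The function name split_lines and the global array lines are hypothetical, and the function re-enables globbing unconditionally, which assumes it was enabled beforehand.

```bash
#!/usr/bin/env bash
# split_lines STRING: store one array element per non-empty line in "lines".
split_lines() {
    local IFS=$'\n'     # local, so the caller's IFS is left untouched
    set -f              # no filename expansion during the unquoted expansion
    lines=( $1 )        # word splitting on LF only
    set +f
}

split_lines $'first line\nsecond *\nthird line'
printf '[%s]\n' "${lines[@]}"
```

Declaring IFS as local inside a function avoids the save-and-restore dance with oldIFS. Empty lines are still lost, since LF is an IFS whitespace character.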


Wrong answer #5

countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', ' array=($countries)
IFS="$OIFS"

Similar idea:

IFS=', ' eval 'array=($string)'

This solution is effectively a cross between #1 (in that it sets $IFS to comma-space) and #2-4 (in that it uses word splitting to split the string into fields). Because of this, it suffers from most of the problems that afflict all of the above wrong answers, sort of like the worst of all worlds.

Also, regarding the second variant, it may seem like the eval call is completely unnecessary, since its argument is a single-quoted string literal, and therefore is statically known. But there's actually a very non-obvious benefit to using eval in this way. Normally, when you run a simple command which consists of a variable assignment only, meaning without an actual command word following it, the assignment takes effect in the shell environment:

IFS=', '; ## changes $IFS in the shell environment

This is true even if the simple command involves multiple variable assignments; again, as long as there's no command word, all variable assignments affect the shell environment:

IFS=', ' array=($countries); ## changes both $IFS and $array in the shell environment

But, if the variable assignment is attached to a command name (I like to call this a "prefix assignment") then it does not affect the shell environment, and instead only affects the environment of the executed command, regardless whether it is a builtin or external:

IFS=', ' :; ## : is a builtin command, the $IFS assignment does not outlive it
IFS=', ' env; ## env is an external command, the $IFS assignment does not outlive it

Relevant quote from the bash manual:

If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment.

It is possible to exploit this feature of variable assignment to change $IFS only temporarily, which allows us to avoid the whole save-and-restore gambit like that which is being done with the $OIFS variable in the first variant. But the challenge we face here is that the command we need to run is itself a mere variable assignment, and hence it would not involve a command word to make the $IFS assignment temporary. You might think to yourself, well why not just add a no-op command word to the statement like the : builtin to make the $IFS assignment temporary? This does not work because it would then make the $array assignment temporary as well:

IFS=', ' array=($countries) :; ## fails; new $array value never escapes the : command

So, we're effectively at an impasse, a bit of a catch-22. But, when eval runs its code, it runs it in the shell environment, as if it was normal, static source code, and therefore we can run the $array assignment inside the eval argument to have it take effect in the shell environment, while the $IFS prefix assignment that is prefixed to the eval command will not outlive the eval command. This is exactly the trick that is being used in the second variant of this solution:

IFS=', ' eval 'array=($string)'; ## $IFS does not outlive the eval command, but $array does

So, as you can see, it's actually quite a clever trick, and accomplishes exactly what is required (at least with respect to assignment effectation) in a rather non-obvious way. I'm actually not against this trick in general, despite the involvement of eval; just be careful to single-quote the argument string to guard against security threats.
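The scoping behavior of the eval trick can be checked with a short sketch. It assumes Bash in its default (non-POSIX) mode; in POSIX mode, assignments that precede a special builtin such as eval persist after it returns. The variable before is illustrative.

```bash
#!/usr/bin/env bash
string='Paris, France, Europe'
before=$IFS

# Single quotes: $string is expanded by eval itself, under the temporary IFS.
IFS=', ' eval 'array=($string)'

[[ $IFS == "$before" ]] && echo 'IFS unchanged after eval'
declare -p array
```

The array assignment survives while $IFS keeps its previous value. The unquoted expansion inside eval is still subject to globbing, so the earlier caveats apply.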

But again, because of the "worst of all worlds" agglomeration of problems, this is still a wrong answer to the OP's requirement.


Wrong answer #6

IFS=', '; array=(Paris, France, Europe)

IFS=' ';declare -a array=(Paris France Europe)

Um... what? The OP has a string variable that needs to be parsed into an array. This "answer" starts with the verbatim contents of the input string pasted into an array literal. I guess that's one way to do it.

It looks like the answerer may have assumed that the $IFS variable affects all bash parsing in all contexts, which is not true. From the bash manual:

IFS    The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command. The default value is <space><tab><newline>.

So the $IFS special variable is actually only used in two contexts: (1) word splitting that is performed after expansion (meaning not when parsing bash source code) and (2) for splitting input lines into words by the read builtin.

Let me try to make this clearer. I think it might be good to draw a distinction between parsing and execution. Bash must first parse the source code, which obviously is a parsing event, and then later it executes the code, which is when expansion comes into the picture. Expansion is really an execution event. Furthermore, I take issue with the description of the $IFS variable that I just quoted above; rather than saying that word splitting is performed after expansion, I would say that word splitting is performed during expansion, or, perhaps even more precisely, word splitting is part of the expansion process. The phrase "word splitting" refers only to this step of expansion; it should never be used to refer to the parsing of bash source code, although unfortunately the docs do seem to throw around the words "split" and "words" a lot. Here's a relevant excerpt from the linux.die.net version of the bash manual:

Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, word splitting, and pathname expansion.

The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and pathname expansion.

You could argue the GNU version of the manual does slightly better, since it opts for the word "tokens" instead of "words" in the first sentence of the Expansion section:

Expansion is performed on the command line after it has been split into tokens.

The important point is, $IFS does not change the way bash parses source code. Parsing of bash source code is actually a very complex process that involves recognition of the various elements of shell grammar, such as command sequences, command lists, pipelines, parameter expansions, arithmetic substitutions, and command substitutions. For the most part, the bash parsing process cannot be altered by user-level actions like variable assignments (actually, there are some minor exceptions to this rule; for example, see the various compatxx shell settings, which can change certain aspects of parsing behavior on-the-fly). The upstream "words"/"tokens" that result from this complex parsing process are then expanded according to the general process of "expansion" as broken down in the above documentation excerpts, where word splitting of the expanded (expanding?) text into downstream words is simply one step of that process. Word splitting only touches text that has been spit out of a preceding expansion step; it does not affect literal text that was parsed right off the source bytestream.
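A short sketch makes the distinction concrete: the same text is split when it comes out of an unquoted expansion but not when it appears literally in the source. The variable names are illustrative.

```bash
#!/usr/bin/env bash
IFS=','

set -- a,b,c            # literal source text: parsed as one word, IFS plays no part
literal_count=$#

v='a,b,c'
set -- $v               # result of an unquoted expansion: split on the comma
expanded_count=$#

unset IFS               # return to default splitting behavior
echo "literal: $literal_count, expanded: $expanded_count"
```

The literal form yields one positional parameter, the expanded form three.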


Wrong answer #7

string='first line
        second line
        third line'

while read -r line; do lines+=("$line"); done <<<"$string"

This is one of the best solutions. Notice that we're back to using read. Didn't I say earlier that read is inappropriate because it performs two levels of splitting, when we only need one? The trick here is that you can call read in such a way that it effectively only does one level of splitting, specifically by splitting off only one field per invocation, which necessitates the cost of having to call it repeatedly in a loop. It's a bit of a sleight of hand, but it works.

But there are problems. First: When you provide at least one NAME argument to read, it automatically ignores leading and trailing whitespace in each field that is split off from the input string. This occurs whether $IFS is set to its default value or not, as described earlier in this post. Now, the OP may not care about this for his specific use-case, and in fact, it may be a desirable feature of the parsing behavior. But not everyone who wants to parse a string into fields will want this. There is a solution, however: A somewhat non-obvious usage of read is to pass zero NAME arguments. In this case, read will store the entire input line that it gets from the input stream in a variable named $REPLY, and, as a bonus, it does not strip leading and trailing whitespace from the value. This is a very robust usage of read which I've exploited frequently in my shell programming career. Here's a demonstration of the difference in behavior:

string=$'  a  b  \n  c  d  \n  e  f  '; ## input string

a=(); while read -r line; do a+=("$line"); done <<<"$string"; declare -p a;
## declare -a a=([0]="a  b" [1]="c  d" [2]="e  f") ## read trimmed surrounding whitespace

a=(); while read -r; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="  a  b  " [1]="  c  d  " [2]="  e  f  ") ## no trimming

The second issue with this solution is that it does not actually address the case of a custom field separator, such as the OP's comma-space. As before, multicharacter separators are not supported, which is an unfortunate limitation of this solution. We could try to at least split on comma by specifying the separator to the -d option, but look what happens:

string='Paris, France, Europe';
a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France")

Predictably, the unaccounted surrounding whitespace got pulled into the field values, and hence this would have to be corrected subsequently through trimming operations (this could also be done directly in the while-loop). But there's another obvious error: Europe is missing! What happened to it? The answer is that read returns a failing return code if it hits end-of-file (in this case we can call it end-of-string) without encountering a final field terminator on the final field. This causes the while-loop to break prematurely and we lose the final field.

Technically this same error afflicted the previous examples as well; the difference there is that the field separator was taken to be LF, which is the default when you don't specify the -d option, and the here-string (<<<) mechanism automatically appends an LF to the string before feeding it to the command.
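One way to keep the final field, sketched here under the assumption of a single-character comma delimiter, is to append a dummy trailing delimiter to the input so that every field, including the last, is terminated:

```bash
#!/usr/bin/env bash
string='Paris, France, Europe'
a=()

# The extra "," terminates the last field. The LF that <<< appends is left over;
# the final read hits end-of-input on it, fails, and the loop ends cleanly.
while read -rd,; do a+=("$REPLY"); done <<< "$string,"

declare -p a
```

All three fields are now captured. The leading spaces remain in the second and third values and would still need trimming.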

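  • It looks like `readarray` is not an available builtin on OSX. (13 upvotes)
  • This is exactly the kind of thing that convinces you never to write code in bash. An extremely simple task, yet there are 8 wrong solutions. And by the way, this is without a design constraint of "make it as obscure and finicky as possible". (13 upvotes)
  • Just a note (though understandably you had no room for it): it might also help to mention that the `-d` option to `readarray` first appears in Bash 4.4. (5 upvotes)
  • Wow, what a brilliant answer! Hee hee, my response: ditch bash scripting and fire up python! (4 upvotes)
  • I would move your right answers to the top; I had to scroll through a lot of rubbish to find out how to do it properly :-) (4 upvotes)
  • I've heard that editing is a major part of writing. While I appreciate the thoroughness, this would be a better answer if you could cut it down to < 5K characters and focus on the answer. (4 upvotes)
  • Great answer (+1). If you change your awk to `awk '{ gsub(/,[ ]+|$/,"\0"); print }'` and eliminate the concatenation of the final `", "`, then you don't have to go through the gymnastics of eliminating the final record. So: `readarray -td '' a < <(awk '{ gsub(/,[ ]+/,"\0"); print; }' <<<"$string")` on a Bash that supports `readarray`. Note that your method is Bash 4.4+, I think, because of the `-d` in `readarray`. (2 upvotes)
  • @datUser That's unfortunate. Your version of bash must be too old for `readarray`. In this case, you can use the second-best solution built on `read`. I'm referring to this: `a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string,";` (with the `awk` substitution if you need multicharacter delimiter support). Let me know if you run into any problems; I'm pretty sure this solution should work on fairly old versions of bash, back to version 2-something, released about two decades ago. (2 upvotes)
  • @datUser bash on OSX is still stuck at 3.2 (released ca. 2007); I've used the bash from Homebrew to get 4.X bash versions on OS X. (2 upvotes)
  • readarray: -d: invalid option. Are you sure your answer is correct? (2 upvotes)
  • Now *this* is an answer! It deserves every bit of the 500-point bounty. (2 upvotes)
  • @dawg The solution without the unset (`readarray -td '' a < <(awk '{ gsub(/,[ ]+/,"\0"); print; }' <<<"$string")`) results in the last array element having a trailing newline on my system (Bash 5.0). (2 upvotes)
  • I'd use this: `readarray -td, a < <(printf "%s" "$string"); declare -p a`. No trailing '\n'. hth (2 upvotes)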

小智 212

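Here is a way without setting IFS:

string="1:2:3:4:5"
set -f                      # avoid globbing (expansion of *).
array=(${string//:/ })
for i in "${!array[@]}"
do
    echo "$i=>${array[i]}"
done

The idea is to use string replacement:

${string//substring/replacement}

to replace all matches of $substring with white space, and then to use the substituted string to initialize an array:

(element1 element2 ... elementN)

Note: this answer makes use of the split+glob operator. Thus, to prevent expansion of some characters (such as *), it is a good idea to pause globbing for this script.

  • Warning: I just ran into a problem with this approach. If you have an element named *, you will get all the elements of your cwd as well. Thus string="1:2:3:4:*" will give some unexpected and possibly dangerous results depending on your implementation. I did not get the same error with (IFS=', ' read -a array <<< "$string"), and that one seems safe to use. (12 upvotes)
  • Not reliable for many kinds of values; use with care. (5 upvotes)
  • Quoting `${string//:/ }` prevents shell expansion. (4 upvotes)
  • @Jim What if an element in `string` contains a space? (2 upvotes)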

Jmo*_*y38 86

t="one,two,three"
a=($(echo "$t" | tr ',' '\n'))
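echo "${a[2]}"

Prints three

  • I actually prefer this approach. Simple. (6 upvotes)
  • I copied and pasted this and it did not work with echo, but it did work when I used it in a for loop. (4 upvotes)
  • This does not work as stated. @Jmoney38, if you can paste this into a terminal and get the desired output, please paste the result here. (2 upvotes)
  • @abalter Worked for me with `a=($(echo $t | tr ',' "\n"))`. Same result with `a=($(echo $t | tr ',' ' '))`. (2 upvotes)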

Luc*_*one 29

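Sometimes I have found that the method described in the accepted answer doesn't work, especially if the separator is a carriage return.
In those cases I solved it this way: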

string='first line
second line
third line'

oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"

for line in "${lines[@]}"
do
    echo "--> $line"
done

  • [Here's a way to get the accepted answer to work when the delimiter is a newline](/sf/ask/1989219011/). (4 upvotes)
  • +1 This completely worked for me. I needed to put multiple strings, separated by newlines, into an array, and `read -a arr <<< "$strings"` did not work with `IFS=$'\n'`. (3 upvotes)

小智 27

The accepted answer works for values on one line.
If the variable has several lines:

string='first line
        second line
        third line'

We need a very different command to get all lines:

while read -r line; do lines+=("$line"); done <<<"$string"

Or the much simpler bash readarray:

readarray -t lines <<<"$string"

Printing all lines is very easy by taking advantage of a printf feature:

printf ">[%s]\n" "${lines[@]}"

>[first line]
>[        second line]
>[        third line]

  • While not every solution works for every situation, your mention of readarray... replaced my last two hours with 5 minutes... (2 upvotes)

jhn*_*hnc 14

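I was curious about the relative performance of the "right answer" in the popular answer by @bgoldst, with its apparent decrying of loops, so I ran a simple benchmark of it against three pure bash implementations.

In summary, I suggest:

  1. for string length < 4k or so, pure bash is faster than gawk
  2. for delimiter length < 10 and string length < 256k, pure bash is comparable to gawk
  3. for delimiter length >> 10 and string length < 64k or so, pure bash is "acceptable", and gawk is less than 5x faster
  4. for string length < 512k or so, gawk is "acceptable"

I arbitrarily define "acceptable" as "takes < 0.5s to split the string".


I am taking the problem to be: take a bash string and split it into a bash array, using an arbitrary-length delimiter string (not a regex).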

# in: $1=delim, $2=string
# out: sets array a

My pure bash implementations are:

# naive approach - slow
split_byStr_bash_naive(){
    a=()
    local prev=""
    local cdr="$2"
    [[ -z "${cdr}" ]] && a+=("")
    while [[ "$cdr" != "$prev" ]]; do
        prev="$cdr"
        a+=( "${cdr%%"$1"*}" )
        cdr="${cdr#*"$1"}"
    done
    # echo $( declare -p a | md5sum; declare -p a )
}

# use lengths wherever possible - faster
split_byStr_bash_faster(){
    a=()
    local car=""
    local cdr="$2"
    while
        car="${cdr%%"$1"*}"
        a+=("$car")
        cdr="${cdr:${#car}}"
        (( ${#cdr} ))
    do
        cdr="${cdr:${#1}}"
    done
    # echo $( declare -p a | md5sum; declare -p a )
}

# use pattern substitution and readarray - fastest
split_byStr_bash_sub(){
        a=()
        local delim="$1" string="$2"

        delim="${delim//=/=-}"
        delim="${delim//$'\n'/=n}"

        string="${string//=/=-}"
        string="${string//$'\n'/=n}"

        readarray -td $'\n' a <<<"${string//"$delim"/$'\n'}"

        local len=${#a[@]} i s
        for (( i=0; i<len; i++ )); do
                s="${a[i]//=n/$'\n'}"
                a[i]="${s//=-/=}"
        done
        # echo $( declare -p a | md5sum; declare -p a )
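}

The initial -z test in the naive version handles the case of a zero-length string being passed. Without the test, the output array is empty; with it, the array has a single zero-length element.

Replacing readarray with while read results in a slowdown of < 10%.


This is the gawk implementation I used: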

split_byRE_gawk(){
    readarray -td '' a < <(awk '{gsub(/'"$1"'/,"\0")}1' <<<"$2$1")
    unset 'a[-1]'
    # echo $( declare -p a | md5sum; declare -p a )
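}

Obviously, in the general case, the delim argument will need to be sanitised, since gawk expects a regex and gawk-special characters could cause problems. Also, as-is, the implementation won't correctly handle newlines in the delimiter.

Since gawk is being used, a generalised version that handles more arbitrary delimiters could be: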

split_byREorStr_gawk(){
    local delim=$1
    local string=$2
    local useRegex=${3:+1}  # if set, delimiter is regex

    readarray -td '' a < <(
        export delim
        gawk -v re="$useRegex" '
            BEGIN {
                RS = FS = "\0"
                ORS = ""
                d = ENVIRON["delim"]

                # cf. /sf/answers/2592739691/
                if (!re) gsub(/[\\.^$(){}\[\]|*+?]/,"\\\\&",d)
            }
            gsub(d"|\n$","\0")
        ' <<<"$string"
    )
    # echo $( declare -p a | md5sum; declare -p a )
}

or the same idea in Perl:

split_byREorStr_perl(){
    local delim=$1
    local string=$2
    local regex=$3  # if set, delimiter is regex

    readarray -td '' a < <(
        export delim regex
        perl -0777pe '
            $d = $ENV{delim};
            $d = "\Q$d\E" if ! $ENV{regex};
            s/$d|\n$/\0/g;
        ' <<<"$string"
    )
    # echo $( declare -p a | md5sum; declare -p a )
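}

The implementations produce identical output, tested by comparing the md5sum of each separately.

Note that if the input is ambiguous ("logically incorrect", as @bgoldst puts it), behaviour will diverge slightly. For example, with delimiter -- and string a- or a---:

  • @bgoldst's code returns: declare -a a=([0]="a") or declare -a a=([0]="a" [1]="")
  • mine return: declare -a a=([0]="a-") or declare -a a=([0]="a" [1]="-")

Arguments were derived with simple Perl scripts from: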

delim="-=-="
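base="ABCDEFGHIJKLMNOPQRSTUVWXYZ012345"

Here are tables of timing results (in seconds) for 3 different types of string and delimiter arguments.

  • #s - length of string argument
  • #d - length of delim argument
  • = - performance break-even point
  • ! - "acceptable" performance limit (bash) is somewhere around here
  • !! - "acceptable" performance limit (gawk) is somewhere around here
  • - - function took too long
  • <!> - gawk command failed to run

Type 1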

d=$(perl -e "print( '$delim' x (7*2**$n) )")
s=$(perl -e "print( '$delim' x (7*2**$n) . '$base' x (7*2**$n) )")

|    | n  | #s      | #d     | gawk    | b_sub   | b_faster | b_naive |
|----|----|---------|--------|---------|---------|----------|---------|
|    | 0  | 252     | 28     | 0.002   | 0.000   | 0.000    | 0.000   |
|    | 1  | 504     | 56     | 0.005   | 0.000   | 0.000    | 0.001   |
|    | 2  | 1008    | 112    | 0.005   | 0.001   | 0.000    | 0.003   |
|    | 3  | 2016    | 224    | 0.006   | 0.001   | 0.000    | 0.009   |
|    | 4  | 4032    | 448    | 0.007   | 0.002   | 0.001    | 0.048   |
| =  | 5  | 8064    | 896    | 0.014   | 0.008   | 0.005    | 0.377   |
|    | 6  | 16128   | 1792   | 0.018   | 0.029   | 0.017    | (2.214) |
|    | 7  | 32256   | 3584   | 0.033   | 0.057   | 0.039    | (15.16) |
|    | 8  | 64512   | 7168   | 0.063   | 0.214   | 0.128    | -       |
|    | 9  | 129024  | 14336  | 0.111   | (0.826) | (0.602)  | -       |
|    | 10 | 258048  | 28672  | 0.214   | (3.383) | (2.652)  | -       |
| !! | 11 | 516096  | 57344  | 0.430   | (13.46) | (11.00)  | -       |
|    | 12 | 1032192 | 114688 | (0.834) | (58.38) | -        | -       |
|    | 13 | 2064384 | 229376 | <!>     | (228.9) | -        | -       |

Type 2

d=$(perl -e "print( '$delim' x ($n) )")
s=$(perl -e "print( ('$delim' x ($n) . '$base' x $n ) x (2**($n-1)) )")

|    | n  | #s      | #d | gawk    | b_sub   | b_faster | b_naive |
|----|----|---------|----|---------|---------|----------|---------|
|    | 0  | 0       | 0  | 0.003   | 0.000   | 0.000    | 0.000   |
|    | 1  | 36      | 4  | 0.003   | 0.000   | 0.000    | 0.000   |
|    | 2  | 144     | 8  | 0.005   | 0.000   | 0.000    | 0.000   |
|    | 3  | 432     | 12 | 0.005   | 0.000   | 0.000    | 0.000   |
|    | 4  | 1152    | 16 | 0.005   | 0.001   | 0.001    | 0.002   |
|    | 5  | 2880    | 20 | 0.005   | 0.001   | 0.002    | 0.003   |
|    | 6  | 6912    | 24 | 0.006   | 0.003   | 0.009    | 0.014   |
| =  | 7  | 16128   | 28 | 0.012   | 0.012   | 0.037    | 0.044   |
|    | 8  | 36864   | 32 | 0.023   | 0.044   | 0.167    | 0.187   |
|    | 9  | 82944   | 36 | 0.049   | 0.192   | (0.753)  | (0.840) |
|    | 10 | 184320  | 40 | 0.097   | (0.925) | (3.682)  | (4.016) |
|    | 11 | 405504  | 44 | 0.204   | (4.709) | (18.00)  | (19.58) |
| !! | 12 | 884736  | 48 | 0.444   | (22.17) | -        | -       |
|    | 13 | 1916928 | 52 | (1.019) | (102.4) | -        | -       |

Type 3

d=$(perl -e "print( '$delim' x (2**($n-1)) )")
s=$(perl -e "print( ('$delim' x (2**($n-1)) . '$base' x (2**($n-1)) ) x ($n) )")

|    | n  | #s      | #d    | gawk    | b_sub   | b_faster | b_naive |
|----|----|---------|-------|---------|---------|----------|---------|
|    | 0  | 0       | 0     | 0.000   | 0.000   | 0.000    | 0.000   |
|    | 1  | 36      | 4     | 0.004   | 0.000   | 0.000    | 0.000   |
|    | 2  | 144     | 8     | 0.003   | 0.000   | 0.000    | 0.000   |
|    | 3  | 432     | 16    | 0.003   | 0.000   | 0.000    | 0.000   |
|    | 4  | 1152    | 32    | 0.005   | 0.001   | 0.001    | 0.002   |
|    | 5  | 2880    | 64    | 0.005   | 0.002   | 0.001    | 0.003   |
|    | 6  | 6912    | 128   | 0.006   | 0.003   | 0.003    | 0.014   |
| =  | 7  | 16128   | 256   | 0.012   | 0.011   | 0.010    | 0.077   |
|    | 8  | 36864   | 512   | 0.023   | 0.046   | 0.046    | (0.513) |
|    | 9  | 82944   | 1024  | 0.049   | 0.195   | 0.197    | (3.850) |
|    | 10 | 184320  | 2048  | 0.103   | (0.951) | (1.061)  | (31.84) |
|    | 11 | 405504  | 4096  | 0.222   | (4.796) | -        | -       |
| !! | 12 | 884736  | 8192  | 0.473   | (22.88) | -        | -       |
|    | 13 | 1916928 | 16384 | (1.126) | (105.4) | -        | -       |

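Summary for delimiters of length 1..10

As short delimiters are probably more likely than long ones, summarised below are the results of varying the delimiter length between 1 and 10 (results for 2..9 are mostly elided as they are very similar).

s1=$(perl -e "print( '$d' . '$base' x (7*2**$n) )")
s2=$(perl -e "print( ('$d' . '$base' x $n ) x (2**($n-1)) )")
s3=$(perl -e "print( ('$d' . '$base' x (2**($n-1)) ) x ($n) )")

bash_sub < gawk

| string | n  | #s     | #d | gawk  | b_sub | b_faster | b_naive |
|--------|----|--------|----|-------|-------|----------|---------|
| s1     | 10 | 229377 | 1  | 0.131 | 0.089 | 1.709    | -       |
| s1     | 10 | 229386 | 10 | 0.142 | 0.095 | 1.907    | -       |
| s2     | 8  | 32896  | 1  | 0.022 | 0.007 | 0.148    | 0.168   |
| s2     | 8  | 34048  | 10 | 0.021 | 0.021 | 0.163    | 0.179   |
| s3     | 12 | 786444 | 1  | 0.436 | 0.468 | -        | -       |
| s3     | 12 | 786456 | 2  | 0.434 | 0.317 | -        | -       |
| s3     | 12 | 786552 | 10 | 0.438 | 0.333 | -        | -       |

bash_sub < 0.5s

| string | n  | #s     | #d | gawk  | b_sub | b_faster | b_naive |
|--------|----|--------|----|-------|-------|----------|---------|
| s1     | 11 | 458753 | 1  | 0.256 | 0.332 | (7.089)  | -       |
| s1     | 11 | 458762 | 10 | 0.269 | 0.387 | (8.003)  | -       |
| s2     | 11 | 361472 | 1  | 0.205 | 0.283 | (14.54)  | -       |
| s2     | 11 | 363520 | 3  | 0.207 | 0.462 | (16.66)  | -       |
| s3     | 12 | 786444 | 1  | 0.436 | 0.468 | -        | -       |
| s3     | 12 | 786456 | 2  | 0.434 | 0.317 | -        | -       |
| s3     | 12 | 786552 | 10 | 0.438 | 0.333 | -        | -       |

gawk < 0.5s

| string | n  | #s     | #d | gawk  | b_sub   | b_faster | b_naive |
|--------|----|--------|----|-------|---------|----------|---------|
| s1     | 11 | 458753 | 1  | 0.256 | 0.332   | (7.089)  | -       |
| s1     | 11 | 458762 | 10 | 0.269 | 0.387   | (8.003)  | -       |
| s2     | 12 | 788480 | 1  | 0.440 | (1.252) | -        | -       |
| s2     | 12 | 806912 | 10 | 0.449 | (4.968) | -        | -       |
| s3     | 12 | 786444 | 1  | 0.436 | 0.468   | -        | -       |
| s3     | 12 | 786456 | 2  | 0.434 | 0.317   | -        | -       |
| s3     | 12 | 786552 | 10 | 0.438 | 0.333   | -        | -       |

(I'm not entirely sure why bash_sub with s>160k and d=1 was consistently slower than d>1 for s3.)

All tests were carried out with bash 5.0.17 on an Intel i7-7500U running xubuntu 20.04.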


小智 13

If you use macOS and can't use readarray, you can simply do this:

MY_STRING="string1 string2 string3"
array=($MY_STRING)

To iterate over the elements:

for element in "${array[@]}"
do
    echo "$element"
done


小智 9

This is similar to the approach by Jmoney38, but using sed:

string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)
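echo "${array[0]}"

Prints 1

  • This is basically just a rip-off of the `tr` answer that makes it worse. Now a more complex tool is involved, with more complex syntax and regular expressions. Moreover, the modern `$()` syntax of the original has been replaced by obsolescent backticks. (3 upvotes)
  • In my case it prints 1 2 3 4 (2 upvotes)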

To *_*Kra 8

This works for me on OSX:

string="1 2 3 4 5"
declare -a array=($string)

If your string has a different delimiter, just replace it with spaces first:

string="1,2,3,4,5"
delimiter=","
declare -a array=($(echo $string | tr "$delimiter" " "))

Simple :-)

  • Works in both Bash and Zsh, which is a plus! (2 upvotes)

daw*_*awg 5

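The key to splitting your string into an array is the multi-character delimiter ", ". Any solution that uses IFS for multi-character delimiters is inherently wrong, since IFS is a set of those characters, not a string.

If you assign IFS=", ", then the string will break on EITHER "," OR " " or any combination of them, which is not an accurate representation of the two-character delimiter ", ".

You can use awk or sed to split the string, with process substitution: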

#!/bin/bash

str="Paris, France, Europe"
array=()
while read -r -d $'\0' each; do   # use a NUL terminated field separator 
    array+=("$each")
done < <(printf "%s" "$str" | awk '{ gsub(/,[ ]+|$/,"\0"); print }')
declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output

It is more efficient to use a regex directly in Bash:

#!/bin/bash

str="Paris, France, Europe"

array=()
while [[ $str =~ ([^,]+)(,[ ]+|$) ]]; do
    array+=("${BASH_REMATCH[1]}")   # capture the field
    i=${#BASH_REMATCH}              # length of field + delimiter
    str=${str:i}                    # advance the string by that length
done                                # the loop deletes $str, so make a copy if needed

declare -p array
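# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output...

With the second form, there is no subshell, and it will be inherently faster.


Edit by bgoldst: Here are some benchmarks comparing my readarray solution to dawg's regex solution, and I also included the read solution for the heck of it (note: I slightly modified the regex solution for greater harmony with my solution) (also see my comments below the post):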

## competitors
function c_readarray { readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); unset 'a[-1]'; };
function c_read { a=(); local REPLY=''; while read -r -d ''; do a+=("$REPLY"); done < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); };
function c_regex { a=(); local s="$1, "; while [[ $s =~ ([^,]+),\  ]]; do a+=("${BASH_REMATCH[1]}"); s=${s:${#BASH_REMATCH}}; done; };

## helper functions
function rep {
    local -i i=-1;
    for ((i = 0; i<$1; ++i)); do
        printf %s "$2";
    done;
}; ## end rep()

function testAll {
    local funcs=();
    local args=();
    local func='';
    local -i rc=-1;
    while [[ "$1" != ':' ]]; do
        func="$1";
        if [[ ! "$func" =~ ^[_a-zA-Z][_a-zA-Z0-9]*$ ]]; then
            echo "bad function name: $func" >&2;
            return 2;
        fi;
        funcs+=("$func");
        shift;
    done;
    shift;
    args=("$@");
    for func in "${funcs[@]}"; do
        echo -n "$func ";
        { time $func "${args[@]}" >/dev/null 2>&1; } 2>&1| tr '\n' '/';
        rc=${PIPESTATUS[0]}; if [[ $rc -ne 0 ]]; then echo "[$rc]"; else echo; fi;
    done| column -ts/;
}; ## end testAll()

function makeStringToSplit {
    local -i n=$1; ## number of fields
    if [[ $n -lt 0 ]]; then echo "bad field count: $n" >&2; return 2; fi;
    if [[ $n -eq 0 ]]; then
        echo;
    elif [[ $n -eq 1 ]]; then
        echo 'first field';
    elif [[ "$n" -eq 2 ]]; then
        echo 'first field, last field';
    else
        echo "first field, $(rep $(($1-2)) 'mid field, ')last field";
    fi;
}; ## end makeStringToSplit()

function testAll_splitIntoArray {
    local -i n=$1; ## number of fields in input string
    local s='';
    echo "===== $n field$(if [[ $n -ne 1 ]]; then echo 's'; fi;) =====";
    s="$(makeStringToSplit "$n")";
    testAll c_readarray c_read c_regex : "$s";
}; ## end testAll_splitIntoArray()

## results
testAll_splitIntoArray 1;
## ===== 1 field =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.000s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 10;
## ===== 10 fields =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.001s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 100;
## ===== 100 fields =====
## c_readarray   real  0m0.069s   user 0m0.000s   sys  0m0.062s
## c_read        real  0m0.065s   user 0m0.000s   sys  0m0.046s
## c_regex       real  0m0.005s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 1000;
## ===== 1000 fields =====
## c_readarray   real  0m0.084s   user 0m0.031s   sys  0m0.077s
## c_read        real  0m0.092s   user 0m0.031s   sys  0m0.046s
## c_regex       real  0m0.125s   user 0m0.125s   sys  0m0.000s
##
testAll_splitIntoArray 10000;
## ===== 10000 fields =====
## c_readarray   real  0m0.209s   user 0m0.093s   sys  0m0.108s
## c_read        real  0m0.333s   user 0m0.234s   sys  0m0.109s
## c_regex       real  0m9.095s   user 0m9.078s   sys  0m0.000s
##
testAll_splitIntoArray 100000;
## ===== 100000 fields =====
## c_readarray   real  0m1.460s   user 0m0.326s   sys  0m1.124s
## c_read        real  0m2.780s   user 0m1.686s   sys  0m1.092s
## c_regex       real  17m38.208s   user 15m16.359s   sys  2m19.375s
##


MrP*_*ead 5

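A pure Bash solution for multi-character delimiters.

As others have pointed out in this thread, the OP's question gave an example of a comma-delimited string to be parsed into an array, but did not say whether he/she was interested only in comma delimiters, in single-character delimiters, or in multi-character delimiters.

Since Google tends to rank this answer at or near the top of the search results, I wanted to give readers a solid answer to the multi-character delimiter question, since it is also mentioned in at least one response.

If you are looking for a solution to the multi-character delimiter problem, I suggest reviewing Mallikarjun M's post, in particular the response from gniourf_gniourf, who provides this elegant pure Bash solution using parameter expansion: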

#!/bin/bash
str="LearnABCtoABCSplitABCaABCString"
delimiter=ABC
s=$str$delimiter
array=();
while [[ $s ]]; do
    array+=( "${s%%"$delimiter"*}" );
    s=${s#*"$delimiter"};
done;
declare -p array
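
Applied to the two-character delimiter from the question, the same loop might look like the sketch below. The input string is an illustrative value chosen so that one field contains a space; the comments describe what each expansion does.

```shell
#!/bin/bash
# Illustrative input: "Los Angeles" contains a space but no comma.
str="Paris, France, Los Angeles"
delimiter=", "

s=$str$delimiter        # append one delimiter so the last field is terminated too
array=()
while [[ $s ]]; do
    array+=("${s%%"$delimiter"*}")   # %% strips the longest suffix: keeps the first field
    s=${s#*"$delimiter"}             # #  strips the shortest prefix: drops that field
done

printf '[%s]\n' "${array[@]}"
```

Quoting "$delimiter" inside the expansions makes it match literally, so delimiters containing glob characters are not treated as patterns.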

Link to the cited comment / referenced post

Link to the cited question: Howto split a string on a multi-character delimiter in bash?