re.split(" ", string) 和 re.split("\s+", string) 的区别？

Question

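I am currently studying regular expressions and ran into a question, which is what the title asks. I thought that since \s stands for a whitespace character, re.split(" ", string) and re.split("\s+", string) would give the same result, as shown below: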

>>> import re
>>> a = re.split(" ", "Why is this wrong")
>>> a
['Why', 'is', 'this', 'wrong']

>>> import re
>>> a = re.split(r"\s+", "Why is this wrong")
>>> a
['Why', 'is', 'this', 'wrong']

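These two give the same answer, so I assumed they were the same thing. However, it turns out they are different. In what cases do they differ? What am I missing here?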

Answer 1

Based on your example, they do look the same.

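Splitting on ' ' (a single space) does exactly that: it splits on every single space. Consecutive spaces therefore produce empty "matches" (empty strings) between them.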
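A minimal sketch of the consecutive-space case, using a made-up two-word string (`text` is just a hypothetical example value):

```python
import re

text = "one  two"  # two spaces between the words

# Splitting on a single space: the gap between the two spaces yields an empty string
by_space = re.split(" ", text)

# Splitting on one or more whitespace characters: the whole run is one separator
by_whitespace = re.split(r"\s+", text)

print(by_space)       # ['one', '', 'two']
print(by_whitespace)  # ['one', 'two']
```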

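Splitting on '\s+' treats each run of one or more of these characters as a single separator, and \s covers other whitespace characters besides the plain space: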

import re

a = re.split(" ", "Why    is this  \t \t  wrong")
b = re.split(r"\s+", "Why    is this  \t \t  wrong")  # raw string so "\s" reaches re unchanged

print(a)
print(b)

Output:

# a = re.split(" ", ...)
['Why', '', '', '', 'is', 'this', '', '\t', '\t', '', 'wrong']

# b = re.split(r"\s+", ...)
['Why', 'is', 'this', 'wrong']
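The output above mixes two separate differences: the + quantifier (runs collapse) and the \s class (tabs count). As an extra comparison not in the original answer, a sketch using " +" (one or more plain spaces) on the same string separates the two effects: runs of spaces collapse, but the tabs survive as fields.

```python
import re

data = "Why    is this  \t \t  wrong"

# " +" collapses runs of spaces, but tabs are not spaces, so they remain as fields
spaces_only = re.split(" +", data)

# r"\s+" collapses runs of any whitespace, tabs included
any_whitespace = re.split(r"\s+", data)

print(spaces_only)     # ['Why', 'is', 'this', '\t', '\t', 'wrong']
print(any_whitespace)  # ['Why', 'is', 'this', 'wrong']
```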

Documentation:

\s
Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v]. (https://docs.python.org/3/howto/regex.html#matching-characters)
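To illustrate the class quoted above, a small sketch with a hypothetical string that uses each of the other whitespace characters once: \s+ splits on all of them, while " " only splits on the final plain space.

```python
import re

# Newline, carriage return, form feed and vertical tab all count as \s
text = "a\nb\rc\fd\ve f"

by_whitespace = re.split(r"\s+", text)
by_space = re.split(" ", text)

print(by_whitespace)  # ['a', 'b', 'c', 'd', 'e', 'f']
print(by_space)       # ['a\nb\rc\x0cd\x0be', 'f']
```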