What regular expression will emulate the default behavior of Python's split()?

Question

Using split(), I can easily create a list of whitespace-separated tokens from a string:

>>> 'this is a test 200/2002'.split()
['this', 'is', 'a', 'test', '200/2002']

How can I do the same thing with re.compile and re.findall? I need something similar to the example below, but without splitting "200/2002".

>>> import re
>>> test = re.compile(r'\w+')
>>> test.findall('this is a test 200/2002')
['this', 'is', 'a', 'test', '200', '2002']

Answer 1

This should produce the desired list:

>>> import re
>>> test = re.compile(r'\S+')
>>> test.findall('this is a test 200/2002')
['this', 'is', 'a', 'test', '200/2002']

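\S matches any single character that is not whitespace (space, tab, newline, and so on), so \S+ matches each maximal run of non-whitespace characters.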
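A small sketch of why "200/2002" stays intact: '/' is not whitespace, so \S accepts it, whereas \w only accepts word characters (letters, digits, underscore). The character list below is just an illustrative sample.

```python
import re

# \S matches exactly one character that is NOT whitespace.
non_ws = re.compile(r'\S')

chars = [' ', '\t', '\n', '/', 'a', '2']
matched = [c for c in chars if non_ws.fullmatch(c)]
print(matched)  # ['/', 'a', '2']

# \S+ keeps '/' inside a token; \w+ (word characters only) breaks on it.
print(re.findall(r'\S+', '200/2002'))  # ['200/2002']
print(re.findall(r'\w+', '200/2002'))  # ['200', '2002']
```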

From the str.split() documentation:

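If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].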
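To make the edge cases in that quote concrete, here is a rough check (the sample strings are arbitrary) that findall with \S+ returns no empty strings for empty, whitespace-only, or padded input, unlike re.split on a whitespace pattern, which keeps empty strings at the edges.

```python
import re

token = re.compile(r'\S+')

samples = ['', '   ', ' a b ', '\ta\n\nb  ']
for s in samples:
    # findall never yields empty strings, matching str.split() with no sep
    assert token.findall(s) == s.split()

# re.split on a whitespace pattern differs at leading/trailing whitespace:
print(re.split(r'\s+', ' a b '))  # ['', 'a', 'b', '']
print(token.findall(' a b '))     # ['a', 'b']
```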

findall() with the regular expression above should behave the same way:

>>> test.findall(" a\nb\tc   d ")
['a', 'b', 'c', 'd']
>>> " a\nb\tc   d ".split()
['a', 'b', 'c', 'd']
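If this is needed in several places, it could be wrapped in a helper; regex_split below is a hypothetical name, not part of the re module. One caveat worth knowing: for str patterns, \s is Unicode-aware by default, so a non-breaking space (U+00A0) is treated as whitespace just as str.split() treats it. Passing re.ASCII restricts \s to [ \t\n\r\f\v], and the two then disagree.

```python
import re

_TOKEN = re.compile(r'\S+')

def regex_split(text: str) -> list[str]:
    """Hypothetical helper: whitespace tokenization via re.findall."""
    return _TOKEN.findall(text)

s = 'this is a test 200/2002'
assert regex_split(s) == s.split()

# U+00A0 counts as whitespace for both by default
print(regex_split('a\xa0b'))  # ['a', 'b']
print('a\xa0b'.split())       # ['a', 'b']

# With re.ASCII, U+00A0 is no longer \s, so it stays inside the token
print(re.findall(r'\S+', 'a\xa0b', flags=re.ASCII))  # ['a\xa0b']
```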