How to use a regex to add missing spaces after periods without changing decimal numbers

Ido*_*odo 7 python regex

I have a large body of text that is missing spaces after some periods. However, the text also contains decimal numbers.

Here is how I have tried to solve it with a regex so far (I'm using Python):

re.sub(r"(?!\d\.\d)(?!\. )\.", '. ', my_string)

But the first negative lookahead doesn't seem to work: it still matches the period in decimal numbers.
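A likely reason: a lookahead is tested at the current position, and (?!\d\.\d) sits directly in front of \., so at the only positions where \. can match, the text the lookahead inspects starts with the period itself and can never begin with a digit. The negative lookahead therefore always succeeds. The snippet below simply runs the pattern from the question on one of the sample lines to show the decimal point being changed:

```python
import re

# Pattern from the question.
PATTERN = r"(?!\d\.\d)(?!\. )\."

# Wherever \. can match, the next character is the '.' itself,
# so (?!\d\.\d) never sees "digit.digit" and never blocks the match.
result = re.sub(PATTERN, '. ', "this also should NOT match 1.23")
print(result)  # this also should NOT match 1. 23
```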

Here is some sample text to check any potential solution against:

this is a.match
this should also match.1234
and this should 123.match

this should NOT match. Has space after period
this also should NOT match 1.23

Wik*_*żew 1

You can use

re.sub(r'\.(?!(?<=\d\.)\d) ?', '. ', text)

See the regex demo. The trailing space is matched optionally, so if it is present, it is removed and then put back by the replacement.
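Because the optional space is consumed and then written back, a period that already has a space ends up unchanged, and running the substitution a second time changes nothing. A small check of that, using a hypothetical helper name fix_periods and an illustrative input string:

```python
import re

def fix_periods(text: str) -> str:
    # Pattern from the answer: the optional trailing space is consumed,
    # and the replacement always writes ". " back.
    return re.sub(r'\.(?!(?<=\d\.)\d) ?', '. ', text)

once = fix_periods("a.match and 1.23. Has space")
twice = fix_periods(once)
print(once)           # a. match and 1.23. Has space
print(once == twice)  # True
```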

Details

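  • \. - a literal dot
  • (?!(?<=\d\.)\d) - fail the match if the dot just matched is a decimal point, i.e. it is preceded by a digit and followed by a digit
  • " ?" (a space followed by ?) - an optional space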
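The same pattern can be written with re.VERBOSE so each part carries the explanation from the list above. This is only a restatement of the answer's regex, not a different technique; note that verbose mode ignores unescaped whitespace, so the optional space is written as [ ]? here. The function name add_space_after_period is a hypothetical choice for illustration:

```python
import re

# Same pattern as the answer, spelled out with re.VERBOSE.
DOT_NEEDS_SPACE = re.compile(r"""
    \.              # a literal dot
    (?!             # fail the match if...
        (?<=\d\.)   #   ...the dot is preceded by a digit
        \d          #   ...and followed by a digit (a decimal point)
    )
    [ ]?            # an optional space, consumed so it can be written back
""", re.VERBOSE)

def add_space_after_period(text: str) -> str:
    """Put a space after every period that is not a decimal point."""
    return DOT_NEEDS_SPACE.sub('. ', text)

for sample in ["this is a.match", "this should also match.1234",
               "and this should 123.match",
               "this should NOT match. Has space after period",
               "this also should NOT match 1.23"]:
    print(add_space_after_period(sample))
```

One side effect worth knowing: a period at the very end of the text, or right before a newline, also gets a space appended, so "end." becomes "end. ".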

See the Python demo:

import re
text = "this is a.match\nthis should also match.1234\nand this should 123.match\n\nthis should NOT match. Has space after period\nthis also should NOT match 1.23"
print(re.sub(r'\.(?!(?<=\d\.)\d) ?', '. ', text))

Output:

this is a. match
this should also match. 1234
and this should 123. match

this should NOT match. Has space after period
this also should NOT match 1.23

Alternatively, use a (?! ) lookahead, as in your attempt. Output:

this is a. match
this should also match. 1234
and this should 123. match

this should NOT match. Has space after period
this also should NOT match 1.23

See the regex demo and the Python demo.
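The code for the (?! ) variant is not preserved in this copy. A minimal sketch, assuming it keeps the decimal check from the first pattern and replaces the optional space with a (?! ) lookahead (the pattern ALT_PATTERN below is a reconstruction, not the answer's verbatim code):

```python
import re

# ASSUMED reconstruction: decimal check from the first pattern, plus (?! )
# so a period that is already followed by a space is never matched.
ALT_PATTERN = r'\.(?!(?<=\d\.)\d)(?! )'

text = ("this is a.match\nthis should also match.1234\nand this should 123.match\n\n"
        "this should NOT match. Has space after period\nthis also should NOT match 1.23")

alt = re.sub(ALT_PATTERN, '. ', text)
original = re.sub(r'\.(?!(?<=\d\.)\d) ?', '. ', text)
print(alt == original)  # True
```

The two differ only in what they match: this version never touches a period that already has a space, while the first one rewrites ". " to an identical ". ".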