Regex to find a pair of adjacent digits with different digits around them

Arc*_*ams 57 python regex python-3.x

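I'm a beginner with regular expressions, and I'm trying to create an expression that finds whether two identical digits are adjacent to each other while the digits before and after the pair are different.

For example,

123456678 should match, because there is a double 6,

1234566678 should not match, because there is no double with different surrounding digits. 12334566 should match, because there are two 3s.

So far I have the pattern below, which works only for 1 and only as long as the double is not at the start or end of the string. I can work around that by adding a letter at the start and end.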

^.*([^1]11[^1]).*$

I know I can use [0-9] in place of the 1s, but the problem is getting them all to be the same digit.

Thanks!

Wik*_*żew 34

With regex, it is much more convenient to use the PyPi regex module with a (*SKIP)(*FAIL) based pattern:

import regex
rx = r'(\d)\1{2,}(*SKIP)(*F)|(\d)\2'
l = ["123456678", "1234566678"]
for s in l:
  print(s, bool(regex.search(rx, s)) )

See the Python demo. Output:

123456678 True
1234566678 False

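Regex details

  • (\d)\1{2,}(*SKIP)(*F) - a digit followed by two or more occurrences of the same digit (a run of three or more), which is then skipped and failed
  • | - or
  • (\d)\2 - a digit followed by the same digit.

The point is to match all chunks of 3 or more identical digits and skip them, and then match a chunk of two identical digits.

See the regex demo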
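The regex module is a third-party package. As a rough sketch using only the standard re module (an adaptation, not part of the answer above), the skip-and-fail step can be approximated by letting the first alternative consume every run of three or more digits and then checking whether any match filled group 2. The name has_run_of_exactly_two is hypothetical.

```python
import re

# First alternative swallows runs of 3+ identical digits; second matches a pair.
RUNS = re.compile(r'(\d)\1{2,}|(\d)\2')

def has_run_of_exactly_two(s):
    """Return True if s contains a run of exactly two identical digits."""
    # finditer is needed: search alone would stop at the first (possibly long) run.
    return any(m.group(2) is not None for m in RUNS.finditer(s))

for s in ["123456678", "1234566678", "12334566"]:
    print(s, has_run_of_exactly_two(s))
```

Like the pattern it imitates, this does not require a digit on either side of the pair, so a bare "66" also counts.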


Car*_*and 33

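I've divided my answer into four sections.

The first section contains my solution to the problem. Readers who are not interested in anything else can skip the other sections.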

The remaining three sections are concerned with identifying the pairs of equal digits that are preceded by a different digit and are followed by a different digit. The first of the three sections matches them; the other two capture them in a group.

I've included the last section because I wanted to share The Greatest Regex Trick Ever with those unfamiliar with it, because I find it so very cool and clever, yet simple. It is documented here. Be forewarned that, to build suspense, the author at that link has included a lengthy preamble before the drum-roll reveal.

Determine if a string contains two consecutive equal digits that are preceded by a different digit and are followed by a different digit

You can test the string as follows:

import re

r = r'(\d)(?!\1)(\d)\2(?!\2)\d'
arr = ["123456678", "1123455a666788"]
for s in arr:
  print(s, bool(re.search(r, s)) )

displays

123456678 True
1123455a666788 False

Run Python code | Start your engine! [1]

The regex engine performs the following operations.

(\d)    : match a digit and save to capture group 1 (preceding digit)
(?!\1)  : next character cannot equal content of capture group 1
(\d)    : match a digit in capture group 2 (first digit of pair)
\2      : match content of capture group 2 (second digit of pair)
(?!\2)  : next character cannot equal content of capture group 2
\d      : match a digit

(?!\1) and (?!\2) are negative lookaheads.
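To see what the steps above imply at the edges of a string, here is a small sketch that wraps the same pattern in a helper (the name has_isolated_pair and the two extra inputs "66123" and "12366" are illustrative, not from the answer). Because the pattern consumes a digit on each side of the pair, a pair at the very start or end of the string is not reported.

```python
import re

# Same pattern as above: digit, different digit twice, then yet another digit.
ISOLATED_PAIR = re.compile(r'(\d)(?!\1)(\d)\2(?!\2)\d')

def has_isolated_pair(s):
    """Return True if s contains a digit pair with a different digit on each side."""
    return ISOLATED_PAIR.search(s) is not None

for s in ["123456678", "1234566678", "12334566", "66123", "12366"]:
    print(s, has_isolated_pair(s))
```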

Use Python's regex module to match pairs of consecutive digits that have the desired property

You can use the following regular expression with Python’s regex module to obtain the matching pairs of digits.

r'(\d)(?!\1)\K(\d)\2(?=\d)(?!\2)'

Regex Engine

The regex engine performs the following operations.

(\d)    : match a digit and save to capture group 1 (preceding digit)
(?!\1)  : next character cannot equal content of capture group 1
\K      : forget everything matched so far and reset start of match
(\d)    : match a digit in capture group 2 (first digit of pair)
\2      : match content of capture group 2 (second digit of pair)
(?=\d)  : next character must be a digit
(?!\2)  : next character cannot equal content of capture group 2

(?=\d) is a positive lookahead. (?=\d)(?!\2) could be replaced with (?!\2|$|\D).
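The claimed equivalence of the two lookahead forms can be spot-checked with the standard re module. This is only a sketch: \K is dropped from both patterns because re does not support it, so the comparison covers whether a match exists, not what text is returned. The sample strings are my own.

```python
import re

# Tail written as two lookaheads: next char is a digit and differs from the pair.
WITH_TWO_LOOKAHEADS = re.compile(r'(\d)(?!\1)(\d)\2(?=\d)(?!\2)')
# Tail written as one negative lookahead: not the pair digit, not end, not a non-digit.
WITH_ONE_LOOKAHEAD = re.compile(r'(\d)(?!\1)(\d)\2(?!\2|$|\D)')

samples = ["1223", "122", "122a", "1222", "123456678", "1234566678"]
results = {
    s: (bool(WITH_TWO_LOOKAHEADS.search(s)), bool(WITH_ONE_LOOKAHEAD.search(s)))
    for s in samples
}
for s, (two, one) in results.items():
    print(s, two, one)
```

Agreement on a handful of inputs is of course not a proof, only a sanity check.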

Save pairs of consecutive digits that have the desired property to a capture group

Another way to obtain the matching pairs of digits, which does not require the regex module, is to extract the contents of capture group 2 from matches of the following regular expression.

r'(\d)(?!\1)((\d)\3)(?!\3)(?=\d)'

Re engine

The following operations are performed.

(\d)    : match a digit in capture group 1
(?!\1)  : next character does not equal last character
(       : begin capture group 2
  (\d)  : match a digit in capture group 3
  \3    : match the content of capture group 3
)       : end capture group 2
(?!\3)  : next character does not equal last character
(?=\d)  : next character is a digit
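A possible way to collect group 2 for every qualifying pair is sketched below. One caveat: matches returned by re.finditer cannot overlap, so with the pattern as written, the digit that precedes a second pair may already have been consumed by the previous match (as in 1223345). The sketch therefore wraps the whole pattern in a lookahead so that nothing is consumed. That wrapping and the name isolated_pairs are my additions, not part of the answer.

```python
import re

# The answer's pattern placed inside (?=...) so each match is zero-width.
PAIR_IN_GROUP_2 = re.compile(r'(?=(\d)(?!\1)((\d)\3)(?!\3)(?=\d))')

def isolated_pairs(s):
    """Return every digit pair that has a different digit on each side."""
    return [m.group(2) for m in PAIR_IN_GROUP_2.finditer(s)]

for s in ["123456678", "12334566", "1223345", "1234566678"]:
    print(s, isolated_pairs(s))
```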

Use The Greatest Regex Trick Ever to identify pairs of consecutive digits that have the desired property

We use the following regular expression to match the string.

r'(\d)(?=\1)|\d(?=(\d)(?!\2))|\d(?=\d(\d)\3)|\d(?=(\d{2})\d)'

When there is a match, we pay no attention to which character was matched, but examine the content of capture group 4 ((\d{2})), as I will explain below.

The Trick in action

The first three components of the alternation correspond to the ways that a string of four digits can fail to have the property that the second and third digits are equal, the first and second are unequal and the third and fourth are unequal. They are:

(\d)(?=\1)        : assert first and second digits are equal    
\d(?=(\d)(?!\2))  : assert second and third digits are not equal
\d(?=\d(\d)\3)    : assert third and fourth digits are equal

It follows that if there is a match of a digit and the first three parts of the alternation fail, the last part (\d(?=(\d{2})\d)) must succeed, and the capture group it contains (#4) must contain the two equal digits that have the required properties. (The final \d is needed to assert that the pair of digits of interest is followed by a digit.)

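If there is a match, how do we determine whether the last part of the alternation is the one that matched?

When this regex matches a digit, we have no interest in what that digit is. Instead, we look at capture group 4 ((\d{2})). If that group is empty, we conclude that one of the first three components of the alternation matched the digit, meaning that the two digits following the matched digit do not have the property that they are equal to each other and unequal to the digits that precede and follow them.

If, however, capture group 4 is not empty, none of the first three parts of the alternation matched the digit, so the last part of the alternation must have matched, and the two digits following the matched digit, which are held in capture group 4, have the desired properties.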
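In Python this check might look like the following sketch. It assumes the standard re module, where a group that did not take part in a match is reported as None rather than as an empty string. The name pairs_via_trick is hypothetical.

```python
import re

# The four-way alternation from above; only group 4 is of interest.
TRICK = re.compile(
    r'(\d)(?=\1)|\d(?=(\d)(?!\2))|\d(?=\d(\d)\3)|\d(?=(\d{2})\d)'
)

def pairs_via_trick(s):
    """Return the pairs captured in group 4, ignoring all other matches."""
    # Each match consumes a single digit, so every position gets examined.
    return [m.group(4) for m in TRICK.finditer(s) if m.group(4) is not None]

for s in ["123456678", "1234566678", "12334566", "1223345"]:
    print(s, pairs_via_trick(s))
```

A string has the desired property exactly when the returned list is non-empty.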

1. Move the cursor around for detailed explanations.


The*_*ird 12

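Inspired by the answer of Wiktor Stribiżew, another variation using an alternation with re is to check for the existence of the capture group that holds a positive match for two identical digits that are not surrounded by that same digit.

In this case, check for group 3.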

((\d)\2{2,})|\d(\d)\3(?!\3)\d

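Regex demo | Python demo

  • ( Capture group 1
    • (\d)\2{2,} Capture group 2, match 1 digit and repeat that same digit 2 or more times
  • ) Close group
  • | Or
  • \d(\d) Match a digit, capture a digit in group 3
  • \3(?!\3)\d Match the same digit as group 3. Match a 4th digit, which must not be the same as the group 3 digit

For example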

import re

pattern = r"((\d)\2{2,})|\d(\d)\3(?!\3)\d"
strings = ["123456678", "12334566", "12345654554888", "1221", "1234566678", "1222", "2221", "66", "122", "221", "111"]

for s in strings:
    match = re.search(pattern, s)
    if match and match.group(3):
        print ("Match: " + match.string)
    else:
        print ("No match: " + s)

Output

Match: 123456678
Match: 12334566
Match: 12345654554888
Match: 1221
No match: 1234566678
No match: 1222
No match: 2221
No match: 66
No match: 122
No match: 221
No match: 111

If, for example, strings of only 2 or 3 digits should also be allowed to match, you can check for group 2 instead:

(\d)\1{2,}|(\d)\2

Python demo


vks*_*vks 5

You can also use a simple approach.

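import re

l = ["123456678",
     "1234566678",
     "12334566 "]
for i in l:
    # Collect every run of two or more identical characters.
    matches = re.findall(r"((.)\2+)", i)
    # Report False as soon as any run is not exactly two characters long.
    if any(len(x[0]) != 2 for x in matches):
        print("{}-->{}".format(i, False))
    else:
        print("{}-->{}".format(i, True))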

You can customize this according to your rules.

Output:

123456678-->True
1234566678-->False
12334566 -->True
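One possible customization is sketched here, under the assumption that the rule is "the string contains at least one run of exactly two identical digits". The function name has_exact_pair and the inputs "1233666" and "1234" are illustrative. Unlike the loop above, a string with no repeated digits at all yields False, and a long run elsewhere in the string does not cancel a valid pair.

```python
import re

def has_exact_pair(s):
    """Return True if some run of identical digits in s has length exactly 2."""
    # findall yields (whole_run, single_digit) tuples; \2+ is greedy, so each
    # tuple covers one complete run of repeated digits.
    runs = [run for run, _digit in re.findall(r"((\d)\2+)", s)]
    return any(len(run) == 2 for run in runs)

for s in ["123456678", "1234566678", "1233666", "1234"]:
    print("{}-->{}".format(s, has_exact_pair(s)))
```

This version does not check that a digit sits on each side of the pair, so it would need a further tweak if that part of the rule matters.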