regex to find a pair of adjacent digits with different digits around them

I have divided my answer into four sections.

The first section contains my solution to the problem. Readers interested in nothing else may skip the other sections.

The remaining three sections are concerned with identifying the pairs of equal digits that are preceded by a different digit and are followed by a different digit. The first of the three sections matches them; the other two capture them in a group.

I’ve included the last section because I wanted to share The Greatest Regex Trick Ever with those unfamiliar with it, because I find it so very cool and clever, yet simple. It is documented here. Be forewarned that, to build suspense, the author at that link has included a lengthy preamble before the drum-roll reveal.

Determine if a string contains two consecutive equal digits that are preceded by a different digit and are followed by a different digit

You can test the string as follows:

import re

r = r'(\d)(?!\1)(\d)\2(?!\2)\d'
arr = ["123456678", "1123455a666788"]
for s in arr:
  print(s, bool(re.search(r, s)) )

displays

123456678 True
1123455a666788 False

Run Python code | Start your engine!1

The regex engine performs the following operations.

(\d)    : match a digit and save to capture group 1 (preceding digit)
(?!\1)  : next character cannot equal content of capture group 1
(\d)    : match a digit in capture group 2 (first digit of pair)
\2      : match content of capture group 2 (second digit of pair)
(?!\2)  : next character cannot equal content of capture group 2
\d      : match a digit

(?!\1) and (?!\2) are negative lookaheads.

Use Python’s regex module to match pairs of consecutive digits that have the desired property

You can use the following regular expression with Python’s regex module to obtain the matching pairs of digits.

r'(\d)(?!\1)\K(\d)\2(?=\d)(?!\2)'

Regex Engine

The regex engine performs the following operations.

(\d)    : match a digit and save to capture group 1 (preceding digit)
(?!\1)  : next character cannot equal content of capture group 1
\K      : forget everything matched so far and reset start of match
(\d)    : match a digit in capture group 2 (first digit of pair)
\2      : match content of capture group 2 (second digit of pair)
(?=\d)  : next character must be a digit
(?!\2)  : next character cannot equal content of capture group 2

(?=\d) is a positive lookahead. (?=\d)(?!\2) could be replaced with (?!\2|$|\D).

Save pairs of consecutive digits that have the desired property to a capture group

Another way to obtain the matching pairs of digits, which does not require the regex module, is to extract the contents of capture group 2 from matches of the following regular expression.

r'(\d)(?!\1)((\d)\3)(?!\3)(?=\d)'

Re engine

The following operations are performed.

(\d)    : match a digit in capture group 1
(?!\1)  : next character does not equal last character
(       : begin capture group 2
  (\d)  : match a digit in capture group 3
  \3    : match the content of capture group 3
)       : end capture group 2
(?!\3)  : next character does not equal last character
(?=\d)  : next character is a digit

Use The Greatest Regex Trick Ever to identify pairs of consecutive digits that have the desired property

We use the following regular expression to match the string.

r'(\d)(?=\1)|\d(?=(\d)(?!\2))|\d(?=\d(\d)\3)|\d(?=(\d{2})\d)'

When there is a match, we pay no attention to which character was matched, but examine the content of capture group 4 ((\d{2})), as I will explain below.

The Trick in action

The first three components of the alternation correspond to the ways that a string of four digits can fail to have the property that the second and third digits are equal, the first and second are unequal and the third and fourth are equal. They are:

(\d)(?=\1)        : assert first and second digits are equal    
\d(?=(\d)(?!\2))  : assert second and third digits are not equal
\d(?=\d(\d)\3)    : assert third and fourth digits are equal

It follows that if there is a match of a digit and the first three parts of the alternation fail the last part (\d(?=(\d{2})\d)) must succeed, and the capture group it contains (#4) must contain the two equal digits that have the required properties. (The final \d is needed to assert that the pair of digits of interest is followed by a digit.)

If there is a match how do we determine if the last part of the alternation is the one that is matched?

When this regex matches a digit we have no interest in what digit that was. Instead, we look to capture group 4 ((\d{2})). If that group is empty we conclude that one of the first three components of the alternation matched the digit, meaning that the two digits following the matched digit do not have the properties that they are equal and are unequal to the digits that precede and follow them.

If, however, capture group 4 is not empty, it means that none of the first three parts of the alternation matched the digit, so the last part of the alternation must have matched and the two digits following the matched digit, which are held in capture group 4, have the desired properties.

1. Move the cursor around for detailed explanations.

Leave a Comment