Friday, December 12, 2025

.NET regular expressions that break Perl Compatible Regular Expressions (PCRE) standard


Examples of patterns that work in PCRE but break in .NET

The Perl Compatible Regular Expressions (PCRE) library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. PCRE has its own native API, as well as a set of wrapper functions that correspond to the POSIX regular expression API.
The current version of PCRE2 is 10.47, released on Oct 21, 2025.


Good news: a clever programmer has created a .NET wrapper for the PCRE2 library, PCRE.NET (https://github.com/ltrzesniewski/pcre-net/), which supports PCRE2 10.47. Its first version, v0.1.0, required .NET Framework 4.5.


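Pattern type                  | PCRE | .NET | Why it breaks
------------------------------|------|------|-----------------------------------------------------------
Possessive quantifiers        | ✅   | ❌   | Not implemented (use atomic groups instead)
Variable‑length lookbehind    | ⚠️   | ✅   | Reversed: .NET allows any length, PCRE needs a bounded one
Subroutine calls / recursion  | ✅   | ❌   | Feature not supported
Python‑style named groups     | ✅   | ❌   | .NET uses different syntax
Ungreedy mode ((?U))          | ✅   | ❌   | No equivalent in .NET
Recursive named groups        | ✅   | ❌   | Not implemented
Modern Unicode properties     | ✅   | ❌   | No script properties (\p{Script=...}) in .NET
\R newline escape             | ✅   | ❌   | Not supported
Lookbehind with alternation   | ✅   | ✅   | Does not break: .NET accepts any lookbehind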


1. Possessive quantifiers (++, *+, ?+, {m,n}+)

PCRE supports:

^\w++$

Meaning: match a run of word characters that is never given back during backtracking.

✅ PCRE: Works ❌ .NET: Throws an ArgumentException (“Nested quantifier '+'”).

Why it breaks: .NET does not implement possessive quantifiers. It does support atomic groups, which do the same job: ^\w++$ can be written as ^(?>\w+)$.
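For simple cases the rewrite to an atomic group is mechanical. The Python sketch below uses a hypothetical helper, possessive_to_atomic, that only understands single-token operands (a literal, an escape such as \w, or a plain [...] class). Possessive quantifiers applied to a group are left untouched, and unusual classes such as []a] are not handled.

```python
import re

# One token per match: an escape, a simple [...] class, or any other single
# character, optionally followed by a possessive quantifier (*+, ++, ?+, {m,n}+).
_TOKEN = re.compile(
    r"(?P<atom>\\.|\[(?:\\.|[^\]\\])*\]|[^\\])"
    r"(?P<q>(?:[*+?]|\{\d+(?:,\d*)?\})\+)?"
)


def _to_atomic(m):
    atom, q = m.group("atom"), m.group("q")
    # No possessive quantifier, or it applies to a group: leave as-is.
    if q is None or atom in ("(", ")", "|"):
        return m.group(0)
    # Drop the trailing "+" and wrap the greedy form in an atomic group.
    return f"(?>{atom}{q[:-1]})"


def possessive_to_atomic(pattern):
    """Hypothetical helper: rewrite PCRE possessive quantifiers on single tokens."""
    return _TOKEN.sub(_to_atomic, pattern)


print(possessive_to_atomic(r"^\w++$"))     # ^(?>\w+)$
print(possessive_to_atomic(r"[^()]++"))    # (?>[^()]+)
print(possessive_to_atomic(r"\d{2,4}+x"))  # (?>\d{2,4})x
```

Ordinary greedy and lazy quantifiers (a+, a+?) pass through unchanged, and the output uses only the (?>...) syntax that .NET understands.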


2. Variable‑length lookbehind

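Pattern:

(?<=\w+)\d+

Meaning: match digits preceded by one or more word characters.

❌ PCRE: “lookbehind assertion is not fixed length” ✅ .NET: Works

Why it does not break: this entry is the wrong way round. The error quoted above is PCRE’s message, not .NET’s. .NET allows lookbehind of any length, whereas PCRE requires a fixed length; PCRE2 10.43 and later relax this only for lookbehinds with a bounded maximum length (255 characters by default), which \w+ does not have.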
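When one pattern has to run unchanged on several engines, the unbounded lookbehind can usually be avoided. "Preceded by one or more word characters" depends only on the single character right before the digits, so (?<=\w) expresses the same condition with a fixed width. The Python sketch below checks both forms; Python's re, like PCRE, insists on fixed-width lookbehind. The sample text is made up.

```python
import re

# Python's re insists on fixed-width lookbehind, so the unbounded form fails.
try:
    re.compile(r"(?<=\w+)\d+")
    unbounded_ok = True
except re.error:
    unbounded_ok = False

# Fixed-width equivalent: at least one word character right before the digits.
portable = re.compile(r"(?<=\w)\d+")
matches = portable.findall("ab12 x9 -7")

print(unbounded_ok)  # False
print(matches)       # ['12', '9']
```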


3. Subroutine calls / recursion

PCRE supports recursive patterns:

^(\w+|(\((?1)\)))+$

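It matches a sequence of words and parenthesized items, such as ab(cd) or ((x)). Note that each pair of parentheses can hold exactly one item, so (a(b)) and (a b) are rejected.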

✅ PCRE: Works ❌ .NET: “Unrecognized grouping construct”

Why it breaks: .NET does not support (?1), (?R), (?&name), or any other form of recursion; its closest substitute for nesting is a balancing group.
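When recursion is not available, the grammar behind ^(\w+|(\((?1)\)))+$ is small enough to check by hand. The Python sketch below, with a hypothetical matches_nested helper, mirrors it as a recursive-descent parser: an item is a run of word characters or an item wrapped in parentheses, and the whole text is one or more items. It assumes ASCII \w (PCRE's default without Unicode mode) and treats $ as the end of the text, ignoring $'s allowance for a trailing newline.

```python
def _is_word(ch):
    # ASCII \w, as in PCRE without Unicode (UCP) mode.
    return ch.isascii() and (ch.isalnum() or ch == "_")


def matches_nested(text):
    r"""Hand-written equivalent of ^(\w+|(\((?1)\)))+$.

    item := word-run | "(" item ")"
    text := item item*
    """

    def item(i):
        # Return the index just past one item starting at i, or -1 if none.
        if i < len(text) and _is_word(text[i]):
            while i < len(text) and _is_word(text[i]):
                i += 1
            return i
        if i < len(text) and text[i] == "(":
            j = item(i + 1)  # plays the role of the (?1) recursion
            if j != -1 and j < len(text) and text[j] == ")":
                return j + 1
        return -1

    if not text:
        return False
    i = 0
    while i < len(text):
        i = item(i)
        if i == -1:
            return False
    return True


print(matches_nested("ab(cd)"))  # True
print(matches_nested("(a(b))"))  # False
```

Greedy consumption of word runs is safe here: a shorter run could only be followed by another word character, never by the ")" the grammar needs.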


4. Python‑style named groups ((?P<name>...))

PCRE supports:

(?P<word>\w+)-(?P=word)

✅ PCRE: Works ❌ .NET: “Unrecognized grouping construct”

Why it breaks: .NET only supports (?<name>...) and (?'name'...) for named groups, and \k<name> (or \k'name') for named backreferences.
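Translating this syntax is a plain text substitution. The Python sketch below uses a hypothetical python_groups_to_dotnet helper that rewrites (?P<name> as (?<name> and (?P=name) as \k<name>. It is deliberately naive: it does not skip escaped parentheses or character classes, and it leaves PCRE's (?P>name) recursion alone, since .NET has no equivalent.

```python
import re

# Python/PCRE named-group syntax and its .NET spelling.
_DEFINITION = re.compile(r"\(\?P<(\w+)>")  # (?P<name>  ->  (?<name>
_BACKREF = re.compile(r"\(\?P=(\w+)\)")     # (?P=name)  ->  \k<name>


def python_groups_to_dotnet(pattern):
    """Hypothetical helper; does not recognise escaped parentheses."""
    pattern = _DEFINITION.sub(r"(?<\1>", pattern)
    return _BACKREF.sub(r"\\k<\1>", pattern)


print(python_groups_to_dotnet(r"(?P<word>\w+)-(?P=word)"))  # (?<word>\w+)-\k<word>
```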


5. Ungreedy mode modifier ((?U))

PCRE supports:

(?U)^.+$

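Meaning: make all quantifiers lazy by default (a trailing ? then makes a quantifier greedy again).

✅ PCRE: Works ❌ .NET: “Unrecognized grouping construct”

Why it breaks: .NET has no equivalent to PCRE’s global ungreedy mode, so each quantifier has to be made lazy individually, e.g. ^.+?$.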
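Making every quantifier lazy by hand can be automated for simple patterns. The Python sketch below uses a hypothetical ungreedy_to_lazy helper that only handles a (?U) at the very start of the pattern. It flips each quantifier (bare becomes lazy, lazy becomes greedy, possessive stays as-is) and recognises only the group openers (?:, (?=, (?!, (?>, (?<=, (?<!, (?<name> and (?'name'; other inline constructs such as (?i) are not supported.

```python
import re

# Tokens: a group opener such as (?: or (?<name> (never quantified), or an
# atom (escape, simple [...] class, any other character) with an optional
# quantifier and an optional lazy (?) or possessive (+) suffix.
_TOKEN = re.compile(
    r"(?P<open>\(\?(?:<\w+>|'\w+'|<[=!]|[:=!>]))"
    r"|(?P<atom>\\.|\[(?:\\.|[^\]\\])*\]|[^\\])"
    r"(?:(?P<q>[*+?]|\{\d+(?:,\d*)?\})(?P<mod>[?+])?)?"
)


def _flip(m):
    q, mod = m.group("q"), m.group("mod")
    if q is None or mod == "+":
        # Group opener, unquantified atom, or possessive quantifier:
        # (?U) does not change these.
        return m.group(0)
    if mod == "?":
        # Under (?U) a trailing ? means greedy, so drop it.
        return m.group("atom") + q
    # Under (?U) a bare quantifier is lazy, so add ?.
    return m.group("atom") + q + "?"


def ungreedy_to_lazy(pattern):
    """Hypothetical helper: expand a leading (?U) into explicit lazy quantifiers."""
    if not pattern.startswith("(?U)"):
        return pattern
    return _TOKEN.sub(_flip, pattern[len("(?U)"):])


print(ungreedy_to_lazy(r"(?U)^.+$"))      # ^.+?$
print(ungreedy_to_lazy(r"(?U)a+?b*"))     # a+b*?
print(ungreedy_to_lazy(r"(?U)(?:ab)+c"))  # (?:ab)+?c
```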


6. Recursive named groups ((?&name))

PCRE:

(?<paren>\((?:[^()]++|(?&paren))*\))

✅ PCRE: Works ❌ .NET: “Nested quantifier '+'” (the possessive [^()]++ is rejected before the parser even reaches (?&paren))

Why it breaks: .NET has no syntax for recursive named groups.
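When no regex engine at hand supports recursion (Python's re does not either), the same job takes a few lines of ordinary code. The Python sketch below uses a hypothetical balanced_groups helper that should return what repeatedly searching with the pattern above finds: every outermost balanced (...) substring, left to right, with unmatched parentheses skipped.

```python
def balanced_groups(text):
    """Hypothetical helper: outermost balanced "(...)" substrings, in order."""
    pairs = []  # (start, end) of every matched "(" ... ")" pair
    opens = []  # indices of "(" still waiting for their ")"
    for i, ch in enumerate(text):
        if ch == "(":
            opens.append(i)
        elif ch == ")" and opens:
            pairs.append((opens.pop(), i))

    # Matched pairs either nest or are disjoint, so after sorting by start,
    # a pair is outermost exactly when it starts after the last kept one ends.
    pairs.sort()
    result = []
    last_end = -1
    for start, end in pairs:
        if start > last_end:
            result.append(text[start:end + 1])
            last_end = end
    return result


print(balanced_groups("f(a(b)c) (x"))    # ['(a(b)c)']
print(balanced_groups("x) (y) ((z))"))   # ['(y)', '((z))']
```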


7. Newline escape \R

PCRE:

^\R$

✅ PCRE: Works ❌ .NET: “Unrecognized escape sequence \R”

Why it breaks: .NET has no \R escape, so every line-break sequence has to be spelled out explicitly.
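Spelled out, \R is an alternation that tries CRLF first and otherwise takes one vertical-space character. The sketch below assumes PCRE's default Unicode newline convention for \R (a PCRE build configured for "anycrlf" only accepts CR, LF and CRLF). It is tested with Python's re, but the same character escapes are valid in .NET, where wrapping the group as (?>...) would also keep \R's behaviour of never splitting a CRLF during backtracking.

```python
import re

# Explicit spelling of PCRE's default \R: CRLF first, then any single
# LF, VT, FF, CR, NEL, LINE SEPARATOR or PARAGRAPH SEPARATOR.
NEWLINE = r"(?:\r\n|[\n\x0b\x0c\r\x85\u2028\u2029])"

parts = re.split(NEWLINE, "a\r\nb\nc\u2028d")
only_newline = re.fullmatch(NEWLINE, "\r\n") is not None

print(parts)         # ['a', 'b', 'c', 'd']
print(only_newline)  # True
```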


8. Lookbehind with alternation of different lengths

(?<=abc|z)\d+

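✅ PCRE: Works ✅ .NET: Works

Why it does not break: PCRE allows a lookbehind whose top-level alternatives each have a fixed length, even when those lengths differ, and .NET accepts any lookbehind. The “not fixed length” message is PCRE’s own, raised when a single branch has no fixed length, e.g. (?<=ab+).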
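Some engines are stricter than both PCRE and .NET and demand one fixed width per lookbehind; Python's re is one of them. Splitting the alternation into one lookbehind per branch gives a form that such engines accept as well. The sample text below is made up.

```python
import re

# Python's re demands one fixed width per lookbehind, so (?<=abc|z) is rejected.
try:
    re.compile(r"(?<=abc|z)\d+")
    mixed_widths_ok = True
except re.error:
    mixed_widths_ok = False

# One lookbehind per alternative: each has a fixed width of its own.
split = re.compile(r"(?:(?<=abc)|(?<=z))\d+")
matches = split.findall("abc12 z3 q4")

print(mixed_widths_ok)  # False
print(matches)          # ['12', '3']
```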


9. Unicode script properties

PCRE2 supports Unicode script properties:

\p{Script=Greek}

✅ PCRE: Works ❌ .NET: “Unknown property”

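Why it breaks: .NET’s \p{...} understands only Unicode general categories (such as \p{Lu}) and a fixed list of named blocks (such as \p{IsGreek} for U+0370–U+03FF). It has no script or script-extension properties, and a block is not the same thing as a script.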
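Where a rough answer is good enough, the Greek script can be approximated by its two main code-point blocks, Greek and Coptic (U+0370–U+03FF) and Greek Extended (U+1F00–U+1FFF). In .NET those ranges should correspond to [\p{IsGreek}\p{IsGreekExtended}]; the sketch below writes them as explicit ranges and tests them with Python's re. This is an approximation, not \p{Script=Greek}: it also covers the Coptic letters U+03E2–U+03EF and unassigned code points, and it misses Greek-script characters stored in other blocks.

```python
import re

# Block-based stand-in for \p{Script=Greek}:
# U+0370-U+03FF "Greek and Coptic" plus U+1F00-U+1FFF "Greek Extended".
GREEK_BLOCKS = r"[\u0370-\u03FF\u1F00-\u1FFF]"

found = re.findall(GREEK_BLOCKS + "+", "alpha \u03b1, omega \u03c9")
print(found)  # ['α', 'ω']
```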

