Friday, April 3, 2026

Unicode Regex Expressions Cheat Sheet 2026



Most regex cheat sheets don't cover Unicode; this one summarizes the most useful parts. The Notes column links to the actual characters covered by each 'Property' alias.


Unicode Regex Syntax

\p{xx}
a character with the Unicode property xx, see below
\P{xx}
a character without the Unicode property xx, see below
\x as "hex"
Hexadecimal Escape. Used to match a specific character by its hex code. Usually followed by two digits (\xHH) or braces in some engines (\x{HHHH}). 
\x41 matches the letter A.
\X as "eXtended"
Unicode Grapheme Cluster. Matches a "user-perceived character," which includes a base character plus any combining marks (like accents). Engine support varies: PCRE and Perl provide \X, while .NET does not.

Why \X is different

In Unicode, some user-perceived characters are actually several code points combined. For example, the emoji 👨‍👩‍👧 is one user-perceived character but is made of five code points: three emoji joined by two zero-width joiners.

  • . (the dot) typically matches only the first code point of that emoji.

  • \X matches the entire sequence as one unit.
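
To make the difference concrete, here is a minimal Python sketch that counts the code points in the family emoji. Python's standard `re` module has no `\X`, so this only demonstrates the problem (the dot stopping after one code point). The variable names are illustrative only.

```python
import re

# Family emoji: man + zero-width joiner + woman + zero-width joiner + girl
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

# One user-perceived character, but five code points
print(len(family))                 # 5

# The dot matches a single code point: only the man emoji
first = re.match(r".", family).group()
print(first == "\U0001F468")       # True

# A base letter plus a combining accent is likewise two code points
e_acute = "e\u0301"
print(len(e_acute))                # 2
```

An engine with `\X` would return the whole five-code-point sequence as a single match instead.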


The property names represented by xx above are limited to the Unicode general category properties. Each character has exactly one such property, specified by a two-letter abbreviation. For compatibility with Perl, negation can be specified by including a circumflex between the opening brace and the property name. For example, \p{^Lu} is the same as \P{Lu}.

If only one letter is specified with \p or \P, it includes all the properties that start with that letter. In this case, in the absence of negation, the curly brackets in the escape sequence are optional; these two examples have the same effect:

\p{L}
\pL
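
The general category behind each property can be inspected directly. Python's standard `re` module does not support `\p{..}`, so the sketch below uses `unicodedata.category` as a rough stand-in; `has_property` is a hypothetical helper written for this illustration, not part of any library.

```python
import unicodedata

def has_property(ch: str, prop: str) -> bool:
    """Rough stand-in for \\p{prop}: True if ch's general category starts with prop."""
    return unicodedata.category(ch).startswith(prop)

print(unicodedata.category("A"))   # Lu
print(has_property("A", "Lu"))     # True, like \p{Lu}
print(has_property("a", "Lu"))     # False, so \P{Lu} would match "a"
print(has_property("a", "L"))      # True, like \p{L} or \pL
print(has_property("5", "L"))      # False
```

The prefix test mirrors the one-letter rule above: `L` covers every category whose two-letter code starts with L.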
Supported character property codes
Property  Matches                Notes
C         Other                  Includes the following properties: Cc, Cf, Cn, Co and Cs.
Cc        Control                Unicode Characters in the Control Category (unicodeplus.com)
Cf        Format                 Unicode Characters in the Format Category (unicodeplus.com)
Cn        Unassigned             none
Co        Private use            none
Cs        Surrogate              none
L         Letter                 Includes the following properties: Ll, Lm, Lo, Lt and Lu.
Ll        Lower case letter      Unicode Characters in the Lowercase Letter Category (unicodeplus.com)
Lm        Modifier letter        Unicode Characters in the Modifier Letter Category (unicodeplus.com)
Lo        Other letter           Unicode Characters in the Other Letter Category (unicodeplus.com)
Lt        Title case letter      Unicode Characters in the Titlecase Letter Category (unicodeplus.com)
Lu        Upper case letter      Unicode Characters in the Uppercase Letter Category (unicodeplus.com)
M         Mark                   Includes the following properties: Mc, Me and Mn.
Mc        Spacing mark           Unicode Characters in the Spacing Mark Category (unicodeplus.com)
Me        Enclosing mark         Unicode Characters in the Enclosing Mark Category (unicodeplus.com)
Mn        Non-spacing mark       Unicode Characters in the Nonspacing Mark Category (unicodeplus.com)
N         Number                 Includes the following properties: Nd, Nl and No.
Nd        Decimal number         Unicode Characters in the Decimal Number Category (unicodeplus.com)
Nl        Letter number          Unicode Characters in the Letter Number Category (unicodeplus.com)
No        Other number           Unicode Characters in the Other Number Category (unicodeplus.com)
P         Punctuation            Includes the following properties: Pc, Pd, Pe, Pf, Pi, Po and Ps.
Pc        Connector punctuation  Unicode Characters in the Connector Punctuation Category (unicodeplus.com)
Pd        Dash punctuation       Unicode Characters in the Dash Punctuation Category (unicodeplus.com)
Pe        Close punctuation      Unicode Characters in the Close Punctuation Category (unicodeplus.com)
Pf        Final punctuation      Unicode Characters in the Final Punctuation Category (unicodeplus.com)
Pi        Initial punctuation    Unicode Characters in the Initial Punctuation Category (unicodeplus.com)
Po        Other punctuation      Unicode Characters in the Other Punctuation Category (unicodeplus.com)
Ps        Open punctuation       Unicode Characters in the Open Punctuation Category (unicodeplus.com)
S         Symbol                 Includes the following properties: Sc, Sk, Sm and So.
Sc        Currency symbol        Unicode Characters in the Currency Symbol Category (unicodeplus.com)
Sk        Modifier symbol        Unicode Characters in the Modifier Symbol Category (unicodeplus.com)
Sm        Mathematical symbol    Unicode Characters in the Math Symbol Category (unicodeplus.com)
So        Other symbol           Unicode Characters in the Other Symbol Category (unicodeplus.com) - includes most emoji
Z         Separator              Includes the following properties: Zl, Zp and Zs.
Zl        Line separator         Unicode Characters in the Line Separator Category (unicodeplus.com) - only 1 character
Zp        Paragraph separator    Unicode Characters in the Paragraph Separator Category (unicodeplus.com) - only 1 character
Zs        Space separator        Unicode Characters in the Space Separator Category (unicodeplus.com)

MS .NET Regex Cheat Sheet

For detailed information and examples, see http://aka.ms/regex
Test at http://regexlib.com/RETester.aspx

Or test using Netspresso Lite (scroll to the bottom; it highlights substitutions).


Single characters

Use        To match any character
[set]      In that set
[^set]     Not in that set
[a-z]      In the a-z range
[^a-z]     Not in the a-z range
.          Any except \n (new line)
\char      Escaped special character

 

Control characters 

Use        To match                                              Unicode
\t         Horizontal tab                                        \u0009
\v         Vertical tab                                          \u000B
\b         Backspace (inside a character class; elsewhere \b     \u0008
           is a word boundary)
\e         Escape                                                \u001B
\r         Carriage return                                       \u000D
\f         Form feed                                             \u000C
\n         New line                                              \u000A
\a         Bell (alarm)                                          \u0007
\c char    ASCII control character


Non-ASCII codes
 

Use        To match character with
\octal     2-3 digit octal character code
\x hex     2-digit hex character code
\u hex     4-digit hex character code
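
As a quick check of these escapes, the following Python sketch writes the letter A (code point 65) three ways. Python's `re` module happens to accept the same `\xHH`, `\uHHHH` and three-digit octal forms in patterns. Other engines may differ in detail, so treat this as an illustration rather than a .NET reference.

```python
import re

# Three ways to write the letter "A" (code point 65) in a pattern
print(bool(re.fullmatch(r"\x41", "A")))     # True: 2-digit hex
print(bool(re.fullmatch(r"\u0041", "A")))   # True: 4-digit hex
print(bool(re.fullmatch(r"\101", "A")))     # True: 3-digit octal

# Non-ASCII example: \u00E9 is the precomposed letter e-acute
sample = "caf\u00E9 r\u00E9sum\u00E9"
print(len(re.findall(r"\u00E9", sample)))   # 3
```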


Character classes 
 

Use             To match character
\p{category}    In that Unicode category or block
\P{category}    Not in that Unicode category or block
\w              Word character
\W              Non-word character
\d              Decimal digit
\D              Not a decimal digit
\s              White-space character
\S              Non-white-space character
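
The shorthand classes are Unicode-aware in many engines, which is easy to overlook. A small Python sketch follows. Python's defaults for `str` patterns are Unicode-aware too, and the `re.ASCII` flag shown here is Python-specific.

```python
import re

# \u0663\u0664 are the Arabic-Indic digits 3 and 4
text = "Room 42, \u0663\u0664 seats"

# \d matches any Unicode decimal digit by default
print(re.findall(r"\d+", text) == ["42", "\u0663\u0664"])   # True

# With re.ASCII, \d is limited to 0-9
print(re.findall(r"\d+", text, re.ASCII))                   # ['42']

# \w matches word characters (letters, digits, underscore), including accented letters
print(re.findall(r"\w+", "na\u00EFve_user 7"))              # ['naïve_user', '7']

# \s matches any run of white space
print(re.split(r"\s+", "a \t b\nc"))                        # ['a', 'b', 'c']
```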

 

Quantifiers  

Greedy     Lazy       Matches
*          *?         0 or more times
+          +?         1 or more times
?          ??         0 or 1 time
{n}        {n}?       Exactly n times
{n,}       {n,}?      At least n times
{n,m}      {n,m}?     From n to m times
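
The greedy and lazy columns differ only in how much text each quantifier tries to take first. A short Python sketch, with sample strings made up for illustration:

```python
import re

html = "<a><b>"

# Greedy: .+ takes as much as possible, so the match spans both tags
print(re.search(r"<.+>", html).group())      # <a><b>

# Lazy: .+? takes as little as possible, so the match stops at the first >
print(re.search(r"<.+?>", html).group())     # <a>

# {n,m} prefers m repetitions; {n,m}? prefers n
print(re.findall(r"\d{2,3}", "1234567"))     # ['123', '456']
print(re.findall(r"\d{2,3}?", "1234567"))    # ['12', '34', '56']
```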

 

Anchors 

Use        To specify position
^          At start of string or line
\A         At start of string
\z         At end of string
\Z         At end (or before \n at end) of string
$          At end (or before \n at end) of string or line
\G         Where previous match ended
\b         On word boundary
\B         Not on word boundary
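
A few of these anchors can be sketched in Python. Two caveats apply: Python's `\Z` matches only at the very end of the string (like `\z` in the table), and Python's `re` has no `\G`, so those are left out. The sample text is made up.

```python
import re

text = "one two\nthree four"

# Without multiline mode, ^ anchors only at the start of the string
print(re.findall(r"^\w+", text))                   # ['one']

# With multiline mode (the m option), ^ and $ also anchor at line breaks
print(re.findall(r"^\w+", text, re.M))             # ['one', 'three']
print(re.findall(r"\w+$", text, re.M))             # ['two', 'four']

# \A stays at the very start of the string even in multiline mode
print(re.findall(r"\A\w+", text, re.M))            # ['one']

# \b marks word boundaries, so only the standalone "cat" words match
print(re.findall(r"\bcat\b", "cat concat cat."))   # ['cat', 'cat']
```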


Groups 

Use                      To define
(exp)                    Indexed group
(?<name>exp)             Named group
(?<name1-name2>exp)      Balancing group
(?:exp)                  Non-capturing group
(?=exp)                  Zero-width positive look-ahead
(?!exp)                  Zero-width negative look-ahead
(?<=exp)                 Zero-width positive look-behind
(?<!exp)                 Zero-width negative look-behind
(?>exp)                  Non-backtracking (greedy)
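
Several of these group constructs behave the same way in Python, which makes for a compact sketch. Named groups are omitted because Python spells them `(?P<name>exp)`, and balancing groups are specific to .NET. The sample strings are made up.

```python
import re

# Indexed groups: (exp)
m = re.search(r"(\d{4})-(\d{2})", "Date: 2026-04")
print(m.group(1), m.group(2))                            # 2026 04

# Non-capturing group: (?:exp) groups without creating an index
print(re.findall(r"(?:ab)+", "ababab ab"))               # ['ababab', 'ab']

# Look-ahead: digits only when followed by " USD" (or not followed by it)
print(re.findall(r"\d+(?= USD)", "10 USD, 20 EUR"))      # ['10']
print(re.findall(r"\b\d+\b(?! USD)", "10 USD, 20 EUR"))  # ['20']

# Look-behind: digits only when preceded by "$" (or not preceded by it)
print(re.findall(r"(?<=\$)\d+", "$15 and 30"))           # ['15']
print(re.findall(r"(?<!\$)\b\d+", "$15 and 30"))         # ['30']
```

The look-around groups consume no text, which is why only the digits appear in the results.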

 

Inline options 

Option     Effect on match
i          Case-insensitive
m          Multiline mode
n          Explicit capture (named groups only)
s          Single-line mode
x          Ignore white space

Inline options .NET special instruction

Use                    To
(?imnsx-imnsx)         Set or disable the specified options
(?imnsx-imnsx:exp)     Set or disable the specified options within the expression
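
The i, s and x options can be tried inline in Python as well. This is a sketch under two assumptions: Python has no n option, and it only accepts a whole-pattern option such as `(?i)` at the very start of the pattern. The phone-number pattern is made up for illustration.

```python
import re

# (?i) at the start of the pattern: case-insensitive for the whole expression
print(bool(re.fullmatch(r"(?i)hello", "HeLLo")))     # True

# (?i:exp) applies the option only inside the group
print(bool(re.fullmatch(r"(?i:h)ello", "Hello")))    # True
print(bool(re.fullmatch(r"(?i:h)ello", "HELLO")))    # False

# (?s) single-line mode: the dot also matches \n
print(bool(re.fullmatch(r"(?s)a.b", "a\nb")))        # True
print(bool(re.fullmatch(r"a.b", "a\nb")))            # False

# (?x) ignores white space in the pattern and allows # comments
pattern = r"""(?x)
    \d{3}   # three digits
    -       # a literal hyphen
    \d{4}   # four digits
"""
print(bool(re.fullmatch(pattern, "555-0199")))       # True
```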


Back References
 

Use          To match
\n           Indexed group (n is the group number)
\k<name>     Named group
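
A back reference matches the same text a group captured earlier, which makes it handy for finding repeats. A minimal Python sketch using numbered references only; Python writes named back references as `(?P=name)` rather than `\k<name>`.

```python
import re

# \1 must match exactly what group 1 captured: finds doubled words
print(re.findall(r"\b(\w+) \1\b", "it is is what it it is"))   # ['is', 'it']

# The same idea for doubled letters
print(re.findall(r"(\w)\1", "bookkeeper"))                     # ['o', 'k', 'e']
```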


Alternation
 

Use                     To match
a|b                     Either a or b
(?(exp) yes | no)       yes if exp is matched, no if exp isn't matched
(?(name) yes | no)      yes if name is matched, no if name isn't matched
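
The conditional form is easier to follow with an example. The Python sketch below uses the group-based variant with a group number. Python supports only that variant, not the `(?(exp) ...)` form with an arbitrary expression. The pattern itself is made up: it accepts a number with or without a balanced pair of parentheses.

```python
import re

# (\()? optionally captures an opening parenthesis as group 1;
# (?(1)\)) then requires a closing parenthesis only if group 1 matched.
pattern = re.compile(r"^(\()?\d+(?(1)\))$")

print(bool(pattern.match("(123)")))   # True
print(bool(pattern.match("123")))     # True
print(bool(pattern.match("(123")))    # False
print(bool(pattern.match("123)")))    # False

# Plain alternation: a|b
print(re.findall(r"cat|dog", "cat, dog, bird"))   # ['cat', 'dog']
```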

 

Substitution  

Use        To substitute
$n         Substring matched by group number n
${name}    Substring matched by group name
$$         Literal $ character
$&         Copy of whole match
$`         Text before the match
$'         Text after the match
$+         Last captured group
$_         Entire input string
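
Replacement syntax varies more between engines than pattern syntax does. The Python sketch below shows the closest equivalents of three entries from the table; the `\1`, `\g<name>` and `\g<0>` forms are Python's spelling, not .NET's, and the date string is made up.

```python
import re

text = "2026-04-03"

# .NET "$3/$2/$1" corresponds to r"\3/\2/\1" in Python
print(re.sub(r"(\d+)-(\d+)-(\d+)", r"\3/\2/\1", text))         # 03/04/2026

# .NET "${year}" corresponds to r"\g<year>"; Python names groups with (?P<name>...)
print(re.sub(r"(?P<year>\d{4})-\d+-\d+", r"\g<year>", text))   # 2026

# .NET "$&" (copy of whole match) corresponds to r"\g<0>"
print(re.sub(r"\d+", r"[\g<0>]", text))                        # [2026]-[04]-[03]
```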

 

Comments  

Use             To
(?# comment)    Add inline comment
#               Add x-mode comment

 

Supported Unicode Categories 

Category   Description
Lu         Letter, uppercase
Ll         Letter, lowercase
Lt         Letter, title case
Lm         Letter, modifier
Lo         Letter, other
L          Letter, all
Mn         Mark, non-spacing combining
Mc         Mark, spacing combining
Me         Mark, enclosing combining
M          Mark, all diacritic
Nd         Number, decimal digit
Nl         Number, letter-like
No         Number, other
N          Number, all
Pc         Punctuation, connector
Pd         Punctuation, dash
Ps         Punctuation, opening mark
Pe         Punctuation, closing mark
Pi         Punctuation, initial quote mark
Pf         Punctuation, final quote mark
Po         Punctuation, other
P          Punctuation, all
Sm         Symbol, math
Sc         Symbol, currency
Sk         Symbol, modifier
So         Symbol, other
S          Symbol, all
Zs         Separator, space
Zl         Separator, line
Zp         Separator, paragraph
Z          Separator, all
Cc         Control code
Cf         Format control character
Cs         Surrogate code point
Co         Private-use character
Cn         Unassigned
C          Other, all (control, format, surrogate, private-use, unassigned)

For named character set blocks (e.g., Cyrillic), search for "supported named blocks" in the MSDN Library.