Unicode Regex Expressions Cheat Sheet

Most regex cheat sheets don't address Unicode; this one summarizes the most useful parts. The Notes column of the property table links to the actual characters represented by each property alias.

Unicode Regex Syntax

\p{xx}

a character with the Unicode property alias, see below

\P{xx}

a character without the Unicode property alias, see below

\x as "hex"

Hexadecimal Escape. Used to match a specific character by its hex code. Usually followed by two digits (\xHH) or braces in some engines (\x{HHHH}).
\x41 matches the letter A.
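As a quick check of the hex escape, here is a minimal sketch using Python's built-in `re` module. The choice of engine is an assumption; the cheat sheet itself is not tied to Python. Python accepts the two-digit form \xHH, and for four-digit codes it uses \uHHHH rather than the brace form \x{HHHH}.

```python
import re

# \x41 is the two-digit hex escape for U+0041, the letter "A".
hex_match = re.fullmatch(r"\x41", "A") is not None

# Python's re has no \x{HHHH} brace form; \uHHHH covers four-digit codes.
# U+20AC is the euro sign.
euro_match = re.fullmatch(r"\u20AC", "\u20ac") is not None

print(hex_match, euro_match)
```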

\X as "eXtended"

Unicode Grapheme Cluster. Matches a "user-perceived character," which includes a base character plus any combining marks (like accents).

Why `\X` is different

In the Unicode world, some "characters" are actually multiple code points combined. For example, the emoji 👨‍👩‍👧 is one "human-perceived character" but is made of several individual code points.

. (the dot) might only match the first part of that emoji.
\X will match the entire sequence as one unit.
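The difference can be seen by counting code points. The sketch below uses Python's built-in `re` module, which is an assumption of this example; it has no \X, so it only demonstrates the "dot matches one code point" half. The third-party `regex` package is one option that does support \X.

```python
import re

# Family emoji written as explicit code points: man, ZWJ, woman, ZWJ, girl.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

# Python strings are sequences of code points, so this single
# user-perceived character has a length of 5.
code_point_count = len(family)

# The dot consumes exactly one code point: only the leading "man" emoji.
first = re.match(r".", family).group()

print(code_point_count, first == "\U0001F468")
```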

The property names represented by xx above are limited to the Unicode general category properties. Each character has exactly one such property, specified by a two-letter abbreviation. For compatibility with Perl, negation can be specified by including a circumflex between the opening brace and the property name. For example, \p{^Lu} is the same as \P{Lu}.

If only one letter is specified with \p or \P, it includes all the properties that start with that letter. In this case, in the absence of negation, the curly brackets in the escape sequence are optional; these two examples have the same effect:

\p{L}
\pL
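To see which general category a character belongs to, the following sketch uses Python's standard `unicodedata` module. Python's built-in `re` does not support \p{xx}, so `has_property` and `lacks_property` are hypothetical helpers written for illustration only; they mimic \p{xx} and \P{xx} for a single character.

```python
import unicodedata

def has_property(ch: str, prop: str) -> bool:
    # Hypothetical helper that mimics \p{prop} for one character.
    # Categories are always two letters, so "Lu" must match exactly,
    # while a one-letter code such as "L" matches Ll, Lm, Lo, Lt and Lu.
    return unicodedata.category(ch).startswith(prop)

def lacks_property(ch: str, prop: str) -> bool:
    # Hypothetical helper that mimics \P{prop} (or \p{^prop}).
    return not has_property(ch, prop)

# Look up the two-letter category of a few sample characters.
categories = {ch: unicodedata.category(ch) for ch in "Aa1_ "}
print(categories)
```

With these helpers, `has_property("a", "L")` and `has_property("A", "Lu")` are both true, while `has_property("1", "L")` is false because digits fall under `Nd`.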

Supported character property codes
Property	Matches	Notes
`C`	Other	Includes the following properties: `Cc`, `Cf`, `Cn`, `Co` and `Cs`.
`Cc`	Control	Unicode Characters in the Control Category (unicodeplus.com)
`Cf`	Format	Unicode Characters in the Format Category (unicodeplus.com)
`Cn`	Unassigned	none
`Co`	Private use	none
`Cs`	Surrogate	none
`L`	Letter	Includes the following properties: `Ll`, `Lm`, `Lo`, `Lt` and `Lu`.
`Ll`	Lower case letter	Unicode Characters in the Lowercase Letter Category (unicodeplus.com)
`Lm`	Modifier letter	Unicode Characters in the Modifier Letter Category (unicodeplus.com)
`Lo`	Other letter	Unicode Characters in the Other Letter Category (unicodeplus.com)
`Lt`	Title case letter	Unicode Characters in the Titlecase Letter Category (unicodeplus.com)
`Lu`	Upper case letter	Unicode Characters in the Uppercase Letter Category (unicodeplus.com)
`M`	Mark
`Mc`	Spacing mark	Unicode Characters in the Spacing Mark Category (unicodeplus.com)
`Me`	Enclosing mark	Unicode Characters in the Enclosing Mark Category (unicodeplus.com)
`Mn`	Non-spacing mark	Unicode Characters in the Nonspacing Mark Category (unicodeplus.com)
`N`	Number
`Nd`	Decimal number	Unicode Characters in the Decimal Number Category (unicodeplus.com)
`Nl`	Letter number	Unicode Characters in the Letter Number Category (unicodeplus.com)
`No`	Other number	Unicode Characters in the Other Number Category (unicodeplus.com)
`P`	Punctuation
`Pc`	Connector punctuation	Unicode Characters in the Connector Punctuation Category (unicodeplus.com)
`Pd`	Dash punctuation	Unicode Characters in the Dash Punctuation Category (unicodeplus.com)
`Pe`	Close punctuation	Unicode Characters in the Close Punctuation Category (unicodeplus.com)
`Pf`	Final punctuation	Unicode Characters in the Final Punctuation Category (unicodeplus.com)
`Pi`	Initial punctuation	Unicode Characters in the Initial Punctuation Category (unicodeplus.com)
`Po`	Other punctuation	Unicode Characters in the Other Punctuation Category (unicodeplus.com)
`Ps`	Open punctuation	Unicode Characters in the Open Punctuation Category (unicodeplus.com)
`S`	Symbol
`Sc`	Currency symbol	Unicode Characters in the Currency Symbol Category (unicodeplus.com)
`Sk`	Modifier symbol	Unicode Characters in the Modifier Symbol Category (unicodeplus.com)
`Sm`	Mathematical symbol	Unicode Characters in the Math Symbol Category (unicodeplus.com)
`So`	Other symbol	Unicode Characters in the Other Symbol Category (unicodeplus.com) - Includes emojis
`Z`	Separator
`Zl`	Line separator	Unicode Characters in the Line Separator Category (unicodeplus.com) - only 1 character
`Zp`	Paragraph separator	Unicode Characters in the Paragraph Separator Category (unicodeplus.com) - only 1 character
`Zs`	Space separator	Unicode Characters in the Space Separator Category (unicodeplus.com)

Download Microsoft .NET Cheat Sheet PDF

MS .NET Regex Cheat Sheet

For detailed information and examples, see http://aka.ms/regex
Test at http://regexlib.com/RETester.aspx

Or test using 💻 Launch Netspresso Lite (scroll to bottom, highlights substitutions!)

Single characters

Use	To match any character
[set]	In that set
[^set]	Not in that set
[a-z]	In the a-z range
[^a-z]	Not in the a-z range
.	Any except \n (new line)
\char	Escaped special character

Control characters

Use	To match	Unicode
\t	Horizontal tab	\u0009
\v	Vertical tab	\u000B
\b	Backspace	\u0008
\e	Escape	\u001B
\r	Carriage return	\u000D
\f	Form feed	\u000C
\n	New line	\u000A
\a	Bell (alarm)	\u0007
\c char	ASCII control character

Non-ASCII codes

Use	To match character with
\octal	2-3 digit octal character code
\x hex	2-digit hex character code
\u hex	4-digit hex character code

Character classes

Use	To match character
\p{category}	In that Unicode category or block
\P{category}	Not in that Unicode category or block
\w	Word character
\W	Non-word character
\d	Decimal digit
\D	Not a decimal digit
\s	White-space character
\S	Non-white-space char

Quantifiers

Greedy	Lazy	Matches
*	*?	0 or more times
+	+?	1 or more times
?	??	0 or 1 time
{n}	{n}?	Exactly n times
{n,}	{n,}?	At least n times
{n,m}	{n,m}?	From n to m times
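The greedy and lazy columns differ only in how much text each quantifier takes. A minimal sketch, assuming Python's built-in `re` module, whose quantifier syntax matches the table:

```python
import re

text = "<a><b>"

# Greedy: .+ takes as much as it can, so one match spans both tags.
greedy = re.findall(r"<.+>", text)

# Lazy: .+? takes as little as it can, so each tag matches separately.
lazy = re.findall(r"<.+?>", text)

# {n,m}: between 2 and 4 digits; {n}: exactly 3 digits.
two_to_four = re.fullmatch(r"\d{2,4}", "2026") is not None
exactly_three = re.fullmatch(r"\d{3}", "2026") is not None

print(greedy, lazy, two_to_four, exactly_three)
```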

Anchors

Use	To specify position
^	At start of string or line
\A	At start of string
\z	At end of string
\Z	At end (or before \n at end) of string
$	At end (or before \n at end) of string or line
\G	Where previous match ended
\b	On word boundary
\B	Not on word boundary
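A short sketch of how the anchors behave, assuming Python's built-in `re` module rather than .NET. The ^, $, \b and \B anchors behave as in the table; note that Python's \Z means strict end of string, which is what this table lists as \z.

```python
import re

text = "one\ntwo"

# Without multiline mode, ^ and $ only anchor to the whole string,
# so a pattern for a single word cannot match a two-line input.
single_line = re.findall(r"^\w+$", text)

# With multiline mode, ^ and $ also anchor at each line break.
multi_line = re.findall(r"^\w+$", text, re.MULTILINE)

# $ also matches just before a newline at the very end of the string.
before_final_newline = re.search(r"two$", "one\ntwo\n") is not None

# \b keeps "cat" from matching inside "concat".
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

print(single_line, multi_line, before_final_newline, whole_words)
```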

Groups

Use	To define
(exp)	Indexed group
(?<name>exp)	Named group
(?<name1-name2>exp)	Balancing group
(?:exp)	Non-capturing group
(?=exp)	Zero-width positive look-ahead
(?!exp)	Zero-width negative look-ahead
(?<=exp)	Zero-width positive look-behind
(?<!exp)	Zero-width negative look-behind
(?>exp)	Non-backtracking (greedy)
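The group constructs can be tried out with the sketch below, which assumes Python's built-in `re` module. Two differences from the .NET table apply: Python spells a named group (?P<name>exp) instead of (?<name>exp), and it has no balancing groups.

```python
import re

# Indexed groups are numbered by the position of their opening parenthesis.
date = re.fullmatch(r"(\d{4})-(\d{2})", "2026-04")
year, month = date.group(1), date.group(2)

# Named groups: (?P<name>exp) in Python, (?<name>exp) in .NET.
named = re.fullmatch(r"(?P<year>\d{4})-(?P<month>\d{2})", "2026-04")
named_year = named.group("year")

# Look-ahead checks what follows without consuming it.
before_usd = re.findall(r"\d+(?= USD)", "40 USD, 30 EUR")

# Look-behind checks what precedes without consuming it.
after_dollar = re.findall(r"(?<=\$)\d+", "$40 and 30")

print(year, month, named_year, before_usd, after_dollar)
```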

Inline options

Option	Effect on match
i	Case-insensitive
m	Multiline mode
n	Explicit (named)
s	Single-line mode
x	Ignore white space

Inline options .NET special instruction

Use	To
(?imnsx-imnsx)	Set or disable the specified options
(?imnsx-imnsx:exp)	Set or disable the specified options within the expression
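A minimal sketch of inline options, assuming Python's built-in `re` module. Python supports the i, m, s and x options shown here but not the .NET-specific n option, and it expects an unscoped option such as (?i) at the start of the pattern.

```python
import re

# (?i) at the start makes the whole pattern case-insensitive.
whole_pattern = re.fullmatch(r"(?i)hello", "HeLLo") is not None

# (?i:...) limits case-insensitivity to the enclosed expression.
scoped_hit = re.fullmatch(r"(?i:h)ello", "Hello") is not None
scoped_miss = re.fullmatch(r"(?i:h)ello", "HELLO") is not None

# (?s) is single-line mode: the dot also matches \n.
dot_matches_newline = re.fullmatch(r"(?s)a.b", "a\nb") is not None

# (?x) ignores pattern white space and allows # comments.
verbose = re.fullmatch(r"(?x) \d+  # one or more digits", "123") is not None

print(whole_pattern, scoped_hit, scoped_miss, dot_matches_newline, verbose)
```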

Back References

Use	To match
\n	Indexed group
\k<name>	Named group
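Back references match the same text a group captured earlier. The sketch below assumes Python's built-in `re` module, where the indexed form \1 is the same as in the table but the named form is written (?P=name) instead of \k<name>.

```python
import re

# \1 repeats whatever group 1 captured, which finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
repeated_word = doubled.group(1)

# Named back reference: the closing quote must equal the opening quote.
quote_pattern = r"(?P<q>['\"]).*?(?P=q)"
matched_quotes = re.fullmatch(quote_pattern, "'hi'") is not None
mismatched_quotes = re.fullmatch(quote_pattern, "'hi\"") is not None

print(repeated_word, matched_quotes, mismatched_quotes)
```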

Alternation

Use	To match
a|b	Either a or b
(?(exp)yes|no)	yes if exp is matched, no if exp isn't matched
(?(name)yes|no)	yes if name is matched, no if name isn't matched

Substitution

Use	To substitute
$n	Substring matched by group number n
${name}	Substring matched by group name
$$	Literal $ character
$&	Copy of whole match
$`	Text before the match
$'	Text after the match
$+	Last captured group
$_	Entire input string

Comments

Use	To
(?# comment)	Add inline comment
#	Add x-mode comment

Supported Unicode Categories

Category	Description
Lu	Letter, uppercase
Ll	Letter, lowercase
Lt	Letter, title case
Lm	Letter, modifier
Lo	Letter, other
L	Letter, all
Mn	Mark, non-spacing combining
Mc	Mark, spacing combining
Me	Mark, enclosing combining
M	Mark, all diacritic
Nd	Number, decimal digit
Nl	Number, letter-like
No	Number, other
N	Number, all
Pc	Punctuation, connector
Pd	Punctuation, dash
Ps	Punctuation, opening mark
Pe	Punctuation, closing mark
Pi	Punctuation, initial quote mark
Pf	Punctuation, final quote mark
Po	Punctuation, other
P	Punctuation, all
Sm	Symbol, math
Sc	Symbol, currency
Sk	Symbol, modifier
So	Symbol, other
S	Symbol, all
Zs	Separator, space
Zl	Separator, line
Zp	Separator, paragraph
Z	Separator, all
Cc	Control code
Cf	Format control character
Cs	Surrogate code point
Co	Private-use character
Cn	Unassigned
C	Other, all (Cc, Cf, Cs, Co and Cn)

For named character set blocks (e.g., Cyrillic), search for "supported named blocks" in the MSDN Library.

Metadata Consulting [dot] ca - Blog

Friday, April 3, 2026