Appendix G. MySQL Regular Expressions

A regular expression (regex) is a powerful way of specifying a complex search.

MySQL uses Henry Spencer's implementation of regular expressions, which is aimed at conformance with POSIX 1003.2. MySQL uses the extended version.

This is a simplistic reference that skips the details. To get more exact information, see Henry Spencer's regex(7) manual page that is included in the source distribution. Appendix C, Credits .

A regular expression describes a set of strings. The simplest regexp is one that has no special characters in it. For example, the regexp hello matches hello and nothing else.

Non-trivial regular expressions use certain special constructs so that they can match more than one string. For example, the regexp hello|word matches either the string hello or the string word.

As a more complex example, the regexp B[an]*s matches any of the strings Bananas, Baaaaas, Bs, and any other string starting with a B, ending with an s, and containing any number of a or n characters in between.

A regular expression may use any of the following special characters/constructs:

^

Match the beginning of a string.

mysql SELECT "fo\nfo" REGEXP "^fo$";           - 0
mysql SELECT "fofo" REGEXP "^fo";              - 1
$

Match the end of a string.

mysql SELECT "fo\no" REGEXP "^fo\no$";         - 1
mysql SELECT "fo\no" REGEXP "^fo$";            - 0
.

Match any character (including newline).

mysql SELECT "fofo" REGEXP "^f.*";             - 1
mysql SELECT "fo\nfo" REGEXP "^f.*";           - 1
a*

Match any sequence of zero or more a characters.

mysql SELECT "Ban" REGEXP "^Ba*n";             - 1
mysql SELECT "Baaan" REGEXP "^Ba*n";           - 1
mysql SELECT "Bn" REGEXP "^Ba*n";              - 1
a+

Match any sequence of one or more a characters.

mysql SELECT "Ban" REGEXP "^Ba+n";             - 1
mysql SELECT "Bn" REGEXP "^Ba+n";              - 0
a?

Match either zero or one a character.

mysql SELECT "Bn" REGEXP "^Ba?n";              - 1
mysql SELECT "Ban" REGEXP "^Ba?n";             - 1
mysql SELECT "Baan" REGEXP "^Ba?n";            - 0
de|abc

Match either of the sequences de or abc.

mysql SELECT "pi" REGEXP "pi|apa";             - 1
mysql SELECT "axe" REGEXP "pi|apa";            - 0
mysql SELECT "apa" REGEXP "pi|apa";            - 1
mysql SELECT "apa" REGEXP "^(pi|apa)$";        - 1
mysql SELECT "pi" REGEXP "^(pi|apa)$";         - 1
mysql SELECT "pix" REGEXP "^(pi|apa)$";        - 0
(abc)*

Match zero or more instances of the sequence abc.

mysql SELECT "pi" REGEXP "^(pi)*$";            - 1
mysql SELECT "pip" REGEXP "^(pi)*$";           - 0
mysql SELECT "pipi" REGEXP "^(pi)*$";          - 1
{1}, {2,3}

The is a more general way of writing regexps that match many occurrences of the previous atom.

a*

Can be written as a{0,}.

a+

Can be written as a{1,}.

a?

Can be written as a{0,1}.

To be more precise, an atom followed by a bound containing one integer i and no comma matches a sequence of exactly i matches of the atom. An atom followed by a bound containing one integer i and a comma matches a sequence of i or more matches of the atom. An atom followed by a bound containing two integers i and j matches a sequence of i through j (inclusive) matches of the atom.

Both arguments must be in the range from 0 to RE_DUP_MAX (default 255), inclusive. If there are two arguments, the second must be greater than or equal to the first.

[a-dX], [^a-dX]

Matches any character which is (or is not, if ^ is used) either a, b, c, d or X. To include a literal ] character, it must immediately follow the opening bracket [. To include a literal - character, it must be written first or last. So [0-9] matches any decimal digit. Any character that does not have a defined meaning inside a [] pair has no special meaning and matches only itself.

mysql SELECT "aXbc" REGEXP "[a-dXYZ]";         - 1
mysql SELECT "aXbc" REGEXP "^[a-dXYZ]$";       - 0
mysql SELECT "aXbc" REGEXP "^[a-dXYZ]+$";      - 1
mysql SELECT "aXbc" REGEXP "^[^a-dXYZ]+$";     - 0
mysql SELECT "gheis" REGEXP "^[^a-dXYZ]+$";    - 1
mysql SELECT "gheisa" REGEXP "^[^a-dXYZ]+$";   - 0
[[.characters.]]

The sequence of characters of that collating element. The sequence is a single element of the bracket expression's list. A bracket expression containing a multi-character collating element can thus match more than one character, for example, if the collating sequence includes a ch collating element, then the regular expression [[.ch.]]*c matches the first five characters of chchcc.

[=character_class=]

An equivalence class, standing for the sequences of characters of all collating elements equivalent to that one, including itself.

For example, if o and (+) are the members of an equivalence class, then [[=o=]], [[=(+)=]], and [o(+)] are all synonymous. An equivalence class may not be an endpoint of a range.

[:character_class:]

Within a bracket expression, the name of a character class enclosed in [: and :] stands for the list of all characters belonging to that class. Standard character class names are:

NameNameName
alnum digit punct
alpha graph space
blank lower upper
cntrl print xdigit

These stand for the character classes defined in the ctype(3) manual page. A locale may provide others. A character class may not be used as an endpoint of a range.

mysql SELECT "justalnums" REGEXP "[[:alnum:]]+";       - 1
mysql SELECT "!!" REGEXP "[[:alnum:]]+";               - 0
[[::]], [[::]]

These match the null string at the beginning and end of a word respectively. A word is defined as a sequence of word characters which is neither preceded nor followed by word characters. A word character is an alnum character (as defined by ctype(3)) or an underscore (_).

mysql SELECT "a word a" REGEXP "[[::]]word[[::]]";      - 1
mysql SELECT "a xword a" REGEXP "[[::]]word[[::]]";     - 0
mysql SELECT "weeknights" REGEXP "^(wee|week)(knights|nights)$"; - 1
freelance web developer India web development india website designer | Software developer India