Find it

Regular expressions are a very handy and flexible way of finding string patterns.

In a regex, each character matches itself literally, except for the reserved characters:

. * + ? { | ( ) [ \ ^ $

The string foo only matches the string foo. The . matches any single character so that .oo matches poo and foo. Pretty simple.

The * means that the preceding character is repeated 0 or more times. The + means that the preceding character is matched 1 or more times. The ? means that the preceding character is optional (i.e., 0 or 1 times). So fo+s? matches fo, foo, fos and foos.
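Here is a small sketch of the quantifiers in action. The word list and variable names are made up for illustration, and the patterns are wrapped in ^ and $ (explained further down) so that the whole word has to match rather than just a piece of it.

```javascript
// ^ and $ force the pattern to span the whole string.
const plusThenOptional = /^fo+s?$/;
const star = /^fo*$/;

const words = ['f', 'fo', 'foo', 'fos', 'foos', 'fooss'];

// Keep only the words each pattern accepts.
const plusMatches = words.filter((w) => plusThenOptional.test(w));
const starMatches = words.filter((w) => star.test(w));

console.log(plusMatches); // [ 'fo', 'foo', 'fos', 'foos' ]
console.log(starMatches); // [ 'f', 'fo', 'foo' ]
```

Note that f fails fo+ because + needs at least one o, while fo* accepts it because * allows zero.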

The pipe (|) indicates an alternative pattern. So, fo(o|e) matches foo and foe. The parens are used for grouping patterns together.
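To see why the parens matter, here is a rough comparison of the grouped pattern with an ungrouped one. The names grouped and ungrouped are just for this example. Alternation binds more loosely than everything else, so without parens the | splits the entire pattern in two.

```javascript
// With parens: "fo" followed by either "o" or "e".
const grouped = /^fo(o|e)$/;

// Without parens: either "starts with foo" OR "ends with e".
const ungrouped = /^foo|e$/;

console.log(grouped.test('foo'));     // true
console.log(grouped.test('foe'));     // true
console.log(grouped.test('apple'));   // false

console.log(ungrouped.test('apple')); // true, because it ends with e
console.log(ungrouped.test('foot'));  // true, because it starts with foo
```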

In addition to simple character matching, regex has character classes denoted by brackets, like [0-9] or [A-Z]. A class matches any single character from the set. The - indicates a range between two Unicode characters unless it is the first or last character inside the brackets, where it is a literal hyphen.

A ^ as the first character inside the brackets negates the class. [^0-9] matches any character that is not a digit.
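The position of the ^ and the - changes what a bracket expression means, so a quick sketch may help. The variable names are illustrative. Inside the brackets a leading ^ negates the class, while outside the brackets ^ is an anchor.

```javascript
const digit = /[0-9]/;            // any single digit
const nonDigit = /[^0-9]/;        // any single character that is NOT a digit
const startsWithDigit = /^[0-9]/; // a digit at the start of the string
const letterOrHyphen = /[a-z-]/;  // trailing - is a literal hyphen, not a range

console.log(digit.test('a1'));           // true
console.log(nonDigit.test('123'));       // false
console.log(nonDigit.test('12a'));       // true
console.log(startsWithDigit.test('a1')); // false
console.log(letterOrHyphen.test('-'));   // true
```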

In addition to the brackets, there are some defined classes:

\d digits
\s white space
\w word characters
\D non-digits
\S non-white space
\W non-word characters
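As a quick illustration of these classes, here is a sketch that runs a few of them against a made-up sample string. It uses the g flag (described later in this post) so that match and replace look at every occurrence instead of only the first.

```javascript
const sample = 'Order 66 shipped on 2024-05-01';

// \d+ grabs each run of digits.
const numbers = sample.match(/\d+/g);

// \s+ splits on runs of white space.
const fields = sample.split(/\s+/);

// \w+ grabs runs of word characters (letters, digits, underscore).
const wordRuns = sample.match(/\w+/g);

// \D removes everything that is not a digit.
const digitsOnly = sample.replace(/\D/g, '');

console.log(numbers);    // [ '66', '2024', '05', '01' ]
console.log(fields);     // [ 'Order', '66', 'shipped', 'on', '2024-05-01' ]
console.log(wordRuns);   // [ 'Order', '66', 'shipped', 'on', '2024', '05', '01' ]
console.log(digitsOnly); // 6620240501
```

The hyphen is not a word character, which is why \w+ breaks the date into three pieces.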

Outside of brackets, the ^ anchors the match to the beginning of the string and the $ anchors it to the end of the string. So, ^\d+$ matches a string that is all digits.
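The difference the anchors make is easiest to see side by side. In this sketch the names allDigits and someDigits are just for illustration.

```javascript
const allDigits = /^\d+$/; // the entire string must be digits
const someDigits = /\d+/;  // digits anywhere in the string are enough

console.log(allDigits.test('12345'));  // true
console.log(allDigits.test('123a5'));  // false
console.log(someDigits.test('123a5')); // true
console.log(allDigits.test(''));       // false, + needs at least one digit
```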

If you need to match one of the reserved characters literally, precede it with a backslash. For example, \. matches a period and \$ matches a dollar sign.
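Forgetting the backslash is an easy mistake because the pattern still appears to work. A minimal sketch, with made-up names and sample strings:

```javascript
const unescaped = /^1.5$/; // . is a wildcard here
const escaped = /^1\.5$/;  // \. is a literal period

console.log(unescaped.test('1.5')); // true
console.log(unescaped.test('1x5')); // true, probably not what was intended
console.log(escaped.test('1.5'));   // true
console.log(escaped.test('1x5'));   // false

// \$ matches a literal dollar sign instead of acting as the end anchor.
const price = 'Cost: $20'.match(/\$\d+/)[0];
console.log(price); // $20
```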

In JavaScript, you can create a regular expression literal with slashes:

const decimalRegex = /[0-9]+\.[0-9]*/

The literal creates a RegExp object, which can also be created with the constructor. Because the pattern is passed as a string, each backslash has to be doubled:

const decimalRegex = new RegExp('[0-9]+\\.[0-9]*')
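The two forms should produce equivalent patterns, which can be checked through the source property. The sketch below also shows what happens if the backslash is not doubled in the string form. The variable names are only for this example.

```javascript
const fromLiteral = /[0-9]+\.[0-9]*/;
const fromConstructor = new RegExp('[0-9]+\\.[0-9]*');

// Both hold the same pattern text.
console.log(fromLiteral.source === fromConstructor.source); // true

// With a single backslash the string escape '\.' collapses to '.',
// so the regex ends up with a wildcard instead of a literal period.
const singleBackslash = new RegExp('[0-9]+\.[0-9]*');
console.log(singleBackslash.source);     // [0-9]+.[0-9]*
console.log(singleBackslash.test('1a')); // true
console.log(fromLiteral.test('1a'));     // false
```

The constructor is mostly worth the extra escaping when the pattern has to be assembled from strings at run time.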

The matching behavior can be modified by flags. The flags are as follows:

Letter	Property	Description
i	ignoreCase	case-insensitive match
m	multiline	`^`, `$` match start, end of line
s	dotAll	`.` matches newline
u	unicode	treat the pattern and input as Unicode code points
g	global	find all matches
y	sticky	match must start at `lastIndex`

The flags can be combined, so /^[A-Z]/im matches an upper or lower case letter at the beginning of any line.
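Here is a rough tour of a few of the flags using a made-up three-line string. The names are illustrative only.

```javascript
const text = 'apple\nBanana\ncherry';

// Without m, ^ only matches at the very start of the string.
console.log(/^[A-Z]/.test(text));  // false
// With m, ^ also matches after each newline.
console.log(/^[A-Z]/m.test(text)); // true

// g + i + m: the first letter of every line, in either case.
const firstLetters = text.match(/^[A-Z]/gim);
console.log(firstLetters); // [ 'a', 'B', 'c' ]

// s lets . match a newline.
console.log(/a.c/.test('a\nc'));  // false
console.log(/a.c/s.test('a\nc')); // true

// y only matches starting exactly at lastIndex.
const sticky = /foo/y;
const stickyAtZero = sticky.test('barfoo'); // false, 'foo' is not at index 0
sticky.lastIndex = 3;
const stickyAtThree = sticky.test('barfoo'); // true
console.log(stickyAtZero, stickyAtThree);
```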

The Unicode flag is useful if you are matching characters that are encoded as two UTF-16 code units (a surrogate pair), such as most emoji. I might write some more on that later because it can be an interesting discussion.
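As a small taste of the difference, here is a sketch using the grinning face emoji (U+1F600), which JavaScript strings store as two code units. The example character is my own choice.

```javascript
const emoji = '\u{1F600}'; // the grinning face emoji

// The string holds two UTF-16 code units for this one character.
console.log(emoji.length); // 2

// Without u, . matches a single code unit, so one . cannot span the emoji.
console.log(/^.$/.test(emoji));  // false

// With u, . matches a whole code point.
console.log(/^.$/u.test(emoji)); // true

// The u flag also enables \u{...} code point escapes in the pattern.
console.log(/\u{1F600}/u.test(emoji)); // true
```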

That covers most of the basic syntax. Regular expressions can get really complicated, so it’s often handy to write and test them using an online tester like this one. I’ll write more on how the RegExp object in JavaScript works in an upcoming post.
