
Regular Expression Cheat Sheet

Every time I went to the lovely Rubular just to look at its Regex Quick Reference, I thought of composing a list of my own, only a bit more detailed. There are a dozen really useful regex hacks that I keep forgetting and have to look up every time.

Well, hopefully with this list that won't have to happen again. And if you think of something that is missing, please leave a comment and I will add it.

Revision 4 (Each revision extends the article with new hacks. I'll bump the number whenever I add something new)

TL;DR {

Basic Regex Reference

Regex       Description
[abc]       A single character of: a, b, or c
[^abc]      Any single character except: a, b, or c
[a-z]       Any single character in the range a-z
[a-zA-Z]    Any single character in the range a-z or A-Z
^           Start of line
$           End of line
\A          Start of string
\z          End of string
.           Any single character
\s          Any whitespace character
\S          Any non-whitespace character
\d          Any digit
\D          Any non-digit
\w          Any word character (letter, number, underscore)
\W          Any non-word character
\b          Any word boundary
(...)       Capture everything enclosed
(a|b)       a or b
a?          Zero or one of a
a*          Zero or more of a
a+          One or more of a
a{3}        Exactly 3 of a
a{3,}       3 or more of a
a{3,6}      Between 3 and 6 of a
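
To try a few of these entries out, here is a minimal sketch. It assumes Python's re module, which the article itself does not name; the sample strings and variable names are made up for illustration. Two differences from the Ruby flavor that Rubular uses: Python needs re.MULTILINE for ^ and $ to match at every line, and it has traditionally spelled end of string as \Z.

```python
import re

text = "order 66: 3 apples, 12 pears"  # made-up sample input

# \d+ : one or more digits
digits = re.findall(r"\d+", text)            # ['66', '3', '12']

# \b\w{5,6}\b : whole words made of 5 or 6 word characters
words = re.findall(r"\b\w{5,6}\b", text)     # ['order', 'apples', 'pears']

# ^ and $ match at each line only with re.MULTILINE in Python
lines = re.findall(r"^[a-z]+$", "foo\nBar\nbaz", re.MULTILINE)  # ['foo', 'baz']

# u? : zero or one "u"
colors = re.findall(r"colou?r", "color colour colr")  # ['color', 'colour']

# \A anchors to the start of the string, \Z to the end (Python spelling)
starts_with_order = re.search(r"\Aorder", text) is not None   # True
ends_with_pears = re.search(r"pears\Z", text) is not None     # True
```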

Groups & Modifiers (Options)

Syntax      Meaning
(foo)       Capture everything enclosed
(?:foo)     Group, but do not capture
(?=foo)     Lookahead
(?<=foo)    Lookbehind
(?!foo)     Negative Lookahead
(?<!foo)    Negative Lookbehind
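
The group syntax is easier to remember after seeing it run once. The following is a small sketch, again assuming Python's re module and an invented sample string. Note the \b in the last pattern: without it, the negative lookbehind would happily match the 2 in 42, because that digit is preceded by "SD 4" rather than "USD ".

```python
import re

price = "USD 42, EUR 17"  # made-up sample input

# (foo) captures, (?:foo) only groups
m = re.search(r"(?:USD|EUR) (\d+)", price)
first_amount = m.group(1)        # '42'
capture_groups = m.re.groups     # 1, the (?:...) group is not counted

# (?=foo) lookahead: digits followed by a comma (the comma is not consumed)
before_comma = re.findall(r"\d+(?=,)", price)            # ['42']

# (?<=foo) lookbehind: digits preceded by "EUR "
after_eur = re.findall(r"(?<=EUR )\d+", price)           # ['17']

# (?!foo) negative lookahead: digits not followed by a digit or a comma
not_before_comma = re.findall(r"\d+(?![\d,])", price)    # ['17']

# (?<!foo) negative lookbehind: whole numbers not preceded by "USD "
not_after_usd = re.findall(r"\b(?<!USD )\d+", price)     # ['17']
```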

}

Hacks

Match everything until a sequence

Looking at the groups reference above, you can do that with a lookahead expression, like so:

/.+?(?=foo)/

This will match everything up to, but not including, the first occurrence of the string foo.

To capture everything until foo, you can use the following:

/(.+?)(?=foo)/

Make sure you include the ? after the + in the capture group. Without it the quantifier is greedy, and the capture runs up to the last foo in the input instead of stopping at the first one.
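
To see what each form actually captures, here is a short sketch assuming Python's re module (the sample string with two occurrences of foo is invented for the demonstration):

```python
import re

text = "abcfoodeffoo"  # made-up input containing "foo" twice

# Lazy quantifier: the capture stops right before the first "foo"
lazy = re.search(r"(.+?)(?=foo)", text).group(1)      # 'abc'

# Greedy quantifier: the capture runs up to the last "foo"
greedy = re.search(r"(.+)(?=foo)", text).group(1)     # 'abcfoodef'

# The lookahead is mandatory: no "foo" in the input means no match at all
no_match = re.search(r"(.+?)(?=foo)", "abc")          # None
```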

Capture everything until an optional sequence

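What I showed above won't work if you just make the sequence optional by adding ? to it. With a lazy .+? the capture stops after a single character, because the rest of the pattern is already allowed to match nothing, and with a greedy .+ the capture swallows foo as well. Instead you should do this: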

(.+?)(?:foo|$)

This will capture abc from both abc and abcfoo.
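
Here is a quick check of that pattern, sketched with Python's re module (an assumption, since the article is not tied to one language). The last two lines show what happens with the naive attempts that simply make foo optional:

```python
import re

pattern = re.compile(r"(.+?)(?:foo|$)")

with_foo = pattern.search("abcfoo").group(1)       # 'abc'
without_foo = pattern.search("abc").group(1)       # 'abc'

# Naive attempt 1: lazy capture plus an optional group stops after one character
naive_lazy = re.search(r"(.+?)(?:foo)?", "abcfoo").group(1)    # 'a'

# Naive attempt 2: greedy capture swallows the optional sequence too
naive_greedy = re.search(r"(.+)(?:foo)?", "abcfoo").group(1)   # 'abcfoo'
```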

Greediness vs. Laziness

By default, regex quantifiers are greedy: operators like + or * try to match as much as possible. But sometimes you need to match one or more characters while taking as few as possible. You can do that by switching from greedy mode to lazy mode, which is achieved by adding a ? to the quantifier, as in +? or *?. These are called lazy (also non-greedy or reluctant) quantifiers. They are not the same thing as possessive quantifiers, which are described further below.

Here is a detailed example matching the string abbbbbc:
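
As a minimal sketch of the two modes on that string (assuming Python's re module; the variable names are only for illustration):

```python
import re

s = "abbbbbc"

# Greedy: b+ takes as many b's as it can
greedy = re.search(r"ab+", s).group()       # 'abbbbb'

# Lazy: b+? takes as few b's as it can (but at least one)
lazy = re.search(r"ab+?", s).group()        # 'ab'

# Lazy with *: zero b's are enough
lazy_star = re.search(r"ab*?", s).group()   # 'a'

# When something must follow, both modes arrive at the same overall match;
# they only differ in the order in which the engine tries the alternatives
greedy_c = re.search(r"ab+c", s).group()    # 'abbbbbc'
lazy_c = re.search(r"ab+?c", s).group()     # 'abbbbbc'
```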

Now a bit more interesting

See how that reversed the order of the matches. Because greediness and laziness change the order in which permutations are tried, they can change the overall match. However, they do not change the fact that the regex engine will backtrack to try all possible permutations of the regular expression in case no match can be found. Possessive quantifiers, written by adding a + to the quantifier (as in ++ or *+) in engines that support them, are a way to prevent the regex engine from trying all permutations. This is useful for performance reasons. You can also use possessive quantifiers to eliminate certain matches.
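
A small sketch of the "eliminate certain matches" part, assuming Python's re module. Python only understands possessive quantifiers from version 3.11 on, so the possessive lines are guarded by a version check; the flag name SUPPORTS_POSSESSIVE is made up for this example.

```python
import re
import sys

s = "abbbbbc"

# Greedy: b+ first takes all five b's, then gives one back so the final b can match
greedy = re.search(r"ab+b", s).group()      # 'abbbbb'

# Possessive quantifiers were added to Python's re module in 3.11
SUPPORTS_POSSESSIVE = sys.version_info >= (3, 11)

if SUPPORTS_POSSESSIVE:
    # b++ takes all five b's and never gives any back, so the final b cannot match
    possessive_fail = re.search(r"ab++b", s)            # None
    # Nothing needs to be given back here, so the match succeeds
    possessive_ok = re.search(r"ab++c", s).group()      # 'abbbbbc'
```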

P.S. This article could have been called: Harry Potter and the Power of Regex

