A regular expression is a pattern. Some parts of the pattern match single characters of a particular type in the string. Other parts match multiple characters, or repeated groups of characters.
Single-character Patterns:
- a single character matches itself
- a dot "." matches any single character except a newline
- a character class is enclosed in [ ]. Exactly one of the listed characters must appear at the corresponding position in the string for the pattern to match
e.g. [aeiuoAEIOU] matches any upper- or lowercase vowel in a string
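e.g. [0123456789] matches any single digit; the range [0-9] is a shorter way to write the same class
- a negated character class, written with a ^ immediately after the opening [, matches any single character that is not in the list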
e.g. [^0-9] matches any single non-digit
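The single-character patterns above can be tried out interactively. This is a minimal sketch, assuming Python's standard re module as the regex engine (the notes don't name a language, but these basic constructs behave the same way in most engines); the sample strings are made up for illustration.

```python
import re

# A plain character matches itself.
print(re.search(r"t", "cat").group())        # t

# "." matches any one character, but (by default in Python's re) not a newline.
print(bool(re.fullmatch(r"c.t", "cut")))     # True
print(bool(re.fullmatch(r"c.t", "c\nt")))    # False

# A character class matches exactly one of its listed characters.
vowels = re.findall(r"[aeiouAEIOU]", "Fred Flintstone")
print(vowels)                                # ['e', 'i', 'o', 'e']

# [0-9] is shorthand for [0123456789]; a leading ^ inside the brackets negates it.
digits = re.findall(r"[0-9]", "R2-D2")
non_digits = re.findall(r"[^0-9]", "R2-D2")
print(digits, non_digits)                    # ['2', '2'] ['R', '-', 'D']
```

Note that re.findall returns each single-character match separately, which makes it easy to see that a class consumes exactly one character per match.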
Grouping Patterns:
- the first grouping pattern is the sequence: abc matches an a followed by a b followed by a c.
- the asterisk "*" means "zero or more" of the immediately preceding character or character class
- similarly, the plus sign "+" means "one or more"
- and the question mark "?" means "zero or one" of the immediately preceding character or character class
- another grouping construct is alternation, as in "a|b|c", which matches exactly one of the alternatives (a or b or c in this case). The alternatives may also be several characters long, as in song|blue, which matches either song or blue!
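Here is a small sketch of the grouping patterns, again assuming Python's re module. It uses re.fullmatch so that the whole test string has to match the pattern, which keeps the effect of each quantifier visible without needing the anchors described later; the word lists are hypothetical sample data.

```python
import re

words = ["ac", "abc", "abbc"]

# A sequence: "abc" means an a, then a b, then a c.
sequence_found = re.search(r"abc", "xxabcxx") is not None
print(sequence_found)  # True

# Quantifiers apply to the immediately preceding character (here, "b").
star = [w for w in words if re.fullmatch(r"ab*c", w)]  # zero or more b's
plus = [w for w in words if re.fullmatch(r"ab+c", w)]  # one or more b's
opt = [w for w in words if re.fullmatch(r"ab?c", w)]   # zero or one b
print(star, plus, opt)  # ['ac', 'abc', 'abbc'] ['abc', 'abbc'] ['ac', 'abc']

# A quantifier after a character class repeats the whole class.
runs = re.findall(r"[0-9]+", "room 101, floor 3")
print(runs)  # ['101', '3']

# Alternation: exactly one of the alternatives, which may be multi-character.
alts = [w for w in ["song", "blue", "red"] if re.fullmatch(r"song|blue", w)]
print(alts)  # ['song', 'blue']
```

The [0-9]+ line shows why "one or more" is usually paired with a character class: it picks out whole runs of digits rather than single digits.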
Anchoring Patterns:
- a \b requires a word boundary at the indicated point in order for the pattern to match
- \B requires that there not be a word boundary at the indicated point
e.g. \bFred\B matches "Frederick" but not "Fred Flintstone"
- a ^ matches the beginning of the string when it is the first character of the pattern
- a $ matches the end of the string when it is the last character of the pattern
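The anchoring patterns can be checked the same way. This sketch assumes Python's re module and reuses the Fred examples from the notes; the other strings are illustrative only.

```python
import re

# Raw strings (r"...") keep "\b" as a regex escape; in a plain Python string
# "\b" would be a backspace character instead.
word_pattern = r"\bFred\B"

# \b: boundary before "F"; \B: no boundary right after "d".
in_frederick = re.search(word_pattern, "Frederick") is not None        # True
in_flintstone = re.search(word_pattern, "Fred Flintstone") is not None  # False

# ^ anchors at the start of the string, $ at the end.
at_start = re.search(r"^Fred", "Fred Flintstone") is not None  # True
not_at_start = re.search(r"^Fred", "Hi, Fred") is not None     # False
at_end = re.search(r"stone$", "Fred Flintstone") is not None   # True

print(in_frederick, in_flintstone, at_start, not_at_start, at_end)
# True False True False True
```

One Python-specific detail: its $ also matches just before a single trailing newline, so "stone$" would still match "Fred Flintstone\n".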