
Character Classes –
In regular expressions, a character class (also called a character set) is a set of characters, or a range of characters, enclosed in square brackets.
Positive Character Classes –
In regex, a character class matches exactly one character. If we want to match any vowel, we can write [aeiou]. This means the regex will match a single a, e, i, o or u. The characters inside the square brackets are in an OR relationship.
Let’s look at an example.
In [1]: import re
In [2]: re.findall('[abcdef]', 'Python is awesome')
Out[2]: ['a', 'e', 'e']
The pattern above matches a single character that is a, b, c, d, e or f.
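A short sketch of the vowel class described above, run on the same sample string. It shows that each match is a single character, so adjacent vowels come back as separate matches. It also shows that quote marks placed inside the brackets are not delimiters: they simply become members of the set. The second and third input strings ("queue", "it's") are illustrative choices, not from the original.

```python
import re

text = "Python is awesome"

# [aeiou] matches exactly one character: a, e, i, o or u
vowels = re.findall(r"[aeiou]", text)
print(vowels)  # ['o', 'i', 'a', 'e', 'o', 'e']

# Each match is one character, so adjacent vowels are returned separately
separate = re.findall(r"[aeiou]", "queue")
print(separate)  # ['u', 'e', 'u', 'e']

# Quote marks inside the brackets are not delimiters; they join the set
quoted = re.findall(r"['aeiou']", "it's")
print(quoted)  # ['i', "'"]
```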
Python's re module also provides a range operator, ‘-’, to make things easier. Say you want to match any letter from a to z: instead of writing [abcdefghijklmnopqrstuvwxyz], you can write [a-z]. To match any uppercase letter we can write [A-Z], and to match any digit we can write [0-9]. We can also combine multiple ranges, as in [a-zA-Z0-9], which matches any single lowercase letter, uppercase letter, or digit from 0 to 9.
In [3]: re.findall('[a-f]', 'Python is awesome')
Out[3]: ['a', 'e', 'e']
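The ranges from the paragraph above can be tried side by side. This is a minimal sketch; the sample string "Room 101, Floor B" is an arbitrary illustration chosen because it contains lowercase letters, uppercase letters, digits and punctuation.

```python
import re

sample = "Room 101, Floor B"

lower = re.findall(r"[a-z]", sample)        # one lowercase letter per match
upper = re.findall(r"[A-Z]", sample)        # one uppercase letter per match
digits = re.findall(r"[0-9]", sample)       # one digit per match
alnum = re.findall(r"[a-zA-Z0-9]", sample)  # several ranges combined in one class

print(lower)   # ['o', 'o', 'm', 'l', 'o', 'o', 'r']
print(upper)   # ['R', 'F', 'B']
print(digits)  # ['1', '0', '1']
print(alnum)   # ['R', 'o', 'o', 'm', '1', '0', '1', 'F', 'l', 'o', 'o', 'r', 'B']

# A range is shorthand for listing every character it covers
same = re.findall(r"[a-f]", "Python is awesome") == re.findall(r"[abcdef]", "Python is awesome")
print(same)  # True
```

Note that the space, the comma and every other character outside the listed ranges are skipped by all four patterns.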
Negative Character Classes –
A negative character class matches any single character except the characters inside the square brackets. It is the opposite of a positive class. We denote a negative character class by placing a caret (^) as the first character inside the square brackets. Outside square brackets, the caret has a different meaning in regex: it matches the start of the string.
In [4]: re.findall('[^a-f]', 'Python is awesome')
Out[4]: ['P', 'y', 't', 'h', 'o', 'n', ' ', 'i', 's', ' ', 'w', 's', 'o', 'm']
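Because the caret means different things depending on where it appears, a quick side-by-side comparison may help. This sketch reuses the sample string from the session above. The last case goes slightly beyond the text: in Python's re module, a caret that is not the first character inside the brackets is treated as a literal '^' (the input "a^b" is just an illustration).

```python
import re

text = "Python is awesome"

# Caret as the FIRST character inside the brackets: negation
negated = re.findall(r"[^a-f]", text)
print(negated)

# Caret outside the brackets: anchors the match to the start of the string
at_start = re.findall(r"^Py", text)
not_start = re.findall(r"^is", text)  # "is" does not begin the string
print(at_start, not_start)  # ['Py'] []

# Caret anywhere else inside the brackets: a literal '^' character
literal = re.findall(r"[a^]", "a^b")
print(literal)  # ['a', '^']
```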