Python Regular Expression – Character Classes

Character Classes –

In a regular expression, a character class (also called a character set) is a set or range of characters enclosed in square brackets.

Positive Character Class –

In regex, a character class matches exactly one character from the set. If we want to match any vowel, we can write [aeiou]. The regex will then match a or e or i or o or u. There is an OR relationship between the characters inside the square brackets.

Let’s look at an example.

In [1]: import re

In [2]: re.findall('[abcdef]', 'Python is awesome')
Out[2]: ['a', 'e', 'e']

The above pattern will match either a or b or c or d or e or f.
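As a small sketch of the single-character behavior described above, the snippet below runs the vowel class against the same sentence and then uses a class in the middle of a longer pattern. The sample strings and variable names are chosen here only for illustration.

```python
import re

text = 'Python is awesome'

# [aeiou] matches one character at a time, as long as it is a vowel.
vowels = re.findall('[aeiou]', text)
print(vowels)  # ['o', 'i', 'a', 'e', 'o', 'e']

# The class stands for exactly one character, so 'gr[ae]y' matches
# 'grey' and 'gray' but not 'graey' (two characters between r and y).
words = re.findall('gr[ae]y', 'grey gray graey')
print(words)  # ['grey', 'gray']
```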

Regex also provides a range operator, -, to make things easier. Let’s say that you want to match all the letters from a to z. Instead of writing [abcdefghijklmnopqrstuvwxyz], you can write [a-z]. To match all uppercase letters we can write [A-Z], and to match all digits we can write [0-9]. We can also combine multiple ranges like this: [a-zA-Z0-9]. This will match any lowercase letter from a to z, any uppercase letter from A to Z, or any digit from 0 to 9.

In [3]: re.findall('[a-f]', 'Python is awesome')
Out[3]: ['a', 'e', 'e']
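To illustrate combining ranges, here is a minimal sketch using a made-up sample string. It shows the combined class [a-zA-Z0-9] skipping spaces and punctuation, next to a digits-only class for comparison.

```python
import re

sample = 'Room 42-B, floor 3!'

# Letters and digits match; spaces, '-', ',' and '!' do not.
alnum = re.findall('[a-zA-Z0-9]', sample)
print(''.join(alnum))  # Room42Bfloor3

# A single range on its own: digits only.
digits = re.findall('[0-9]', sample)
print(digits)  # ['4', '2', '3']
```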

Negative Character Class –

A negative character class matches any single character except the characters inside the square brackets. It is the opposite of a positive class. We denote a negative character class by placing a caret symbol (^) as the first character inside the square brackets. Outside the square brackets, the caret has a different meaning in regex: it matches the start of a string.

In [4]: re.findall('[^a-f]', 'Python is awesome')
Out[4]: ['P', 'y', 't', 'h', 'o', 'n', ' ', 'i', 's', ' ', 'w', 's', 'o', 'm']
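The caret's meaning depends on where it appears, which is easy to mix up. The sketch below contrasts the three cases: outside the brackets, first inside the brackets, and later inside the brackets (where it is just a literal caret). The short input strings are invented for the example.

```python
import re

text = 'Python is awesome'

# Outside the brackets, ^ anchors the match to the start of the string.
at_start = re.findall('^P', text)
print(at_start)  # ['P']
not_at_start = re.findall('^y', text)
print(not_at_start)  # []

# First inside the brackets, ^ negates the class.
non_digits = re.findall('[^0-9]', 'a1b2')
print(non_digits)  # ['a', 'b']

# Anywhere else inside the brackets, ^ is an ordinary character.
literal_caret = re.findall('[a^]', 'a^b')
print(literal_caret)  # ['a', '^']
```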
