A regular expression, often shortened to regex, is a sequence of characters that describes a pattern, used to check whether that pattern occurs in a given text (string).
import re
With a regex you can describe a pattern to look for, find specific characters, and split strings on a pattern; a pattern can also be compiled once and reused.
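- re.match() returns a match object if the pattern matches at the beginning of the text, and None otherwise.
- re.search() scans through the given string, looking for the first location where the regular expression produces a match. It returns None if there is no match.
- group() is a method of the match object (not a function of the re module). It returns the string matched by the pattern.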
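The difference between match and search is easiest to see side by side. The following is a minimal sketch; the sample text and variable names are made up for illustration.

```python
import re

text = "Let's eat cake"

# re.match() only tries the pattern at the very start of the string.
at_start = re.match(r'eat', text)    # no match: the text starts with "Let's"

# re.search() scans forward and stops at the first match anywhere.
anywhere = re.search(r'eat', text)   # match object for 'eat'

# Both return None on failure, so check before calling .group().
found = anywhere.group() if anywhere else None
print(at_start, found)
```

Calling .group() directly on the result is fine in short examples, but it raises AttributeError when the pattern does not match, because the result is then None.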
When writing a regex pattern, it is good practice to prefix the pattern string with r. This makes it a raw string, so backslashes reach the regex engine exactly as typed.
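As a small illustration of what the r prefix changes, the sketch below compares a raw string with its normally escaped equivalent. The variable names and sample text are illustrative only.

```python
import re

# Without the r prefix, a backslash has to be doubled to survive
# Python's own string-literal processing.
raw_pattern = r'\d+'
escaped_pattern = '\\d+'
same = raw_pattern == escaped_pattern   # both hold the same 3 characters

# '\n' is a single newline character; r'\n' is a backslash followed by 'n'.
newline_len = len('\n')
raw_newline_len = len(r'\n')

digits = re.search(raw_pattern, 'Order 66 shipped').group()
print(same, newline_len, raw_newline_len, digits)
```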
To check whether the pattern occurs at the beginning of the text, start the pattern with a caret (^). To check whether it occurs at the end, finish the pattern with a dollar sign ($).
Example:
re.search(r'Co.k.e', 'Cookie').group()
> returns: 'Cookie'
re.search(r'^Eat', "Eat cake!").group()
> returns: 'Eat'
re.search(r'cake$', "Cake! Let's eat cake").group()
> returns: 'cake'
To match one character out of a set or range of characters, use square brackets:
[abc] - Matches a, b, or c. [a-zA-Z0-9] - Matches any single character in the ranges a to z, A to Z, or 0 to 9.
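A quick sketch of both bracket forms follows. Each one matches a single character, and the sample strings are made up for illustration.

```python
import re

# [abc] matches the first character that is a, b, or c.
first_abc = re.search(r'[abc]', 'kebab').group()

# [a-zA-Z0-9] skips the dashes and spaces and matches the first
# letter or digit it finds.
first_alnum = re.search(r'[a-zA-Z0-9]', '--- R2D2 ---').group()

# A set still matches only one character, so this finds no match:
# no digit in the text is directly followed by another digit.
two_digits = re.search(r'[0-9][0-9]', 'R2D2')
print(first_abc, first_alnum, two_digits)
```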
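The backslash (\) is handled as follows:
- If the character following the backslash is a recognized escape character, the sequence takes on its special meaning (Scenario 1).
- Otherwise, if the character following the backslash is not a recognized escape character, the backslash has no special effect and the character is matched as itself (Scenario 2). In recent Python versions this holds for punctuation only: an unknown escape made of a backslash and an ASCII letter raises an error.
- A backslash can be placed in front of any metacharacter to remove its special meaning (Scenario 3).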
Example:
re.search(r'Not a\sregular character', 'Not a regular character').group()
> returns: 'Not a regular character'
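Scenario 3 (escaping a metacharacter) can be sketched with the dot, which normally matches any character. The sample strings are illustrative. re.escape() is the standard-library helper that adds the backslashes for you.

```python
import re

# Unescaped, '.' matches any character, so 'x' is accepted here.
loose = re.search(r'3.14', 'value: 3x14').group()

# Escaped, '\.' matches only a literal dot, so the same text fails...
strict = re.search(r'3\.14', 'value: 3x14')

# ...while text containing a real dot matches.
literal = re.search(r'3\.14', 'pi is 3.14').group()

# re.escape() escapes metacharacters such as '+' in arbitrary text.
plus = re.search(re.escape('1+1'), 'so 1+1=2').group()
print(loose, strict, literal, plus)
```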
- \w - Lowercase 'w'. Matches any single letter, digit, or underscore.
- \W - Uppercase 'W'. Matches any character not part of \w (lowercase w).
- \s - Lowercase 's'. Matches a single whitespace character like: space, newline, tab, return.
- \S - Uppercase 'S'. Matches any character not part of \s (lowercase s).
- \d - Lowercase 'd'. Matches a decimal digit 0-9.
- \D - Uppercase 'D'. Matches any character that is not a decimal digit.
- + - Checks if the preceding character appears one or more times starting from that position.
- * - Checks if the preceding character appears zero or more times starting from that position.
- ? - Checks if the preceding character appears exactly zero or one time starting from that position.
- {x} - Repeats exactly x times.
- {x,} - Repeats at least x times.
- {x,y} - Repeats at least x times but no more than y times (written without a space after the comma).
Examples:
re.search(r'\d{9,10}', '0987654321').group()
> returns: '0987654321'
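The character classes and repetition operators listed above are usually combined. The sketch below runs a few of them against made-up sample strings. Note that repetition is greedy, so each operator takes as many characters as it is allowed to.

```python
import re

text = 'Order 12345 shipped to   Room 7B'

word = re.search(r'\w+', text).group()            # first run of word characters
number = re.search(r'\d+', text).group()          # first run of digits
non_digits = re.search(r'\D+', text).group()      # everything before the first digit
gap = re.search(r'\s{2,}', text).group()          # two or more whitespace characters
exactly_three = re.search(r'\d{3}', text).group()
two_to_three = re.search(r'\d{2,3}', text).group()

# '?' makes the preceding character optional.
optional = re.search(r'colou?r', 'color colour').group()

# '*' allows zero occurrences, '+' requires at least one.
star = re.search(r'ab*', 'a abbb').group()
plus = re.search(r'ab+', 'a abbb').group()

print(word, number, repr(non_digits), repr(gap))
print(exactly_three, two_to_three, optional, star, plus)
```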
Email validation Example:
statement = 'Please contact us at: support@datacamp.com'
match = re.search(r'([\w\.-]+)@([\w\.-]+)', statement)
if match:  # re.search() returns None when no address is found
    print("Email address:", match.group())  # The whole matched text
    print("Username:", match.group(1))  # The username (group 1)
    print("Host:", match.group(2))  # The host (group 2)
Email address: support@datacamp.com
Username: support
Host: datacamp.com
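If the same pattern is applied to many strings, it can be compiled once with re.compile() and reused. The following is a minimal sketch: the names EMAIL and parse_email are hypothetical helpers, not part of the re module, and the pattern is the same loose one used above, so it extracts address-like text but does not strictly validate it.

```python
import re

# Compile once and reuse. Inside [...] the dot needs no backslash.
EMAIL = re.compile(r'([\w.-]+)@([\w.-]+)')

def parse_email(text):
    """Return (username, host) for the first address in text, or None."""
    match = EMAIL.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)

found = parse_email('Please contact us at: support@datacamp.com')
missing = parse_email('No address here')
print(found, missing)
```

Returning None for the no-match case keeps the check in one place, so callers do not need to touch the match object.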