Skip to content

Files

Latest commit

 

History

History
111 lines (71 loc) · 3.46 KB

401read19.md

File metadata and controls

111 lines (71 loc) · 3.46 KB

Python Regex!

What is it?

Regular Expressions, often shortened as regex, are a sequence of characters used to check whether a pattern exists in a given text (string) or not.

How do I import it?

import re

What can I do with Regex?

You can make regex identify patterns to look for, search/find certain characters, split and compile strings.

  • re.match() returns a match object if the text matches the pattern.
  • re.search() scans through the given string/sequence, looking for the first location where the regular expression produces a match.
  • re.group() returns the string matched by the re.

R, Carets, and Dollar signs

When using regex to search for certain characters it's in good practice to start the search string with a 'r'

When trying to find if the thing you're searching for is at the beginning use a '^' carat, and when looking for the ending use the '$' dollar sign.

Example:

re.search(r'Co.k.e', 'Cookie').group()
> returns: 'Cookie'

re.search(r'^Eat', "Eat cake!").group()
> returns: 'Eat'

re.search(r'cake$', "Cake! Let's eat cake").group()
> returns: 'cake'

Character range searches

When searching for a range of characters use this:

[abc] - Matches a or b or c. And [a-zA-Z0-9] - Matches any letter from (a to z) or (A to Z) or (0 to 9).

The importance of backslash:

  • If the character following the backslash is a recognized escape character, then the special meaning of the term is taken (Scenario 1)
  • Else if the character following the \ is not a recognized escape character, then the \ is treated like any other character and passed through (Scenario 2).
  • \ can be used in front of all the metacharacters to remove their special meaning (Scenario 3).

Example:

re.search(r'Not a\sregular character', 'Not a regular character').group()
>returns: 'Not a regular character'

More rules:

  • \w - Lowercase 'w'. Matches any single letter, digit, or underscore.

  • \W - Uppercase 'W'. Matches any character not part of \w (lowercase w).

  • \s - Lowercase 's'. Matches a single whitespace character like: space, newline, tab, return.

  • \S - Uppercase 'S'. Matches any character not part of \s (lowercase s).

  • \d - Lowercase d. Matches decimal digit 0-9.

  • \D - Uppercase d. Matches any character that is not a decimal digit.

Repeates

    • Checks if the preceding character appears one or more times starting from that position.
    • Checks if the preceding character appears zero or more times starting from that position. ? - Checks if the preceding character appears exactly zero or one time starting from that position.

{x} - Repeat exactly x number of times. {x,} - Repeat at least x times or more. {x, y} - Repeat at least x times but no more than y times.

Examples:

re.search(r'\d{9,10}', '0987654321').group()

returns: '0987654321'

Grouping

The group feature of regular expression allows you to pick up parts of the matching text.

Email validation Example:

statement = 'Please contact us at: support@datacamp.com'

match = re.search(r'([\w\.-]+)@([\w\.-]+)', statement)
if statement:
  print("Email address:", match.group()) # The whole matched text
  print("Username:", match.group(1)) # The username (group 1)
  print("Host:", match.group(2)) # The host (group 2)

Email address: support@datacamp.com
Username: support
Host: datacamp.com

What if i forget anything i've learned with regex?

Reference this link