The re Module in Python


The re module in Python provides support for regular expressions, which are a powerful tool for searching, matching, and manipulating strings based on patterns.

Regular expressions are used for tasks like validating input, searching for patterns, replacing substrings, and more.


Key Functions in the re Module

match - re.match(pattern, string, flags=0): Attempts to match a pattern at the beginning of the string.

search - re.search(pattern, string, flags=0): Searches the string for a match to the pattern anywhere in the string.

findall - re.findall(pattern, string, flags=0): Returns a list of all non-overlapping matches of the pattern in the string.

finditer - re.finditer(pattern, string, flags=0): Returns an iterator yielding match objects for all non-overlapping matches of the pattern in the string.

sub - re.sub(pattern, repl, string, count=0, flags=0): Replaces matches of the pattern in the string with repl.

split - re.split(pattern, string, maxsplit=0, flags=0): Splits the string by occurrences of the pattern.

compile - re.compile(pattern, flags=0): Compiles a regular expression pattern into a regex object, which can be used for matching.

How to use them?

re.match()

import re

pattern = r'\d+'  # Matches one or more digits
string = '123abc'

match = re.match(pattern, string)
if match:
    print(f"Matched: {match.group()}")  # Output: Matched: 123
else:
    print("No match")
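Because re.match() only looks at the beginning of the string, it is easy to confuse with re.search(). The short sketch below contrasts the two on the same input; the sample string and variable names are illustrative assumptions, not part of the examples above.

```python
import re

pattern = r'\d+'  # Matches one or more digits

# re.match only tries position 0, so the leading letters cause a miss.
at_start = re.match(pattern, 'abc123')

# re.search scans forward and finds the digits later in the string.
anywhere = re.search(pattern, 'abc123')

print(at_start)          # None
print(anywhere.group())  # 123
```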


re.search()

import re

pattern = r'\d+'  # Matches one or more digits
string = 'abc123def'

search = re.search(pattern, string)
if search:
    print(f"Found: {search.group()}")  # Output: Found: 123
else:
    print("Not found")


re.findall()

import re

pattern = r'\d+'  # Matches one or more digits
string = 'abc123def456ghi789'

all_matches = re.findall(pattern, string)
print(all_matches)  # Output: ['123', '456', '789']
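One detail worth knowing: when the pattern contains capturing groups, re.findall() returns the groups rather than the whole match. A minimal sketch, using a two-group pattern chosen here purely for illustration:

```python
import re

string = 'abc123def456ghi789'

# With two groups in the pattern, each result is a tuple of the two groups.
pairs = re.findall(r'([a-z]+)(\d+)', string)
print(pairs)  # [('abc', '123'), ('def', '456'), ('ghi', '789')]
```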


re.finditer()

import re

pattern = r'\d+'  # Matches one or more digits
string = 'abc123def456ghi789'

iterator = re.finditer(pattern, string)
for match in iterator:
    print(f"Found {match.group()} at {match.start()}-{match.end()}")  
# Output:
# Found 123 at 3-6
# Found 456 at 9-12
# Found 789 at 15-18


re.sub()

import re

pattern = r'\d+'  # Matches one or more digits
string = 'abc123def456ghi789'
replacement = '#'

result = re.sub(pattern, replacement, string)
print(result)  # Output: abc#def#ghi#
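The count argument and the repl argument offer more control than a plain string swap. The sketch below shows three variations on the same input: limiting the number of replacements, referring back to the matched text with \g<0>, and passing a function as repl. The variable names are illustrative.

```python
import re

string = 'abc123def456ghi789'

# count=1 limits the substitution to the first match only.
first_only = re.sub(r'\d+', '#', string, count=1)
print(first_only)  # abc#def456ghi789

# \g<0> in the replacement stands for the whole match.
bracketed = re.sub(r'\d+', r'[\g<0>]', string)
print(bracketed)  # abc[123]def[456]ghi[789]

# repl can also be a function that receives each match object.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), string)
print(doubled)  # abc246def912ghi1578
```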


re.split()

import re

pattern = r'\d+'  # Matches one or more digits
string = 'abc123def456ghi789'

result = re.split(pattern, string)
print(result)  # Output: ['abc', 'def', 'ghi', '']
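Two options change what re.split() returns: maxsplit caps the number of splits, and a capturing group in the pattern keeps the separators in the result. A small sketch with illustrative variable names:

```python
import re

string = 'abc123def456ghi789'

# maxsplit=1 splits once and leaves the rest of the string untouched.
limited = re.split(r'\d+', string, maxsplit=1)
print(limited)  # ['abc', 'def456ghi789']

# Wrapping the pattern in a group keeps the separators in the output list.
with_separators = re.split(r'(\d+)', string)
print(with_separators)  # ['abc', '123', 'def', '456', 'ghi', '789', '']
```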


re.compile()

import re

pattern = re.compile(r'\d+')

# Using compiled pattern
string = 'abc123def456ghi789'
search = pattern.search(string)
if search:
    print(f"Found: {search.group()}")  # Output: Found: 123

all_matches = pattern.findall(string)
print(all_matches)  # Output: ['123', '456', '789']
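A compiled pattern is most useful when the same expression is applied to many strings. The following sketch reuses one compiled object across a list of sample lines; the sample data is made up for illustration.

```python
import re

digits = re.compile(r'\d+')

lines = ['order 17', 'no numbers here', 'items 3 and 42']

# The same compiled object is reused for every line.
found = [digits.findall(line) for line in lines]
print(found)  # [['17'], [], ['3', '42']]
```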


Regular Expression Syntax

  • Literals: Ordinary characters match themselves. Example: a matches 'a'.
  • Metacharacters: Characters with special meanings.
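  • .: Matches any character except newline.
  • ^: Matches the start of the string (and the start of each line when re.MULTILINE is set).
  • $: Matches the end of the string or just before a trailing newline (and the end of each line when re.MULTILINE is set).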
  • *: Matches 0 or more repetitions of the preceding RE.
  • +: Matches 1 or more repetitions of the preceding RE.
  • ?: Matches 0 or 1 repetition of the preceding RE.
  • {m,n}: Matches from m to n repetitions of the preceding RE.
  • [...]: Matches any single character in brackets.
  • |: A|B matches either A or B.
  • (): Matches the RE inside the parentheses and indicates a group.
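The syntax elements listed above can be tried out one at a time. The sketch below pairs each metacharacter with a tiny input; all sample strings and variable names are assumptions chosen for illustration.

```python
import re

# . matches any single character except a newline
dot = re.findall(r'c.t', 'cat cot c\nt')
print(dot)  # ['cat', 'cot']

# ^ and $ anchor the pattern to the start and end of the string
anchored_ok = bool(re.search(r'^abc$', 'abc'))
anchored_miss = bool(re.search(r'^abc$', 'xabc'))
print(anchored_ok, anchored_miss)  # True False

# *, + and ? control how often the preceding item may repeat
star = re.findall(r'ab*', 'a ab abbb')
plus = re.findall(r'ab+', 'a ab abbb')
optional = re.findall(r'ab?', 'a ab abbb')
print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['a', 'ab', 'ab']

# {m,n} bounds the repetition count
bounded = re.findall(r'\d{2,3}', '1 22 333 4444')
print(bounded)  # ['22', '333', '444']

# [...] matches one character from the set; | picks between alternatives
vowels = re.findall(r'[aeiou]', 'regex')
either = re.findall(r'cat|dog', 'cat dog bird')
print(vowels)  # ['e', 'e']
print(either)  # ['cat', 'dog']

# () captures a group that can be read back from the match object
m = re.search(r'(\w+)@(\w+)', 'user@example')
print(m.group(1), m.group(2))  # user example
```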


Flags

  • re.IGNORECASE (re.I): Ignore case.
  • re.MULTILINE (re.M): Make ^ and $ match at the start and end of each line, not only at the start and end of the whole string.
  • re.DOTALL (re.S): Make . match any character, including newlines.
  • re.UNICODE (re.U): Make \w, \W, \b, \B, \d, \D, \s, \S dependent on Unicode character properties. In Python 3 this is already the default for str patterns, so the flag is redundant there.
  • re.VERBOSE (re.X): Allow spaces and comments in the pattern for readability.
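A quick way to see what each flag does is to run the same pattern with and without it. The sketch below does this for four of the flags above; the sample text, the date pattern, and the variable names are illustrative assumptions.

```python
import re

text = 'Hello\nworld'

# re.IGNORECASE: 'hello' matches 'Hello'
ignore_case = re.findall(r'hello', text, flags=re.IGNORECASE)
print(ignore_case)  # ['Hello']

# re.MULTILINE: ^ matches at the start of every line
single_line = re.findall(r'^\w+', text)
multi_line = re.findall(r'^\w+', text, flags=re.MULTILINE)
print(single_line)  # ['Hello']
print(multi_line)   # ['Hello', 'world']

# re.DOTALL: . also matches the newline between the two words
without_dotall = re.search(r'Hello.world', text)
with_dotall = re.search(r'Hello.world', text, flags=re.DOTALL)
print(without_dotall)       # None
print(with_dotall is None)  # False

# re.VERBOSE: whitespace and # comments inside the pattern are ignored
year_month = re.compile(r"""
    (\d{4})   # year
    -         # literal separator
    (\d{2})   # month
""", re.VERBOSE)
parts = year_month.search('2024-05').groups()
print(parts)  # ('2024', '05')
```

Flags can be combined with the | operator, for example re.IGNORECASE | re.MULTILINE.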


Overview

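The re module covers most everyday text-processing needs: checking and locating patterns with match() and search(), extracting them with findall() and finditer(), rewriting text with sub(), and breaking it apart with split(), with compile() available when the same pattern is used repeatedly.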