The re module in Python provides support for regular expressions, which are a powerful tool for searching, matching, and manipulating strings based on patterns. Regular expressions are used for tasks like validating input, searching for patterns, replacing substrings, and more.
Key Functions in the re Module
match - re.match(pattern, string, flags=0): Attempts to match the pattern at the beginning of the string. Returns a match object, or None if the pattern does not match there.
search - re.search(pattern, string, flags=0): Scans the string and returns a match object for the first place the pattern matches, or None if it matches nowhere.
findall - re.findall(pattern, string, flags=0): Returns a list of all non-overlapping matches of the pattern in the string. If the pattern contains groups, the list holds the group contents instead of the whole matches.
finditer - re.finditer(pattern, string, flags=0): Returns an iterator yielding match objects for all non-overlapping matches of the pattern in the string.
sub - re.sub(pattern, repl, string, count=0, flags=0): Returns a new string in which matches of the pattern are replaced with repl. A count of 0 replaces every match.
split - re.split(pattern, string, maxsplit=0, flags=0): Splits the string by occurrences of the pattern and returns the pieces as a list. A maxsplit of 0 means no limit on the number of splits.
compile - re.compile(pattern, flags=0): Compiles a regular expression pattern into a pattern object whose methods (match, search, findall, and so on) can be reused for matching.
How to use them?
re.match()
    import re

    pattern = r'\d+'  # Matches one or more digits
    string = '123abc'

    match = re.match(pattern, string)
    if match:
        print(f"Matched: {match.group()}")  # Output: Matched: 123
    else:
        print("No match")
re.search()
    import re

    pattern = r'\d+'  # Matches one or more digits
    string = 'abc123def'

    search = re.search(pattern, string)
    if search:
        print(f"Found: {search.group()}")  # Output: Found: 123
    else:
        print("Not found")
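The difference between matching at the start and searching anywhere is easiest to see on a single input. The sketch below is illustrative only: it runs both functions on the same string and also uses re.fullmatch(), which is not covered above and is included here as an assumption about what readers may want when the whole string has to match.

```python
import re

text = "abc123def"

# match() succeeds only if the pattern matches at position 0.
at_start = re.match(r"\d+", text)

# search() scans forward and returns the first match anywhere.
anywhere = re.search(r"\d+", text)

# fullmatch() requires the entire string to match the pattern.
whole = re.fullmatch(r"[a-z]+\d+[a-z]+", text)

print(at_start)           # None, because the string starts with letters
print(anywhere.group())   # 123
print(whole is not None)  # True
```

Since all three return None on failure, checking the result before calling group() avoids an AttributeError.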
re.findall()
    import re

    pattern = r'\d+'  # Matches one or more digits
    string = 'abc123def456ghi789'

    all_matches = re.findall(pattern, string)
    print(all_matches)  # Output: ['123', '456', '789']
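What findall() returns depends on whether the pattern contains capturing groups. The following is a small sketch with a made-up input string, showing the three shapes the result can take.

```python
import re

text = "x=10, y=20, z=30"  # illustrative input, not from the article

# No groups: each item is the full matched text.
no_groups = re.findall(r"\w=\d+", text)

# One group: each item is just that group's text.
one_group = re.findall(r"\w=(\d+)", text)

# Two or more groups: each item is a tuple of the groups.
two_groups = re.findall(r"(\w)=(\d+)", text)

print(no_groups)   # ['x=10', 'y=20', 'z=30']
print(one_group)   # ['10', '20', '30']
print(two_groups)  # [('x', '10'), ('y', '20'), ('z', '30')]
```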
re.finditer()
    import re

    pattern = r'\d+'  # Matches one or more digits
    string = 'abc123def456ghi789'

    iterator = re.finditer(pattern, string)
    for match in iterator:
        print(f"Found {match.group()} at {match.start()}-{match.end()}")
    # Output:
    # Found 123 at 3-6
    # Found 456 at 9-12
    # Found 789 at 15-18
re.sub()
    import re

    pattern = r'\d+'  # Matches one or more digits
    string = 'abc123def456ghi789'
    replacement = '#'

    result = re.sub(pattern, replacement, string)
    print(result)  # Output: abc#def#ghi#
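Beyond a fixed replacement string, re.sub() also accepts a count, a replacement that refers back to the match, and a function. The sketch below illustrates each on the same input; the helper name double is hypothetical and chosen only for this example.

```python
import re

text = "abc123def456ghi789"

# count=1 limits the substitution to the first match.
first_only = re.sub(r"\d+", "#", text, count=1)

# \g<0> in the replacement stands for the whole matched text.
bracketed = re.sub(r"\d+", r"[\g<0>]", text)

# A callable receives each match object and returns the replacement text.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, text)

print(first_only)  # abc#def456ghi789
print(bracketed)   # abc[123]def[456]ghi[789]
print(doubled)     # abc246def912ghi1578
```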
re.split()
    import re

    pattern = r'\d+'  # Matches one or more digits
    string = 'abc123def456ghi789'

    result = re.split(pattern, string)
    print(result)  # Output: ['abc', 'def', 'ghi', '']
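The trailing empty string in that output appears because the input ends with a match. As a rough sketch of the common variations, the example below limits the number of splits, keeps the delimiters by wrapping the pattern in a capturing group, and filters out empty pieces.

```python
import re

text = "abc123def456ghi789"

# maxsplit=1 stops after the first split; the rest stays in one piece.
limited = re.split(r"\d+", text, maxsplit=1)

# A capturing group makes split() include the delimiters in the result.
with_delims = re.split(r"(\d+)", text)

# Dropping empty strings removes the trailing '' caused by the final match.
no_empty = [part for part in re.split(r"\d+", text) if part]

print(limited)      # ['abc', 'def456ghi789']
print(with_delims)  # ['abc', '123', 'def', '456', 'ghi', '789', '']
print(no_empty)     # ['abc', 'def', 'ghi']
```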
re.compile()
    import re

    pattern = re.compile(r'\d+')  # Compile once, reuse below
    string = 'abc123def456ghi789'

    # Using the compiled pattern
    search = pattern.search(string)
    if search:
        print(f"Found: {search.group()}")  # Output: Found: 123

    all_matches = pattern.findall(string)
    print(all_matches)  # Output: ['123', '456', '789']
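A compiled pattern is handy when the same expression is applied to many inputs, such as in input validation. The following is a minimal sketch under assumed rules: the name USERNAME_RE, the function is_valid_username, and the "3 to 16 letters, digits, or underscores" rule are all hypothetical and only serve the illustration.

```python
import re

# Hypothetical rule: 3 to 16 characters, each a letter, digit, or underscore.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,16}")

def is_valid_username(name):
    """Return True if the whole string satisfies the assumed username rule."""
    return USERNAME_RE.fullmatch(name) is not None

for candidate in ["alice_01", "ab", "bad name"]:
    print(candidate, is_valid_username(candidate))
# alice_01 True
# ab False
# bad name False
```

Using fullmatch() here rather than match() means a valid prefix followed by invalid characters is rejected.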
Regular Expression Syntax
- Literals: Ordinary characters match themselves. Example: a matches 'a'.
- Metacharacters: Characters with special meanings.
  - . : Matches any character except newline.
  - ^ : Matches the start of the string.
  - $ : Matches the end of the string, or just before a newline at the end of the string.
  - * : Matches 0 or more repetitions of the preceding RE.
  - + : Matches 1 or more repetitions of the preceding RE.
  - ? : Matches 0 or 1 repetition of the preceding RE.
  - {m,n} : Matches from m to n repetitions of the preceding RE.
  - [...] : Matches any single character listed in the brackets.
  - | : A|B matches either A or B.
  - () : Matches the RE inside the parentheses and indicates a group.
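To make the metacharacters above concrete, here is a small sketch that pairs each one with a sample input and collects what re.search() finds. The patterns and sample strings are invented for illustration.

```python
import re

# Each pattern is paired with an illustrative input string.
samples = {
    r"a.c": "abc",         # . matches any single character except newline
    r"^abc": "abcdef",     # ^ anchors the match at the start
    r"def$": "abcdef",     # $ anchors the match at the end
    r"ab*c": "ac",         # * allows zero repetitions of b
    r"ab+c": "abbc",       # + needs at least one b
    r"colou?r": "color",   # ? makes the u optional
    r"\d{2,4}": "12345",   # {2,4} takes between 2 and 4 digits (greedy)
    r"[aeiou]": "xyz e",   # [...] matches one character from the set
    r"cat|dog": "hotdog",  # | matches either alternative
    r"(ab)+": "ababab",    # () groups ab so + repeats the pair
}

results = {
    pattern: re.search(pattern, text).group()
    for pattern, text in samples.items()
}

for pattern, found in results.items():
    print(f"{pattern!r:12} -> {found!r}")
```

Note that \d{2,4} returns '1234' for '12345': quantifiers are greedy and take as many repetitions as the upper bound allows.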
Flags
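- re.IGNORECASE (re.I): Ignore case.
- re.MULTILINE (re.M): Make ^ and $ match at the start and end of each line, not only at the start and end of the whole string.
- re.DOTALL (re.S): Make . match any character, including newlines.
- re.UNICODE (re.U): Make \w, \W, \b, \B, \d, \D, \s, \S dependent on Unicode character properties. In Python 3 this is already the default for str patterns, so the flag is redundant there.
- re.VERBOSE (re.X): Allow whitespace and comments in the pattern for readability.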
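The effect of each flag shows up most clearly when the same pattern is run with and without it. The sketch below does that on a made-up two-line string; DATE_RE and the sample texts are assumptions for the example.

```python
import re

text = "First line\nsecond LINE"

# IGNORECASE: 'line' also matches 'LINE'.
plain = re.findall(r"line", text)
ignore_case = re.findall(r"line", text, flags=re.IGNORECASE)

# MULTILINE: ^ matches at the start of every line, not just the string.
first_only = re.findall(r"^\w+", text)
every_line = re.findall(r"^\w+", text, flags=re.MULTILINE)

# DOTALL: . can cross the newline between the two lines.
no_dotall = re.search(r"First.*LINE", text)
dotall = re.search(r"First.*LINE", text, flags=re.DOTALL)

# VERBOSE: whitespace and comments inside the pattern are ignored.
DATE_RE = re.compile(r"""
    (\d{4})   # year
    -         # separator
    (\d{2})   # month
""", re.VERBOSE)
date_parts = DATE_RE.search("Released 2024-05").groups()

# Flags can be combined with the | operator.
combined = re.findall(r"^\w+ line$", text, flags=re.I | re.M)

print(plain, ignore_case)      # ['line'] ['line', 'LINE']
print(first_only, every_line)  # ['First'] ['First', 'second']
print(no_dotall, dotall is not None)  # None True
print(date_parts)              # ('2024', '05')
print(combined)                # ['First line', 'second LINE']
```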
Overview
The re module covers matching, searching, replacing, and splitting text by pattern, which makes it an essential tool for text processing in Python.