What Is Regex? Regular Expressions Explained

Regex (regular expressions) is a pattern-matching language for finding, extracting, and replacing text. Learn the core syntax, common patterns, and how regex works in JavaScript, Python, and other languages.

Regular Expressions Explained Simply

A regular expression is a sequence of characters that defines a search pattern. You use it to test whether a string matches a pattern, find all occurrences of a pattern in text, or replace matched text with something else.

Test: Does this string match the pattern? — email validation, phone number format checks
Find: Where are all occurrences in this text? — extract all URLs, dates, or numbers from a document
Replace: Swap matched text — reformat dates, strip HTML tags, normalize whitespace
Split: Break a string on a pattern — split on any whitespace, not just a single character
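The four operations above can be sketched with Python's built-in re module. The sample text and variable names here are made up for illustration.

```python
import re

# Hypothetical sample text for illustration
text = "Order 66 shipped on 2024-03-15;  order 67 ships 2024-04-01."
date_pattern = r"\d{4}-\d{2}-\d{2}"

# Test: does the text contain a date-shaped substring anywhere?
has_date = re.search(date_pattern, text) is not None

# Find: collect every date-shaped substring
dates = re.findall(date_pattern, text)

# Replace: collapse each run of whitespace into a single space
normalized = re.sub(r"\s+", " ", text)

# Split: break on any run of whitespace, not just a single space
words = re.split(r"\s+", "alpha  beta\tgamma\ndelta")

print(has_date)
print(dates)
print(normalized)
print(words)
```

Note that the date pattern only checks the shape (four digits, two digits, two digits); it does not check that the month or day is valid.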

Core Regex Syntax

Literal characters: cat matches the exact string "cat" anywhere in the input
. (dot): Matches any single character except a newline — c.t matches "cat", "cut", "c3t"
* + ? (quantifiers): * = zero or more, + = one or more, ? = zero or one — colou?r matches both "color" and "colour"
[abc] (character class): Matches any one of the listed characters — [aeiou] matches any vowel
[^abc] (negated class): Matches any character NOT in the list
^ and $ (anchors): ^ matches the start of the string; $ matches the end — ^\d+$ matches strings of only digits
(group) and | (alternation): (cat|dog) matches "cat" or "dog"; groups capture the matched text for extraction
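As a rough check of the syntax elements listed above, here is a minimal Python sketch that runs each one against a small input. The inputs and variable names are illustrative only.

```python
import re

# Dot: any single character except a newline ("ct" has nothing between c and t)
dot_hits = re.findall(r"c.t", "cat cut c3t ct")

# ? quantifier: the "u" is optional
optional_u = re.findall(r"colou?r", "color and colour")

# Character class and negated class
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")

# Anchors: keep only strings made up entirely of digits
digits_only = [s for s in ["2024", "20x4", ""] if re.search(r"^\d+$", s)]

# Group and alternation: capture which alternative matched
pet = re.search(r"I have a (cat|dog)", "I have a dog")
pet_name = pet.group(1) if pet else None

print(dot_hits, optional_u, vowels, non_vowels, digits_only, pet_name)
```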

Shorthand Character Classes

\d — Any digit (0–9). \D = any non-digit
\w — Any word character (letters, digits, underscore). \W = any non-word character
\s — Any whitespace (space, tab, newline). \S = any non-whitespace
\b — Word boundary — \bcat\b matches "cat" but not "catch" or "concatenate"
{n,m} (quantifier range): \d{2,4} matches 2 to 4 consecutive digits
Practical patterns (loose, suited to quick extraction rather than strict validation):
Email: [\w.+-]+@[\w-]+\.[a-z]{2,}
Phone: \+?[\d\s\-()]{7,15}
URL: https?://[^\s]+
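The shorthand classes and practical patterns above can be tried out in Python as follows. The sample sentence is invented, and the phone pattern is deliberately loose: because it allows whitespace and caps the length at 15 characters, it can absorb a trailing space or cut off a long formatted number, so treat this as a sketch and not a validator.

```python
import re

# Hypothetical sample sentence for illustration
sample = ("Call +1 555-010-9999, email jane.doe@example.com, "
          "or see https://example.com/docs today.")

# \b: whole-word matches only
word_hits = re.findall(r"\bcat\b", "cat catch concatenate cat.")

# {n,m}: standalone runs of 2 to 4 digits
short_numbers = re.findall(r"\b\d{2,4}\b", "7 42 2024 123456")

# Practical patterns from the list above
emails = re.findall(r"[\w.+-]+@[\w-]+\.[a-z]{2,}", sample)
urls = re.findall(r"https?://[^\s]+", sample)
phone_match = re.search(r"\+?[\d\s\-()]{7,15}", sample)
phone = phone_match.group() if phone_match else None

print(word_hits, short_numbers, emails, urls, phone)
```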

Frequently Asked Questions

Is regex the same across all programming languages?

Core syntax is similar, but there are important differences. JavaScript, Python, Ruby, PHP, and Java all use Perl-style regex with lookaheads, lookbehinds, and backreferences (PHP uses the PCRE library itself), though they differ in details such as flag handling and named-group syntax. Go uses RE2 syntax, which excludes lookarounds and backreferences to guarantee linear-time matching. POSIX regex (used in Unix tools like grep without -P) writes character classes as bracket expressions such as [[:digit:]] and does not define shorthand like \d. Always test regex in the language you're deploying to. The regex tester lets you switch between JavaScript and Python modes.
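To make the feature differences concrete, here is a small Python sketch of a lookahead, a lookbehind, and a backreference. These run in Python's re module; an RE2-based engine such as Go's would be expected to reject the same patterns. The sample strings are illustrative.

```python
import re

# Lookahead: match digits only when "px" follows, without consuming "px"
px_values = re.findall(r"\d+(?=px)", "width: 100px; height: 50em; margin: 8px")

# Lookbehind: match digits only when a dollar sign precedes them
dollar_amounts = re.findall(r"(?<=\$)\d+", "cost $40, weight 40kg")

# Backreference: \1 must repeat whatever group 1 captured (a doubled word)
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
doubled_word = doubled.group(1) if doubled else None

print(px_values, dollar_amounts, doubled_word)
```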

What does "greedy" vs "lazy" matching mean?

Quantifiers (*, +, {n,m}) are greedy by default: they match as much text as possible. <.+> applied to <b>hello</b> matches the entire string <b>hello</b> (one long match), not just <b>. Adding ? makes them lazy: they match as little as possible. <.+?> matches <b> and </b> separately. Lazy matching is essential when extracting delimited content.
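The difference is easy to see in a short Python sketch. The tagged string and the quoted sentence below are assumed sample inputs chosen for illustration.

```python
import re

html = "<b>hello</b>"  # assumed sample input

# Greedy: .+ runs to the last ">" in the string, giving one long match
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">" it can, giving one match per tag
lazy = re.findall(r"<.+?>", html)

# Same idea for quoted text: the lazy version returns each quoted piece
quoted = 'say "hi" and "bye"'
greedy_quotes = re.findall(r'"(.+)"', quoted)
lazy_quotes = re.findall(r'"(.+?)"', quoted)

print(greedy, lazy)
print(greedy_quotes, lazy_quotes)
```

A negated character class such as "[^"]+" is a common alternative to a lazy quantifier for the quoted-text case.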

When should I not use regex?

Regex is powerful but the wrong tool for some tasks. Don't use regex to parse HTML or XML — use a proper parser (BeautifulSoup, lxml, DOMParser). Don't use regex to parse JSON — use JSON.parse(). Don't write a regex to validate email addresses for business logic — a simple check for @ and a domain plus server-side sending confirmation is more reliable than a complex regex that still misses edge cases. Regex excels at pattern extraction and transformation; structured data parsing is better handled by dedicated parsers.
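As a sketch of the alternatives mentioned above, the following uses only Python's standard library: html.parser for HTML, json.loads for JSON, and a loose structural check for email addresses. LinkCollector and looks_like_email are hypothetical helpers written for this example, and the email check is one possible reading of "a simple check for @ and a domain"; it is meant to be followed by a confirmation message, not used as full validation.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# HTML: the parser copes with attribute order and quote style
page = ('<p>See <a class="ext" href="https://example.com/a">A</a> '
        "and <a href='/b'>B</a>.</p>")
collector = LinkCollector()
collector.feed(page)
links = collector.links

# JSON: let the JSON parser handle nesting and escaping
record = json.loads('{"name": "Ada", "tags": ["x", "y"]}')
tags = record["tags"]


def looks_like_email(address: str) -> bool:
    """Loose pre-check: exactly one '@', a local part, a dot inside the domain."""
    local, sep, domain = address.partition("@")
    return (bool(local) and sep == "@" and "@" not in domain
            and "." in domain.strip("."))


print(links, tags, looks_like_email("jane@example.com"))
```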