What Is Regex? Regular Expressions Explained
Regex (regular expressions) is a pattern-matching language for finding, extracting, and replacing text. Learn the core syntax, common patterns, and how regex works in JavaScript, Python, and other languages.
Regular Expressions Explained Simply
A regular expression is a sequence of characters that defines a search pattern. You use it to test whether a string matches a pattern, find all occurrences of a pattern in text, or replace matched text with something else.
- Test: Does this string match the pattern? Typical uses are email validation and phone number format checks.
- Find: Where are all occurrences in this text? Typical uses are extracting all URLs, dates, or numbers from a document.
- Replace: Swap matched text, for example to reformat dates, strip HTML tags, or normalize whitespace.
- Split: Break a string on a pattern, for example on any whitespace rather than just a single character.
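The four operations above can be sketched with Python's built-in re module. This is a minimal illustration, not a definitive recipe: the sample text and the date pattern are assumptions made up for the example.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Order 66 shipped on 2024-03-15; order 7 ships 2024-04-01."
date_pattern = r"\d{4}-\d{2}-\d{2}"

# Test: does the whole string look like a YYYY-MM-DD date?
is_date = re.fullmatch(date_pattern, "2024-03-15") is not None  # True

# Find: collect every date-like substring.
dates = re.findall(date_pattern, text)  # ['2024-03-15', '2024-04-01']

# Replace: reorder YYYY-MM-DD to DD/MM/YYYY using captured groups.
reformatted = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

# Split: break on any run of whitespace (spaces, tabs, newlines).
words = re.split(r"\s+", "alpha  beta\tgamma\ndelta")

print(is_date, dates, reformatted, words, sep="\n")
```

Note that re.fullmatch only checks the shape of the string; it would also accept an impossible date such as 2024-99-99.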
Core Regex Syntax
- Literal characters: cat matches the exact string "cat" anywhere in the input
- . (dot): Matches any single character except a newline. For example, c.t matches "cat", "cut", "c3t"
- * + ? (quantifiers): * = zero or more, + = one or more, ? = zero or one. For example, colou?r matches both "color" and "colour"
- [abc] (character class): Matches any one of the listed characters. For example, [aeiou] matches any vowel
- [^abc] (negated class): Matches any character NOT in the list
- ^ and $ (anchors): ^ matches the start of the string; $ matches the end. For example, ^\d+$ matches strings of only digits
- (group) and | (alternation): (cat|dog) matches "cat" or "dog"; groups capture the matched text for extraction
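Each piece of core syntax can be tried in isolation. The following is a small sketch using Python's re module; the input strings are made up for illustration.

```python
import re

# Dot: any single character except a newline.
dot_hits = re.findall(r"c.t", "cat cut c3t ct")          # ['cat', 'cut', 'c3t']

# Optional quantifier: the "u" may appear zero or one time.
colors = re.findall(r"colou?r", "color colour colr")     # ['color', 'colour']

# Character class and negated class.
vowels = re.findall(r"[aeiou]", "regex")                 # ['e', 'e']
non_vowels = re.findall(r"[^aeiou]", "regex")            # ['r', 'g', 'x']

# Anchors: the whole string must consist of digits.
only_digits = re.search(r"^\d+$", "12345") is not None   # True
mixed = re.search(r"^\d+$", "123a5") is not None         # False

# Group with alternation: findall returns the captured text.
pets = re.findall(r"(cat|dog)", "cat, dog, bird")        # ['cat', 'dog']

print(dot_hits, colors, vowels, non_vowels, only_digits, mixed, pets)
```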
Shorthand Character Classes
- \d: Any digit (0-9). \D = any non-digit
- \w: Any word character (letters, digits, underscore). \W = any non-word character
- \s: Any whitespace (space, tab, newline). \S = any non-whitespace
- \b: Word boundary. For example, \bcat\b matches "cat" but not "catch" or "concatenate"
- {n,m} (quantifier range): \d{2,4} matches 2 to 4 consecutive digits
- Practical patterns:
  - Email: [\w.+-]+@[\w-]+\.[a-z]{2,}
  - Phone: \+?[\d\s\-()]{7,15}
  - URL: https?://[^\s]+
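The shorthand classes and the three practical patterns can be exercised as follows. This is a sketch only: the sample sentence, the address, and the phone number are invented for the example, and the patterns are the simplified ones listed above, not full validators.

```python
import re

# Patterns copied from the list above (simplified, not full validators).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[a-z]{2,}")
PHONE = re.compile(r"\+?[\d\s\-()]{7,15}")
URL = re.compile(r"https?://[^\s]+")

# Hypothetical sample text, invented for this illustration.
sample = ("Email jane.doe+news@example.com, call +1 555-010-9999, "
          "or visit https://example.com/docs today.")

email = EMAIL.search(sample).group()   # 'jane.doe+news@example.com'
phone = PHONE.search(sample).group()   # '+1 555-010-9999'
url = URL.search(sample).group()       # 'https://example.com/docs'

# Word boundary: only the standalone word matches.
cats = re.findall(r"\bcat\b", "cat catch concatenate")   # ['cat']

# Quantifier range: runs of 2 to 4 digits; the lone "7" is too short.
numbers = re.findall(r"\d{2,4}", "7 42 2024")            # ['42', '2024']

print(email, phone, url, cats, numbers, sep="\n")
```

Because the URL pattern stops only at whitespace, punctuation that directly follows a URL (a trailing period or comma) would be captured as part of the match.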
Frequently Asked Questions
Is regex the same across all programming languages?
Core syntax is similar, but there are important differences. JavaScript, Python, Ruby, and PHP use Perl-style (PCRE-like) regex with lookaheads, lookbehinds, and backreferences. Go uses RE2 syntax, which excludes lookarounds and backreferences to guarantee linear-time matching. Java uses a Perl-style variant with some differences in flag handling. POSIX regex (used in older Unix tools like grep without -P) has different syntax for character classes, such as [[:digit:]], and no shorthand like \d. Always test regex in the language you're deploying to. The regex tester lets you switch between JavaScript and Python modes.
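To make the feature differences concrete, here is a short Python sketch of a backreference, a lookahead, and a lookbehind. Python's re module accepts all three; an RE2-style engine such as Go's regexp package would reject these patterns. The input strings are made up for illustration.

```python
import re

# Backreference: \1 must repeat whatever group 1 matched (a doubled word).
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)   # 'is'

# Lookahead: digits only when followed by " USD"; the suffix is not consumed.
usd_amounts = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR, 30 USD")   # ['10', '30']

# Lookbehind: digits only when preceded by a dollar sign.
prices = re.findall(r"(?<=\$)\d+", "cost $15, qty 3")                # ['15']

print(doubled, usd_amounts, prices)
```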
What does "greedy" vs "lazy" matching mean?
Quantifiers (*, +, ?, {n,m}) are greedy by default: they match as much text as possible. <.+> applied to <b>hello</b> matches the entire string <b>hello</b> (one long match), not just <b>. Adding ? after a quantifier makes it lazy, so it matches as little as possible. <.+?> matches <b> and </b> separately. Lazy matching is essential when extracting delimited content.
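The difference is easy to see side by side. Below is a minimal Python sketch; the second input string is an assumption added to show extraction of delimited content.

```python
import re

html = "<b>hello</b>"

# Greedy: .+ runs to the last ">" it can reach, giving one long match.
greedy = re.findall(r"<.+>", html)    # ['<b>hello</b>']

# Lazy: .+? stops at the first ">" it can reach, giving two short matches.
lazy = re.findall(r"<.+?>", html)     # ['<b>', '</b>']

# Extracting delimited content from a string with two tagged spans.
two_spans = "<b>one</b> and <b>two</b>"
lazy_inner = re.findall(r"<b>(.+?)</b>", two_spans)   # ['one', 'two']
greedy_inner = re.findall(r"<b>(.+)</b>", two_spans)  # ['one</b> and <b>two']

print(greedy, lazy, lazy_inner, greedy_inner, sep="\n")
```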
When should I not use regex?
Regex is powerful but the wrong tool for some tasks. Don't use regex to parse HTML or XML: use a proper parser (BeautifulSoup, lxml, DOMParser). Don't use regex to parse JSON: use JSON.parse(). Don't write a regex to validate email addresses for business logic: a simple check for @ and a domain plus server-side sending confirmation is more reliable than a complex regex that still misses edge cases. Regex excels at pattern extraction and transformation; structured data parsing is better handled by dedicated parsers.
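As a rough sketch of the parser-first approach in Python, the example below uses only the standard library: html.parser stands in for the parsers named above so the code stays self-contained, and json.loads is Python's counterpart to JSON.parse(). The names TextExtractor, html_to_text, and looks_like_email are hypothetical helpers written for this illustration, and the email check is deliberately loose because the real validation is the confirmation message.

```python
import json
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(markup: str) -> str:
    """Return the text content of markup using a real HTML parser."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def looks_like_email(address: str) -> bool:
    """Loose sanity check: something, an @, then a domain containing a dot."""
    local, sep, domain = address.partition("@")
    return bool(local) and sep == "@" and "." in domain.strip(".")


plain = html_to_text("<p>Hello <b>world</b></p>")             # 'Hello world'
record = json.loads('{"name": "Ada", "tags": ["x", "y"]}')    # a dict
print(plain, record["tags"], looks_like_email("a@b.co"))
```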