What Is Regex? Regular Expressions Explained

Regex (regular expressions) is a pattern-matching language for finding, extracting, and replacing text. Learn the core syntax, common patterns, and how regex works in JavaScript, Python, and other languages.

Regular Expressions Explained Simply

A regular expression is a sequence of characters that defines a search pattern. You use it to test whether a string matches a pattern, find all occurrences of a pattern in text, or replace matched text with something else.

  • Test: Does this string match the pattern? (email validation, phone number format checks)
  • Find: Where are all occurrences in this text? (extract all URLs, dates, or numbers from a document)
  • Replace: Swap matched text (reformat dates, strip HTML tags, normalize whitespace)
  • Split: Break a string on a pattern (split on any whitespace, not just a single character)
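
The four operations above map onto functions in most regex libraries. Here is a minimal sketch using Python's built-in re module; the sample strings and the date pattern are made up for illustration.

```python
import re

# Illustrative pattern: a YYYY-MM-DD style date (no calendar validation).
DATE = r"(\d{4})-(\d{2})-(\d{2})"
text = "Shipped 2024-03-15, due 2024-04-01."

# Test: does the entire string match the pattern?
is_date = re.fullmatch(DATE, "2024-03-15") is not None

# Find: collect every date-like substring (group 0 is the whole match).
dates = [m.group(0) for m in re.finditer(DATE, text)]

# Replace: reorder the captured groups to DD/MM/YYYY.
reformatted = re.sub(DATE, r"\3/\2/\1", text)

# Split: break on any run of whitespace, not just a single space.
words = re.split(r"\s+", "one  two\tthree\nfour")

print(is_date)       # True
print(dates)         # ['2024-03-15', '2024-04-01']
print(reformatted)   # Shipped 15/03/2024, due 01/04/2024.
print(words)         # ['one', 'two', 'three', 'four']
```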

Core Regex Syntax

  • Literal characters: cat matches the exact string "cat" anywhere in the input
  • . (dot): Matches any single character except a newline; c.t matches "cat", "cut", "c3t"
  • * + ? (quantifiers): * = zero or more, + = one or more, ? = zero or one; colou?r matches both "color" and "colour"
  • [abc] (character class): Matches any one of the listed characters; [aeiou] matches any vowel
  • [^abc] (negated class): Matches any character NOT in the list
  • ^ and $ (anchors): ^ matches the start of the string; $ matches the end (in multiline mode they match at the start and end of each line); ^\d+$ matches strings of only digits
  • (group) and | (alternation): (cat|dog) matches "cat" or "dog"; groups capture the matched text for extraction
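
Each rule in the list above can be checked directly. The following sketch uses Python's re module, and the test strings are invented for illustration.

```python
import re

# Literal characters: "cat" is found anywhere in the input.
literal_hit = re.search(r"cat", "concatenate") is not None

# Dot: any single character between "c" and "t" ("ct" has none, so it is skipped).
dot_hits = re.findall(r"c.t", "cat cut c3t ct")

# Optional quantifier: the "u" may appear zero or one time.
spellings = re.findall(r"colou?r", "color colour colr")

# Character class and negated class.
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")

# Anchors: the whole string must consist of digits.
all_digits = re.search(r"^\d+$", "12345") is not None
not_all_digits = re.search(r"^\d+$", "123a5") is not None

# Group with alternation: group(1) holds the captured alternative.
animal = re.search(r"(cat|dog)", "hot dog").group(1)

print(dot_hits)    # ['cat', 'cut', 'c3t']
print(spellings)   # ['color', 'colour']
print(vowels)      # ['e', 'e']
print(non_vowels)  # ['r', 'g', 'x']
print(animal)      # dog
```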

Shorthand Character Classes

  • \d: Any digit (0-9). \D = any non-digit
  • \w: Any word character (letters, digits, underscore). \W = any non-word character
  • \s: Any whitespace (space, tab, newline). \S = any non-whitespace
  • \b: Word boundary; \bcat\b matches "cat" but not "catch" or "concatenate"
  • {n,m} (quantifier range): \d{2,4} matches 2 to 4 consecutive digits
  • Practical patterns (loose starting points, not strict validators): Email: [\w.+-]+@[\w-]+\.[a-z]{2,}; Phone: \+?[\d\s\-()]{7,15}; URL: https?://[^\s]+
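
The shorthand classes and the practical patterns above can be tried out as follows. This is a sketch using Python's re module with invented sample data. The three patterns are copied from the list unchanged, and they are intentionally permissive rather than strict validators.

```python
import re

# Patterns copied verbatim from the list above.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[a-z]{2,}")
PHONE = re.compile(r"\+?[\d\s\-()]{7,15}")
URL = re.compile(r"https?://[^\s]+")

sample = "Contact jane.doe+news@example.com or see https://example.com/docs today."

emails = EMAIL.findall(sample)
urls = URL.findall(sample)
phone_ok = PHONE.fullmatch("(555) 123-4567") is not None

# Word boundary: "cat" only as a whole word.
whole_words = re.findall(r"\bcat\b", "cat catch concatenate cat.")

# Quantifier range: runs of 2 to 4 digits; a 6-digit run splits into 4 + 2.
digit_runs = re.findall(r"\d{2,4}", "7 42 2024 123456")

print(emails)       # ['jane.doe+news@example.com']
print(urls)         # ['https://example.com/docs']
print(phone_ok)     # True
print(whole_words)  # ['cat', 'cat']
print(digit_runs)   # ['42', '2024', '1234', '56']
```

Note that [a-z]{2,} only accepts a lowercase top-level domain. Compiling the email pattern with re.IGNORECASE is one way to relax that.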

Frequently Asked Questions

Is regex the same across all programming languages?

Core syntax is similar, but there are important differences. JavaScript, Python, Ruby, PHP, and Java use Perl-style (PCRE-like) syntax with lookaheads, lookbehinds, and backreferences, although each engine differs in details such as flag handling and how much lookbehind it supports. Go uses RE2 syntax, which excludes lookaheads, lookbehinds, and backreferences to guarantee linear-time matching. POSIX regex (used in older Unix tools like grep without -P) has different syntax for character classes, such as [[:digit:]], and no shorthand like \d. Always test regex in the language you're deploying to; the regex tester lets you switch between JavaScript and Python modes.
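
To make the engine-specific features concrete, here is a small sketch of a lookahead, a lookbehind, and a backreference in Python's re module. The sample strings are invented. These patterns rely on features that RE2-style engines such as Go's regexp package do not support, so they would need rewriting there.

```python
import re

# Lookahead: digits that are followed by "px", without consuming "px".
pixel_values = re.findall(r"\d+(?=px)", "12px 3em 40px")

# Lookbehind: digits that are preceded by a dollar sign.
dollar_amounts = re.findall(r"(?<=\$)\d+", "cost $15, was $20, qty 3")

# Backreference: \1 must repeat whatever group 1 captured (a doubled word).
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

print(pixel_values)    # ['12', '40']
print(dollar_amounts)  # ['15', '20']
print(doubled)         # is
```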

What does "greedy" vs "lazy" matching mean?

Quantifiers (*, +, ?, {n,m}) are greedy by default: they match as much text as possible. <.+> applied to <b>hello</b> matches the entire string <b>hello</b> (one long match), not just <b>. Adding ? after a quantifier makes it lazy, so it matches as little as possible. <.+?> matches <b> and </b> separately. Lazy matching is essential when extracting delimited content.
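
The difference is easy to see side by side. This sketch uses Python's re module and reuses the <b>hello</b> example from the answer above; the second input string is made up.

```python
import re

html = "<b>hello</b>"

# Greedy: .+ runs to the last ">" in the string, giving one long match.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">" it can, giving one match per tag.
lazy = re.findall(r"<.+?>", html)

# Extracting delimited content: lazy keeps each pair of tags separate.
two_tags = "<b>one</b> and <b>two</b>"
greedy_content = re.findall(r"<b>(.+)</b>", two_tags)
lazy_content = re.findall(r"<b>(.+?)</b>", two_tags)

print(greedy)          # ['<b>hello</b>']
print(lazy)            # ['<b>', '</b>']
print(greedy_content)  # ['one</b> and <b>two']
print(lazy_content)    # ['one', 'two']
```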

When should I not use regex?

Regex is powerful but the wrong tool for some tasks. Don't use regex to parse HTML or XML; use a proper parser (BeautifulSoup, lxml, DOMParser). Don't use regex to parse JSON; use JSON.parse() or your language's equivalent. Don't write a regex to validate email addresses for business logic: a simple check for @ and a domain, followed by a confirmation email sent from the server, is more reliable than a complex regex that still misses edge cases. Regex excels at pattern extraction and transformation; structured data parsing is better handled by dedicated parsers.
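
As a rough illustration of the parser-first approach, the sketch below uses only Python's standard library: html.parser for markup and json.loads for JSON (the Python counterpart of JSON.parse()). The TextExtractor class and strip_tags helper are hypothetical names introduced for this example, not part of any library.

```python
import json
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for text between tags.
        self.chunks.append(data)


def strip_tags(markup: str) -> str:
    """Return the text content of markup, using a parser rather than a regex."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


plain = strip_tags("<p>Hello <b>world</b></p>")
record = json.loads('{"name": "regex", "tags": ["text", "pattern"]}')

print(plain)            # Hello world
print(record["tags"])   # ['text', 'pattern']
```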