Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Algorithm Selection

How fuzzy-regex chooses the matching algorithm.

Decision Tree

Pattern Analysis
      │
      ▼
┌──────────────────┐
│ Is exact match?  │
└────────┬─────────┘
          │
      ┌────┴────┐
      ▼         ▼
     Yes       No
       │         │
       ▼         ▼
    DFA    ┌────────────────┐
           │ Pure greedy .*? │
           └────────┬────────┘
                    │
               ┌────┴────┐
               ▼         ▼
              Yes       No
               │         │
               ▼         ▼
         Instant      ┌────────────────┐
          Match      │ Pattern ≤64?   │
                     └────────┬────────┘
                              │
                         ┌────┴────┐
                         ▼         ▼
                        Yes       No
                         │         │
                         ▼         ▼
                       Bitap    Damerau-Levenshtein
                                  NFA

Algorithm Comparison

AlgorithmUse CaseComplexityFeatures
DFAExact patterns, capturing groupsO(n)Limited
Instant MatchPure greedy .*O(1)Limited
BitapShort fuzzy (≤64 chars)O(n×k)Most
Damerau-Levenshtein NFALong fuzzy patternsO(n×k×m)Full

Automatic Selection

The library automatically selects based on:

  1. Pattern length: Bitap for ≤64 chars
  2. Fuzzy complexity: Cost-based vs simple
  3. Regex features: DFA can’t do lookahead
  4. Streaming: Different code path
  5. Greedy patterns: Instant match for .*, ^.*$, .*$
  6. Greedy suffix patterns: .*SUFFIX uses reverse search

Special Optimizations

Pure Greedy Dot-Star

Patterns like .*, ^.*$, .*$ always match. The engine detects these and returns instantly without scanning:

#![allow(unused)]
fn main() {
let re = FuzzyRegex::new(".*").unwrap();
// Returns match immediately - no text scanning
}

Greedy Prefix with Suffix

Patterns like .*test or .*test~2 use reverse search to avoid O(n²) behavior:

#![allow(unused)]
fn main() {
let re = FuzzyRegex::new(".*test").unwrap();
// Finds "test" from the right, then .* matches everything before it
// O(n) instead of O(n²)
}

DFA with Capturing Groups

DFA now works with capturing groups like (?m)^(.*)test$:

#![allow(unused)]
fn main() {
let re = FuzzyRegex::new("(?m)^(.*)test$").unwrap();
// Uses DFA - much faster than NFA
}

All-Matches Algorithms

When finding all matches in text (find_iter, find_all), fuzzy-regex supports multiple algorithms optimized for different patterns.

The Problem

Some patterns produce many overlapping matches, causing naive “find, advance, repeat” to be O(n²):

#![allow(unused)]
fn main() {
// Pattern: .*a|b on text: "bbbbbbbb"
// Each 'b' matches individually → O(n) matches × O(n) scan = O(n²)
}

Available Algorithms

AlgorithmComplexityBest For
find_allO(n²) worst caseSimple patterns, few matches
find_all_two_passO(n²) worst caseWhen prefilter is effective
find_all_hardenedO(n) guaranteedPathological patterns

Two-Pass Algorithm

Uses a reverse prefilter to find candidate positions, then verifies matches:

  1. Pass 1: Scan right-to-left using prefilter (memchr, Teddy, etc.)
  2. Pass 2: For each candidate, find the match
#![allow(unused)]
fn main() {
let mut dfa = Dfa::new(".*a|b").unwrap();
let matches = dfa.find_all_two_pass("bbbbbb");
}

Hardened Mode (O(n))

Tracks all active DFA states simultaneously as we scan left-to-right:

#![allow(unused)]
fn main() {
let mut dfa = Dfa::new(".*a|b").unwrap();
let matches = dfa.find_all_hardened("bbbbbb");
}

How it works:

  1. Start with one state at current position
  2. Process all characters, tracking active states
  3. When no state can continue, emit the leftmost match
  4. Move to next position, repeat

Benchmark: Pathological Pattern

Pattern .*a|b on text of all ’b’s (worst case):

Text Sizefind_alltwo_passhardenedSpeedup
1,000 bytes1.08s1.10s69ms15x
5,000 bytes5.47s5.43s69ms79x
10,000 bytes10.76s10.85s69ms156x

When to Use Each

#![allow(unused)]
fn main() {
// Default: smart selection based on pattern
let matches = dfa.find_all(text);

// Explicit two-pass: good when prefilter is effective
let matches = dfa.find_all_two_pass(text);

// Explicit hardened: critical for pathological patterns
let matches = dfa.find_all_hardened(text);
}

Prefilter Types

The two-pass algorithm uses prefilters optimized for different patterns:

PrefilterUse CaseMethod
SingleByteSingle literalmemrchr
TwoBytesTwo literalsmemrchr for both
ThreeBytesThree literalsmemrchr for all
TeddyMany alternativesSIMD range matching

Manual Override

Not currently exposed, but internal selection can be inspected:

#![allow(unused)]
fn main() {
// Check which engine was used
let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
if re.supports_streaming() {
    // Using Bitap
}
}