Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Damerau-Levenshtein NFA

Fuzzy matching via finite automata for longer patterns.

Overview

For patterns longer than 64 characters or when complex regex features are needed, fuzzy-regex uses a Damerau-Levenshtein NFA. This is an extension of the classic Damerau-Levenshtein automaton.

Damerau-Levenshtein Automaton

The Damerau-Levenshtein automaton for a pattern P and max distance k accepts all strings within k edits of P.

State Representation

#![allow(unused)]
fn main() {
// Each state represents a position in the pattern
// plus the number of errors used so far
struct NfaState {
    pattern_position: usize,  // Position in pattern
    errors_used: usize,       // Errors consumed
}
}

Transitions

For each input character, the NFA can transition to:

  1. Match: Next character in pattern (0 errors)
  2. Insertion: Stay at current position (1 error)
  3. Deletion: Skip pattern character (1 error)
  4. Substitution: Move to next with 1 error
  5. Transposition: Swap next two characters (special case)

Building the NFA

#![allow(unused)]
fn main() {
// Pattern: "abc" with k=1
// States: (0,0) → (1,0) → (2,0) → (3,0)  [exact path]
//         (0,0) → (1,1) → ...             [with errors]
}

Construction Algorithm

  1. Create initial state at position 0
  2. For each pattern character, create states for:
    • Exact match (no error)
    • Insertion (adds error)
    • Deletion (adds error)
    • Substitution (adds error)
  3. Connect states with transitions
  4. Mark accepting states

Performance Characteristics

AspectDamerau-Levenshtein NFA
Time ComplexityO(n × k × m)
Space ComplexityO(k × m)
Pattern LengthUnlimited
FeaturesFull regex

Where:

  • n = text length
  • k = max edits
  • m = pattern length

Comparison with Bitap

FeatureBitapDamerau-Levenshtein NFA
Max Pattern Length64Unlimited
Time ComplexityO(n×k)O(n×k×m)
SIMD SupportYesLimited
Regex FeaturesLimitedFull
Memory UsageLowHigher

When Used

  • Pattern length > 64 characters
  • Complex regex features (lookahead, backreferences)
  • Alternation with fuzzy segments