Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Introduction

fuzzy-regex is a high-performance fuzzy regular expression engine written in Rust. It combines traditional regex constructs with approximate string matching using Damerau-Levenshtein automata and the Bitap algorithm.

Key Features

  • Fuzzy Matching: Match strings with configurable edit distance tolerance (insertions, deletions, substitutions, transpositions)
  • Full Regex Support: Character classes, quantifiers, groups, alternation, anchors, lookahead/lookbehind
  • Per-Segment Fuzziness: Control fuzziness for individual parts of a pattern
  • Capture Groups: Named and numbered capture groups with fuzzy matching
  • Similarity Scoring: Get match quality scores (0.0 - 1.0)
  • Streaming API: Process large files and network streams incrementally
  • High Performance: Bitap algorithm for patterns ≤64 chars, SIMD optimizations
  • Unicode Support: Full Unicode support including case-insensitive matching

Why Fuzzy Regex?

Traditional regex requires exact matches. Fuzzy regex allows for “close enough” matches:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "hello" with up to 1 edit
    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();

    println!("{}", re.is_match("hello")); // Exact match
    println!("{}", re.is_match("hallo")); // 1 substitution
    println!("{}", re.is_match("helo")); // 1 deletion  
    println!("{}", re.is_match("helllo")); // 1 insertion
    println!("{}", re.is_match("hlelo")); // 1 transposition
}

Performance

fuzzy-regex is designed for high performance:

  • Bitap Algorithm: O(n×k) time complexity for patterns ≤64 characters
  • SIMD Optimizations: ~2-10x speedup on supported platforms (AVX2, NEON)
  • Hardened Mode: Guaranteed O(n) worst-case for pathological patterns
  • Streaming: Process gigabytes of data without loading into memory

Typical throughput: 1.4-2.0 Gbps for streaming fuzzy matching on modern hardware.

Pathological Pattern Protection

Some regex patterns can cause O(n²) behavior in naive implementations. fuzzy-regex includes hardened mode that guarantees O(n) performance even for patterns like .*a|b:

#![allow(unused)]
fn main() {
let re = FuzzyRegex::new(".*a|b").unwrap();
// Guaranteed O(n) even on pathological patterns
let matches = re.find_all_hardened(text);
}

Ecosystem

  • MSRV: Rust 1.88+
  • Features: SIMD (default), mimalloc (optional)
  • Dependencies: Zero runtime dependencies