Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Introduction

fuzzy-regex is a high-performance fuzzy regular expression engine written in Rust. It combines traditional regex constructs with approximate string matching using Damerau-Levenshtein automata and the Bitap algorithm.

Key Features

  • Fuzzy Matching: Match strings with configurable edit distance tolerance (insertions, deletions, substitutions, transpositions)
  • Full Regex Support: Character classes, quantifiers, groups, alternation, anchors, lookahead/lookbehind
  • Per-Segment Fuzziness: Control fuzziness for individual parts of a pattern
  • Capture Groups: Named and numbered capture groups with fuzzy matching
  • Similarity Scoring: Get match quality scores (0.0 - 1.0)
  • Streaming API: Process large files and network streams incrementally
  • High Performance: Bitap algorithm for patterns ≤64 chars, SIMD optimizations
  • Unicode Support: Full Unicode support including case-insensitive matching

Why Fuzzy Regex?

Traditional regex requires exact matches. Fuzzy regex allows for “close enough” matches:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "hello" with up to 1 edit
    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();

    println!("{}", re.is_match("hello")); // Exact match
    println!("{}", re.is_match("hallo")); // 1 substitution
    println!("{}", re.is_match("helo")); // 1 deletion  
    println!("{}", re.is_match("helllo")); // 1 insertion
    println!("{}", re.is_match("hlelo")); // 1 transposition
}

Performance

fuzzy-regex is designed for high performance:

  • Bitap Algorithm: O(n×k) time complexity for patterns ≤64 characters
  • SIMD Optimizations: ~2-10x speedup on supported platforms (AVX2, NEON)
  • Hardened Mode: Guaranteed O(n) worst-case for pathological patterns
  • Streaming: Process gigabytes of data without loading into memory

Typical throughput: 1.4-2.0 Gbps for streaming fuzzy matching on modern hardware.

Pathological Pattern Protection

Some regex patterns can cause O(n²) behavior in naive implementations. fuzzy-regex includes hardened mode that guarantees O(n) performance even for patterns like .*a|b:

#![allow(unused)]
fn main() {
let re = FuzzyRegex::new(".*a|b").unwrap();
// Guaranteed O(n) even on pathological patterns
let matches = re.find_all_hardened(text);
}

Ecosystem

  • MSRV: Rust 1.88+
  • Features: SIMD (default), mimalloc (optional)
  • Dependencies: Zero runtime dependencies

Installation

Add fuzzy-regex to your Cargo.toml:

[dependencies]
fuzzy-regex = "0.1"

Feature Flags

  • simd (default): Enable SIMD optimizations for faster matching on x86_64 and ARM
  • mimalloc: Use mimalloc allocator for better performance in memory-intensive workloads
[dependencies]
fuzzy-regex = { version = "0.1", default-features = false }
# Or with mimalloc
fuzzy-regex = { version = "0.1", features = ["simd", "mimalloc"] }

MSRV

The minimum supported Rust version is 1.88.

Quick Verification

use fuzzy_regex::FuzzyRegex;

fn main() {
    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    println!("{}", re.is_match("hallo")); // true
}

Quick Start

Basic Fuzzy Matching

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Allow up to 2 edits
    let re = FuzzyRegex::new("(?:hello){e<=2}").unwrap();

    assert!(re.is_match("hello"));   // Exact match
    assert!(re.is_match("helo"));    // 1 deletion
    assert!(re.is_match("helllo"));  // 1 insertion
    assert!(re.is_match("hallo"));   // 1 substitution
    assert!(re.is_match("hlelo"));   // 1 transposition
}

Using the Builder

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:teh){e<=1}")
        .similarity(0.7)         // Minimum similarity score
        .case_insensitive(true)  // Case-insensitive matching
        .build()
        .unwrap();

    let m = re.find("I saw teh cat").unwrap();
    assert_eq!(m.as_str(), "teh");
    assert!(m.similarity() >= 0.7);
}

Finding Matches

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new(r"(?:\w+){e<=1}").unwrap();

    // Find first match
    let m = re.find("This is a tset").unwrap();
    assert_eq!(m.as_str(), "tset");

    // Find all matches
    let matches: Vec<_> = re.find_iter("test tset tsat").collect();
    assert_eq!(matches.len(), 3);
}

Streaming Large Data

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let mut stream = re.stream();

    // Process in chunks
    for m in stream.feed(b"some hay and niddle here") {
        println!("Found: {} at {}", m.as_str(), m.start());
    }

    // Check position
    assert_eq!(stream.position(), 24);
}

Fuzzy Matching Basics

Fuzzy matching allows matching strings that are “close” to the pattern, not just exact matches. The closeness is measured using edit distance.

Edit Distance

Edit distance is the minimum number of character-level operations needed to transform one string into another:

  • Insertion: Add a character (cost: 1)
  • Deletion: Remove a character (cost: 1)
  • Substitution: Replace one character with another (cost: 1)
  • Transposition: Swap two adjacent characters (cost: 1)

Example: “hello” → “hallo” (1 substitution)

Fuzziness Markers

Apply fuzziness to a pattern segment using {...}:

SyntaxDescription
(?:text){e<=N}Allow up to N total edits
(?:text){i<=N}Allow up to N insertions
(?:text){d<=N}Allow up to N deletions
(?:text){s<=N}Allow up to N substitutions
(?:text){t<=N}Allow up to N transpositions
fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Allow up to 2 total edits
    let re1 = FuzzyRegex::new("(?:hello){e<=2}").unwrap();

    // Allow specific edit types
    let re2 = FuzzyRegex::new("(?:hello){i<=1,d<=1}").unwrap();

    // Allow substitutions only
    let re3 = FuzzyRegex::new("(?:hello){s<=2}").unwrap();
    
    println!("re1: {:?}", re1.is_match("hello"));
    println!("re2: {:?}", re2.is_match("helo"));
    println!("re3: {:?}", re3.is_match("hallo"));
}

Shorthand Syntax

Use ~N as shorthand for {e<=N}:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // These are equivalent:
    let re1 = FuzzyRegex::new("(?:hello)~2").unwrap();
    let re2 = FuzzyRegex::new("(?:hello){e<=2}").unwrap();

    // Exact match with ~0
    let re3 = FuzzyRegex::new("(?:hello)~0").unwrap();
    
    println!("re1 == re2: {}", re1.is_match("hello") == re2.is_match("hello"));
    println!("re3 exact: {}", re3.is_match("hello"));
}

Unlimited Errors

Omit the number to allow unlimited edits:

fn main() {
    // Allow unlimited substitutions
    let re = FuzzyRegex::new("(?:hello){s}").unwrap();

    // Allow any number of errors
    let re2 = FuzzyRegex::new("(?:hello){e}").unwrap();
    
    println!("{}", re.is_match("hallo"));
    println!("{}", re2.is_match("xyz"));
}

Edit Distance

The edit distance (Damerau-Levenshtein distance) measures how many operations it takes to transform the pattern into the matched text.

Edit Types

Insertion

Adding a character to the text:

Pattern: "cat"
Text:    "cart" 
Edit:    Insert 'r' after 'c'
Cost:    1

Deletion

Removing a character from the text:

Pattern: "cat"  
Text:    "ct"
Edit:    Delete 'a'
Cost:    1

Substitution

Replacing one character with another:

Pattern: "cat"
Text:    "bat"
Edit:    Replace 'c' with 'b'
Cost:    1

Transposition

Swapping two adjacent characters:

Pattern: "cat"
Text:    "act"
Edit:    Swap 'c' and 'a'
Cost:    1

Accessing Edit Information

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=2}").unwrap();
    let m = re.find("hallo").unwrap();

    println!("Similarity: {:.2}", m.similarity());
    println!("Edits: {}", m.len()); // Match length

    // Edit counts available in the engine
    // Use engine::EditCounts for detailed info
}

Similarity Score

The similarity score ranges from 0.0 to 1.0, where 1.0 is an exact match:

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=2}")
        .similarity(0.8) // Minimum similarity threshold
        .build()
        .unwrap();

    // Exact match: similarity = 1.0
    // 1 edit: similarity around 0.8-0.9
    // 2 edits: similarity around 0.6-0.8
    println!("{}", re.is_match("hello"));
    println!("{}", re.is_match("hallo"));
}

Fuzziness Markers

Detailed syntax for controlling fuzzy matching.

Basic Markers

MarkerDescriptionExample
{e<=N}Total edits ≤ N{e<=2}
{i<=N}Insertions ≤ N{i<=1}
{d<=N}Deletions ≤ N{d<=1}
{s<=N}Substitutions ≤ N{s<=1}
{t<=N}Transpositions ≤ N{t<=1}

Combining Markers

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Allow 1 insertion AND 1 deletion
    let re1 = FuzzyRegex::new("(?:hello){i<=1,d<=1}").unwrap();

    // Allow up to 2 substitutions OR up to 1 deletion (combined constraint)
    let re2 = FuzzyRegex::new("(?:hello){s<=2,d<=1}").unwrap();

    // Each constraint is independent
    // The match must satisfy ALL specified constraints
    println!("re1 matches 'helo': {}", re1.is_match("helo"));
    println!("re2 matches 'hallo': {}", re2.is_match("hallo"));
}

Character Class Restrictions

Restrict which characters can be used for edits:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Only allow substitutions with lowercase letters
    let re1 = FuzzyRegex::new(r"(?:hello){s<=1:[a-z]}").unwrap();
    assert!(re1.is_match("hallo"));  // 'a' is in [a-z]
    assert!(!re1.is_match("h3llo")); // '3' is not in [a-z]

    // Only allow insertions of digits
    let re2 = FuzzyRegex::new(r"(?:hello){i<=1:[0-9]}").unwrap();

    // Only allow substitutions with whitespace
    let re3 = FuzzyRegex::new(r"(?:hello){s<=1:\s}").unwrap();
}

Min/Max Error Ranges

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Require at least 1 edit, allow up to 2
    let re = FuzzyRegex::new("(?:hello){e>=1,e<=2}").unwrap();

    // Minimum errors with shorthand
    let re2 = FuzzyRegex::new("(?:hello){1e<=2}").unwrap();
    
    println!("re matches 'hello': {}", re.is_match("hello"));
    println!("re matches 'hallo': {}", re.is_match("hallo"));
}

Editing Classes

Apply fuzziness to specific character classes:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Apply different limits to different parts
    let re = FuzzyRegex::new("(?:hello){e<=1} world{e<=1}").unwrap();
    println!("{}", re.is_match("helo worled"));
}

Cost-Based Matching

Assign different costs to edit operations for fine-grained control.

Simple Cost Constraint

Use {c<=N} to limit total cost (all operations cost 1):

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Total cost ≤ 2 (any combination of edits, each costing 1)
    let re = FuzzyRegex::new("(?:hello){c<=2}").unwrap();

    assert!(re.is_match("hello")); // 0 edits, cost=0
    assert!(re.is_match("hallo")); // 1 sub, cost=1
    assert!(re.is_match("helo"));  // 1 del, cost=1
    assert!(re.is_match("hhello")); // 1 ins, cost=1
    assert!(re.is_match("hallo")); // 1 sub, cost=1
}

Weighted Costs

Assign different costs to different edit types:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Insertions cost 2, others cost 1, total ≤ 3
    let re = FuzzyRegex::new("(?:ab){2i+1d+1s+1t<=3}").unwrap();

    assert!(re.is_match("abc")); // 1 insertion (cost=2)
    assert!(re.is_match("a"));   // 1 deletion (cost=1)
    assert!(re.is_match("ba"));  // 1 transposition (cost=1)
    assert!(!re.is_match("aabc")); // 2 insertions (cost=4) - exceeds limit
}

Cost Syntax

SyntaxDescription
{c<=N}Total cost ≤ N
{c<N}Total cost < N
{Ni+Md+St+Tt<=N}Custom costs for each type
{0i+...}Free insertions
{Ni...}Insertions cost N
fn main() {
    use fuzzy_regex::FuzzyRegex;

    // All operations cost 1: {1i+1d+1s+1t<=N} = {c<=N}
    let re1 = FuzzyRegex::new("(?:test){c<=2}").unwrap();

    // Insertions are expensive (typing errors less common)
    let re2 = FuzzyRegex::new("(?:test){3i+1d+1s<=3}").unwrap();

    // Insertions are free (OCR errors)
    let re3 = FuzzyRegex::new("(?:test){0i+1d+1s<=2}").unwrap();
    
    println!("re1: {}", re1.is_match("test"));
    println!("re2: {}", re2.is_match("test"));
    println!("re3: {}", re3.is_match("tset"));
}

Use Cases

  • Typing errors: Substitutions/deletions more common than insertions
  • OCR errors: Insertions more common (extra characters)
  • Genetic sequences: Different scoring matrices

Pattern Syntax

fuzzy-regex supports standard regex syntax plus fuzzy matching.

Quick Reference

SyntaxDescription
aLiteral character
.Any character except newline
\dDigit [0-9]
\DNon-digit
\wWord character [a-zA-Z0-9_]
\WNon-word character
\sWhitespace
\SNon-whitespace
[abc]Character class
[^abc]Negated class
[a-z]Range
^Start of string
$End of string
\bWord boundary
\BNon-word boundary

Escaping

Special characters that need escaping: .^$*+?{}[]\\|()

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match literal dot
    let re = FuzzyRegex::new(r"file\.txt").unwrap();

    // Match literal backslash
    let re2 = FuzzyRegex::new(r"path\\to\\file").unwrap();
    
    println!("dot match: {}", re.is_match("file.txt"));
    println!("backslash match: {}", re2.is_match(r"path\to\file"));
}

Character Classes

Define sets of characters to match.

Basic Classes

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match any vowel
    let re1 = FuzzyRegex::new("[aeiou]").unwrap();

    // Match any consonant
    let re2 = FuzzyRegex::new("[^aeiou]").unwrap();

    // Match any lowercase letter
    let re3 = FuzzyRegex::new("[a-z]").unwrap();

    // Match any alphanumeric
    let re4 = FuzzyRegex::new("[a-zA-Z0-9]").unwrap();
    
    println!("vowel: {}", re1.is_match("a"));
    println!("consonant: {}", re2.is_match("b"));
}

Predefined Classes

By default, \w, \d, \s match ASCII characters only:

EscapeASCII OnlyUnicode (with (?u)
\d[0-9]Unicode digits
\D[^0-9]Non-digit
\w[a-zA-Z0-9_]Unicode word chars
\W[^a-zA-Z0-9_]Non-word
\s[ \t\n\r\f]Unicode whitespace
\S[^ \t\n\r\f]Non-whitespace

Unicode Mode

Enable Unicode character classes with (?u) flag:

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    // Via inline flag
    let re1 = FuzzyRegex::new(r"(?u)\w+").unwrap();
    assert!(re1.is_match("привет")); // Cyrillic matched

    // Via builder
    let re2 = FuzzyRegexBuilder::new(r"\w+")
        .unicode(true)
        .build()
        .unwrap();
}

Unicode Property Escapes

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Unicode letters
    let re1 = FuzzyRegex::new(r"\p{L}").unwrap();

    // Unicode digits
    let re2 = FuzzyRegex::new(r"\p{N}").unwrap();

    // Emoji
    let re3 = FuzzyRegex::new(r"\p{Emoji}").unwrap();
    
    println!("letter: {}", re1.is_match("a"));
    println!("digit: {}", re2.is_match("١")); // Arabic-Indic digit
}

Nested Classes

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // POSIX-like classes (inside character class)
    let re = FuzzyRegex::new("[[:alpha:]]").unwrap();
    println!("alpha: {}", re.is_match("a"));
}

Quantifiers

Control how many times to match a pattern.

Basic Quantifiers

SyntaxDescriptionExample
*Zero or moreab* matches “a”, “ab”, “abb”
+One or moreab+ matches “ab”, “abb”
?Zero or oneab? matches “a”, “ab”
{n}Exactly na{3} matches “aaa”
{n,}At least na{2,} matches “aa”, “aaa”
{n,m}Between n and ma{2,4} matches “aa”, “aaa”, “aaaa”

Lazy Quantifiers

Add ? to make quantifiers lazy (match as few as possible):

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Greedy: matches as much as possible
    let re1 = FuzzyRegex::new(r"<.+>").unwrap();

    // Lazy: matches as little as possible  
    let re2 = FuzzyRegex::new(r"<.+?>").unwrap();

    let text = "<tag>more</tag>";
    assert!(re1.find(text).unwrap().as_str() == "<tag>more</tag>");
    assert!(re2.find(text).unwrap().as_str() == "<tag>");
}

Possessive Quantifiers

Prevent backtracking (useful for performance):

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // a*+ doesn't backtrack
    let re = FuzzyRegex::new("a*+b").unwrap();
    println!("{}", re.is_match("ab"));
}

Quantifiers with Fuzzy Matching

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "ab" with up to 1 error, repeated 1-3 times
    let re = FuzzyRegex::new("(?:ab){e<=1}{1,3}").unwrap();
    println!("{}", re.is_match("ab"));
}

Groups and Alternation

Group patterns and match alternatives.

Capture Groups

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new(r"(?<user>\w+)@(?<domain>\w+\.\w+)").unwrap();
    let caps = re.captures("john@example.com").unwrap();

    assert_eq!(caps.name("user").unwrap().as_str(), "john");
    assert_eq!(caps.name("domain").unwrap().as_str(), "example.com");
}

Numbered Groups

fn main() {
    let re = FuzzyRegex::new(r"(\w+)@(\w+)").unwrap();
    let caps = re.captures("test@example").unwrap();

    assert_eq!(caps.get(1).unwrap().as_str(), "test");
    assert_eq!(caps.get(2).unwrap().as_str(), "example");
}

Non-Capturing Groups

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // (?:...) doesn't create a capture group
    let re = FuzzyRegex::new("(?:http|https)://").unwrap();
    println!("{}", re.is_match("http://example.com"));
}

Alternation

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match either option
    let re = FuzzyRegex::new("(foo|bar)").unwrap();

    assert!(re.is_match("foo"));
    assert!(re.is_match("bar"));
    assert!(!re.is_match("baz"));
}

Nested Groups

fn main() {
    let re = FuzzyRegex::new(r"((ab)(cd))").unwrap();
    let caps = re.captures("abcd").unwrap();

    assert_eq!(caps.get(0).unwrap().as_str(), "abcd"); // Full match
    assert_eq!(caps.get(1).unwrap().as_str(), "abcd"); // Outer group
    assert_eq!(caps.get(2).unwrap().as_str(), "ab");   // First inner
    assert_eq!(caps.get(3).unwrap().as_str(), "cd");   // Second inner
}

Fuzzy Groups

Apply fuzziness to groups:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("((?:hello){e<=1})").unwrap();
    println!("{}", re.is_match("helo"));
}

Anchors and Boundaries

Match positions in the string.

String Anchors

SyntaxDescription
^Start of string
$End of string
fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match at start
    let re1 = FuzzyRegex::new("^hello").unwrap();
    assert!(re1.is_match("hello world"));
    assert!(!re1.is_match("say hello"));

    // Match at end
    let re2 = FuzzyRegex::new("hello$").unwrap();
    assert!(re2.is_match("say hello"));
    assert!(!re2.is_match("hello world"));

    // Match entire string
    let re3 = FuzzyRegex::new("^hello$").unwrap();
    assert!(re3.is_match("hello"));
    assert!(!re3.is_match("hello world"));
}

Word Boundaries

SyntaxDescription
\bWord boundary
\BNon-word boundary
fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match whole word "cat"
    let re1 = FuzzyRegex::new(r"\bcat\b").unwrap();
    assert!(re1.is_match("cat"));
    assert!(re1.is_match("the cat sat"));
    assert!(!re1.is_match("category"));

    // Match "cat" not at word boundary
    let re2 = FuzzyRegex::new(r"\Bcat\B").unwrap();
    assert!(re2.is_match("category"));
    assert!(!re2.is_match("cat"));
}

Multi-line Mode

With (?m), ^ and $ match line boundaries:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?m)^hello$").unwrap();
    let text = "hello\nhello\nhello";
    let matches: Vec<_> = re.find_iter(text).collect();
    assert_eq!(matches.len(), 3);
}

Lookahead and Lookbehind

Assert what comes before or after the match without including it.

Lookahead

Positive Lookahead (?=...)

Match only if followed by:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "hello" only if followed by " world"
    let re = FuzzyRegex::new("hello(?= world)").unwrap();

    assert!(re.is_match("hello world"));  // Matches "hello"
    assert!(!re.is_match("hello there")); // No match
}

Negative Lookahead (?!...)

Match only if NOT followed by:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "hello" only if NOT followed by " world"
    let re = FuzzyRegex::new("hello(?! world)").unwrap();

    assert!(re.is_match("hello there")); // Matches "hello"
    assert!(!re.is_match("hello world")); // No match
}

Lookbehind

Positive Lookbehind (?<=...)

Match only if preceded by:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "world" only if preceded by "hello "
    let re = FuzzyRegex::new("(?<=hello )world").unwrap();

    assert!(re.is_match("hello world"));
    assert!(!re.is_match("bye world"));

    // Get match position
    let m = re.find("say hello world here").unwrap();
    assert_eq!(m.start(), 9); // After "hello "
}

Negative Lookbehind (?<!...)

Match only if NOT preceded by:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "world" only if NOT preceded by "hello "
    let re = FuzzyRegex::new("(?<!hello )world").unwrap();

    assert!(re.is_match("bye world"));
    assert!(!re.is_match("hello world"));
}

Fuzzy Lookbehind

Lookbehind can include fuzzy matching:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "world" preceded by "hello" with up to 1 error
    let re = FuzzyRegex::new("(?<=(?:hello){e<=1})world").unwrap();

    assert!(re.is_match("hello world")); // Exact
    assert!(re.is_match("hallo world")); // 1 substitution
    assert!(!re.is_match("goodbye world")); // Too different
}

Fuzzy Lookbehind Examples

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Allow insertions in lookbehind
    let re = FuzzyRegex::new("(?<=(?:hello){i<=2})world").unwrap();
    assert!(re.is_match("helllo world")); // 2 insertions (ll)
    assert!(re.is_match("hellllo world")); // 3 insertions - no match

    // Allow deletions in lookbehind
    let re = FuzzyRegex::new("(?<=(?:hello){d<=1})world").unwrap();
    assert!(re.is_match("helo world")); // 1 deletion (one 'l')
    assert!(!re.is_match("heo world")); // 2 deletions - no match

    // Combined: substitutions and insertions
    let re = FuzzyRegex::new("(?<=(?:hello){s<=1,i<=1})world").unwrap();
    assert!(re.is_match("hallo world")); // 1 substitution
    assert!(re.is_match("helloh world")); // 1 insertion
    assert!(re.is_match("hallo world")); // matches, prefers lower edits
}

Case Insensitive with Fuzzy

Combine case-insensitive matching with fuzzy for robust matching:

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    // Case-insensitive + fuzzy - case differences don't count as edits
    let re = FuzzyRegexBuilder::new("(?i)(?:hello){e<=1}")
        .build()
        .unwrap();

    assert!(re.is_match("hello")); // 0 edits
    assert!(re.is_match("HELLO")); // 0 edits (case difference is free!)
    assert!(re.is_match("hallo")); // 1 edit (actual substitution)

    // Builder's case_insensitive flag works the same way
    let re2 = FuzzyRegexBuilder::new("(?:hello){e<=1}")
        .case_insensitive(true)
        .build()
        .unwrap();

    assert!(re2.is_match("HELLO")); // 0 edits
    assert!(re2.is_match("HeLLo")); // 0 edits (all case differences)

    // Get edit count to verify
    let m = re.find("HELLO").unwrap();
    assert_eq!(m.total_edits(), 0); // No edits charged for case!
}

Fuzzy Lookahead

Fuzzy matching also works with lookahead:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "hello" followed by "world" with up to 1 error
    let re = FuzzyRegex::new("hello(?= {e<=1}world)").unwrap();

    assert!(re.is_match("hello world")); // Exact
    assert!(!re.is_match("hello world!")); // ! is not space, lookahead fails
}

Variable-Length Lookbehind

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Variable-length lookbehind (alternation)
    let re = FuzzyRegex::new("(?<=(?:hello|hi) )world").unwrap();
    println!("{}", re.is_match("hi world"));
}

Atomic Groups and Possessive Quantifiers

Advanced features for performance and pattern control.

Atomic Groups (?>...)

Atomic groups prevent backtracking once matched:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Without atomic group: can match "abc" then backtrack
    let re1 = FuzzyRegex::new("(?:ab|abc)").unwrap();

    // Atomic: once "abc" matches, no backtracking
    let re2 = FuzzyRegex::new("(?>ab|abc)").unwrap();
    
    println!("re1: {}", re1.is_match("abc"));
    println!("re2: {}", re2.is_match("abc"));
}

Possessive Quantifiers

Like atomic groups for quantifiers:

SyntaxDescription
*+Possessive zero-or-more
++Possessive one-or-more
?+Possessive zero-or-one
fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Normal: backtracks to find match
    let re1 = FuzzyRegex::new("a*b").unwrap();

    // Possessive: doesn't backtrack
    let re2 = FuzzyRegex::new("a*+b").unwrap();
    
    println!("re1: {}", re1.is_match("ab"));
    println!("re2: {}", re2.is_match("ab"));
}

When to Use

Atomic groups and possessive quantifiers are useful when:

  1. Performance: Prevent exponential backtracking
  2. Determinism: Control match behavior explicitly
  3. Greedy matching: When you want maximum match without backtracking
fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Pattern that could cause backtracking issues:
    // Without possessive: "aaaaaaaa" + "a*" + "b" can be slow
    let re = FuzzyRegex::new("(?:a++)b").unwrap();
    println!("{}", re.is_match("ab"));
}

Match Reset \K

Reset the start of the match:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // \K resets match start - keeps "prefix" out of match
    let re = FuzzyRegex::new(r"prefix\Kworld").unwrap();
    let m = re.find("prefixworld").unwrap();
    assert_eq!(m.as_str(), "world");
    assert_eq!(m.start(), 6); // Starts at "world", not "prefix"
}

API Reference

Overview of the main API types and methods.

Main Types

TypeDescription
FuzzyRegexCompiled regex pattern
FuzzyRegexBuilderBuilder for customizing regex
MatchA single match result
CapturesMatch with capture groups
StreamingMatcherStateful matcher for streaming
StreamingMatchMatch from streaming search

Quick Method Reference

#![allow(unused)]
fn main() {
use fuzzy_regex::FuzzyRegex;

// Construction
let re = FuzzyRegex::new("pattern").unwrap();

// Searching
re.is_match(text)           // bool - does it match?
re.find(text)               // Option<Match>
re.find_iter(text)          // Iterator<Match>
re.find_rev(text)           // Option<Match> - rightmost
re.captures(text)           // Option<Captures>

// Replacing
re.replace(text, "x")       // String
re.replace_all(text, "x")   // String
re.split(text)              // Split

// Streaming
re.stream()                 // StreamingMatcher
re.find_bytes(bytes)        // Option<StreamingMatch>
}

FuzzyRegex

The main compiled regex type.

Construction

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Simple construction
    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();

    // With error handling
    let re = match FuzzyRegex::new("(?:hello){e<=1}") {
        Ok(r) => r,
        Err(e) => panic!("Invalid pattern: {}", e),
    };
    
    println!("Created regex: {}", re.is_match("hello"));
}

Search Methods

is_match

fn main() {
    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();

    println!("{}", re.is_match("hello")); // true
    println!("{}", re.is_match("hallo")); // true
    println!("{}", re.is_match("world")); // false
}

find

fn main() {
    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();

    if let Some(m) = re.find("say hello world") {
        println!("Found '{}' at {}-{}", m.as_str(), m.start(), m.end());
        println!("Similarity: {:.2}", m.similarity());
    }
}

find_iter

fn main() {
    let re = FuzzyRegex::new(r"(?:\w+){e<=1}").unwrap();

    for m in re.find_iter("test tset tsat") {
        println!("Match: '{}'", m.as_str());
    }
}

find_rev

Find rightmost match:

fn main() {
    let re = FuzzyRegex::new("(?:foo)").unwrap();
    let m = re.find_rev("foo foo foo").unwrap();
    assert_eq!(m.start(), 8);
}

Replace Methods

fn main() {
    let re = FuzzyRegex::new("world").unwrap();

    // Replace first match
    let result = re.replace("hello world", "rust");
    assert_eq!(result, "hello rust");

    // Replace all matches
    let result = re.replace_all("hello world world", "rust");
    assert_eq!(result, "hello rust rust");
}

Other Methods

fn main() {
    let re = FuzzyRegex::new(r"(?<word>\w+)").unwrap();

    // Capture groups
    let caps = re.captures("hello").unwrap();
    let word = caps.name("word").unwrap();

    // Split
    let parts: Vec<_> = re.split("a1b2c").collect();

    // Check pattern info
    println!("Capture count: {}", re.capture_count());
}

FuzzyRegexBuilder

Builder for customizing regex construction.

Basic Usage

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello)")
        .build()
        .unwrap();
    
    println!("Created");
}

Builder Options

Similarity Threshold

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=2}")
        .similarity(0.8)  // Minimum similarity 0.0-1.0
        .build()
        .unwrap();
    
    println!("Created");
}

Case Insensitivity

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello)")
        .case_insensitive(true)
        .build()
        .unwrap();

    assert!(re.is_match("HELLO"));
    assert!(re.is_match("Hello"));
}

Multi-line Mode

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("^hello$")
        .multi_line(true)  // ^ and $ match line boundaries
        .build()
        .unwrap();
    
    println!("Created");
}

Dot-all Mode

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("a.b")
        .dot_all(true)  // . matches newlines
        .build()
        .unwrap();
    
    println!("Created");
}

Partial Matching

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=1}")
        .partial(true)  // Matches at end of text are partial
        .build()
        .unwrap();

    let m = re.find("hello").unwrap();
    assert!(m.partial()); // Match reaches end of text
}

Timeout

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;
    use std::time::Duration;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=5}")
        .timeout(Duration::from_millis(100))
        .build()
        .unwrap();
    
    println!("Created");
}

Chaining Options

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=1}")
        .case_insensitive(true)
        .similarity(0.7)
        .partial(true)
        .multi_line(true)
        .build()
        .unwrap();
    
    println!("Created");
}

Match Results

Methods available on match results.

Basic Methods

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    let m = re.find("say hello world").unwrap();

    // Get matched string
    println!("{}", m.as_str());           // &str: "hello"
    println!("{}", m.start());            // usize: 4 (byte position)
    println!("{}", m.end());              // usize: 9
    println!("{}", m.len());              // usize: 5
    println!("{}", m.is_empty());         // bool: false
}

Similarity

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    let m = re.find("say hello world").unwrap();

    // Similarity score 0.0 - 1.0
    println!("{}", m.similarity());       // f32: 1.0 = exact, lower = more errors
}

Partial Match

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    let m = re.find("say hello").unwrap();

    // Check if match reaches end of input (for partial matching)
    println!("{}", m.partial());         // bool: true if match at end
}

Match as Different Types

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    let m = re.find("hello world").unwrap();

    // Get raw bytes
    let bytes = m.as_bytes();

    // Get range
    let range = m.range(); // Range<usize>
    println!("Range: {:?}", range);
}

Iterating Matches

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new(r"\w+").unwrap();

    for m in re.find_iter("hello world") {
        println!("{}", m.as_str());
    }
}

Match Iteration with Positions

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:foo){e<=1}").unwrap();

    for m in re.find_iter("foo fxo fxoo") {
        println!("'{}' at [{}, {})", m.as_str(), m.start(), m.end());
    }
}

Streaming API

Process large data incrementally without loading into memory.

Creating a Stream

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let mut stream = re.stream();
    println!("Stream created");
}

Feeding Data

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let mut stream = re.stream();

    // Feed data in chunks
    for m in stream.feed(b"some hay and niddle here") {
        println!("Found '{}' at {}", m.as_str(), m.start());
    }
}

Position Tracking

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let mut stream = re.stream();

    stream.feed(b"hello");
    assert_eq!(stream.position(), 5);

    stream.feed(b" world");
    assert_eq!(stream.position(), 11);
}

Cross-Boundary Matches

Matches can span chunk boundaries:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let mut stream = re.stream();

    // First chunk - no match yet
    let _ = stream.feed(b"xxx ");

    // Second chunk - match spanning boundary
    let matches: Vec<_> = stream.feed(b"hello").collect();
    println!("Matches: {}", matches.len());
}

Resetting

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let mut stream = re.stream();

    stream.feed(b"hello world");
    stream.reset();

    assert_eq!(stream.position(), 0);
}

Finish

Check for remaining matches at end of stream:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let mut stream = re.stream();
    stream.feed(b"test hel");

    // Check remaining buffer
    let final_match = stream.finish();
}

Reader Integration

Process files or other readers:

fn main() {
    use fuzzy_regex::FuzzyRegex;
    use std::io::BufReader;
    use std::fs::File;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let file = File::open("Cargo.toml").unwrap();

    for m in re.stream().search_reader(BufReader::new(file)) {
        println!("Match at byte {}", m.start());
    }
}

Byte-Level API

Search byte slices directly:

fn main() {
    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();

    if let Some(m) = re.find_bytes(b"hello world") {
        println!("Found at {}-{}", m.start(), m.end());
    }

    // Iterate over bytes
    let matches: Vec<_> = re.find_iter_bytes(b"hello hullo").collect();
    println!("Matches: {}", matches.len());
}

Advanced Features

Additional capabilities beyond basic fuzzy matching.

In This Section

This section covers:

  1. Capture Groups - Extract matched portions
  2. Partial Matching - Match incomplete input
  3. Word Lists - Named reference patterns
  4. Custom Handlers - Invoke custom logic during matching
  5. Compatibility Layer - Migrate from other libraries

Capture Groups

Extract specific portions of matches.

Named Capture Groups

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new(r"(?<user>\w+)@(?<domain>\w+\.\w+)").unwrap();
    let caps = re.captures("john@example.com").unwrap();

    assert_eq!(caps.name("user").unwrap().as_str(), "john");
    assert_eq!(caps.name("domain").unwrap().as_str(), "example.com");
}

Numbered Capture Groups

fn main() {
    let re = FuzzyRegex::new(r"(\w+)@(\w+)").unwrap();
    let caps = re.captures("test@example").unwrap();

    assert_eq!(caps.get(1).unwrap().as_str(), "test");
    assert_eq!(caps.get(2).unwrap().as_str(), "example");
    assert_eq!(caps.get(0).unwrap().as_str(), "test@example"); // Full match
}

Nested Groups

fn main() {
    let re = FuzzyRegex::new(r"((?P<outer>\w+)-(?P<inner>\w+))").unwrap();
    let caps = re.captures("hello-world").unwrap();

    assert_eq!(caps.get(0).unwrap().as_str(), "hello-world");
    assert_eq!(caps.get(1).unwrap().as_str(), "hello-world");
    assert_eq!(caps.name("outer").unwrap().as_str(), "hello");
    assert_eq!(caps.name("inner").unwrap().as_str(), "world");
}

Capture Groups with Fuzzy Matching

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new(r"(?<word>\w+){e<=1}").unwrap();
    let caps = re.captures("tset").unwrap();

    assert_eq!(caps.name("word").unwrap().as_str(), "tset");
}

Fuzzy Edits Information

When using fuzzy matching, you can get detailed edit information:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new(r"(?:hello){e<=2}").unwrap();
    let m = re.find("helllo").unwrap();

    // Get total edits
    println!("Total edits: {}", m.total_edits()); // 1 (1 insertion)

    // Get detailed counts: (insertions, deletions, substitutions)
    let (ins, del, sub) = m.fuzzy_counts();
    println!("Insertions: {}, Deletions: {}, Substitutions: {}", ins, del, sub);

    // Get positions of changes
    let (ins_pos, del_pos, sub_pos) = m.fuzzy_changes();
    println!("Insertion positions: {:?}", ins_pos);
}

Handler Overrides in Captures

When using handlers with MatchOverride, the overrides are tracked in captures:

fn main() {
    use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

    let re = FuzzyRegexBuilder::new(r"(prefix(?call:handler)suffix)")
        .handler("handler", |text, pos| {
            if text[pos..].starts_with("XYZ") {
                HandlerResult::MatchOverride(3, "xyz".to_string())
            } else {
                HandlerResult::NoMatch
            }
        })
        .build()
        .unwrap();

    let caps = re.captures("prefixXYZsuffix").unwrap();

    // The captured text shows the override
    assert_eq!(caps.get(1).unwrap().as_str(), "prefixxyzsuffix");

    // Handler overrides track (start_byte, end_byte, override_text)
    let overrides = caps.handler_overrides();
    println!("Handler overrides: {:?}", overrides);
    // Output: [(6, 9, "xyz")]
}

Multiple Capture Groups with Fuzzy

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Two fuzzy groups with different limits
    let re = FuzzyRegex::new(r"(?<a>\w+){e<=1} (?<b>\w+){e<=1}").unwrap();
    let caps = re.captures("helo wrold").unwrap();

    assert_eq!(caps.name("a").unwrap().as_str(), "helo"); // 1 substitution
    assert_eq!(caps.name("b").unwrap().as_str(), "wrold"); // 1 substitution

    // Each group tracks its own edits
    // Note: Fuzzy matching applies to the overall pattern
}

Iterating Captures

fn main() {
    let re = FuzzyRegex::new(r"(\w+)").unwrap();
    let caps = re.captures("hello world").unwrap();

    for cap in caps.iter() {
        if let Some(m) = cap {
            println!("Group: '{}'", m.as_str());
        }
    }
}

Accessing All Groups

fn main() {
    let re = FuzzyRegex::new(r"(\w+)@(\w+)").unwrap();
    let caps = re.captures("a@b").unwrap();

    assert_eq!(caps.len(), 3); // 0 (full) + 2 groups

    // Names of capture groups
    for name in caps.iter_names() {
        if let Some(n) = name {
            println!("Group name: {}", n);
        }
    }
}

Partial Matching

Match incomplete or streaming input.

Enabling Partial Matching

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=1}")
        .partial(true)
        .build()
        .unwrap();
    
    println!("Created");
}

How It Works

When enabled, matches that reach the end of the input are marked as “partial”:

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=1}")
        .partial(true)
        .build()
        .unwrap();

    // Match at end of text - partial
    let m1 = re.find("hello").unwrap();
    assert!(m1.partial());

    // Match not at end - not partial
    let m2 = re.find("say hello").unwrap();
    assert!(!m2.partial());

    // Fuzzy match reaching end - also partial
    let m3 = re.find("hallo").unwrap();
    assert!(m3.partial());
}

Use Cases

Streaming Input

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:command){e<=1}")
        .partial(true)
        .build()
        .unwrap();

    let mut stream = re.stream();

    // Feed incomplete data
    for m in stream.feed(b"cmd") {
        if m.partial() {
            println!("Partial match: might be complete when more data arrives");
        }
    }
}

Progressive Matching

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=1}")
        .partial(true)
        .build()
        .unwrap();

    // Check if more input needed
    let m = re.find("incomplete");
    if m.map(|m| m.partial()).unwrap_or(false) {
        // Get more input
    }
}

Without Partial

By default (partial(false)), partial matches behave the same as full matches:

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=1}")
        .partial(false) // default
        .build()
        .unwrap();

    let m = re.find("hello").unwrap();
    assert!(!m.partial()); // Always false
}

Word Lists

Define named lists of patterns for reuse.

Basic Usage

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let mut re = FuzzyRegex::new(r"\L<keywords>").unwrap();

    // Set the word list
    re.set_word_list(&["hello", "world", "test"]).unwrap();

    assert!(re.is_match("hello"));
    assert!(re.is_match("world"));
    assert!(!re.is_match("foo"));
}

With Fuzzy Matching

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let mut re = FuzzyRegex::new(r"\L<keywords>{e<=1}").unwrap();
    re.set_word_list(&["hello", "world"]).unwrap();

    assert!(re.is_match("hallo")); // "hello" with 1 error
    assert!(re.is_match("wrold")); // "world" with 1 error
}

Multiple Word Lists

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new(r"\L<names>|\L<places>").unwrap();
    // Note: Multiple lists require builder pattern
    println!("Created");
}

Use Cases

  • Keyword matching: Match against a list of keywords
  • Name matching: Match against database of names
  • Dictionary lookup: Match words from a dictionary

Custom Handlers

Custom handlers allow you to invoke arbitrary Rust code during regex matching. This is useful for patterns that are difficult or impossible to express with regular regex alone.

Syntax

Use (?call:handler_name) in your pattern to invoke a handler:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

let re = FuzzyRegexBuilder::new("(?call:myhandler)")
    .handler("myhandler", |text, pos| {
        // Your logic here
    })
    .build()
    .unwrap();
}

Handler Signature

Handlers have the signature:

#![allow(unused)]
fn main() {
Fn(&str, usize) -> HandlerResult
}
  • text: The entire input string being matched
  • pos: Current position in the text (byte index)
  • HandlerResult::MatchOverride(n, text): Handler matched, consume n bytes, optionally override captured text
  • HandlerResult::NoMatch: Handler didn’t match at this position

Important: The byte count must account for UTF-8 encoding. For multi-byte characters (like Cyrillic, emoji), use byte length, not character count.

Examples

Simple Matching

Match strings that start with a specific prefix:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

let re = FuzzyRegexBuilder::new(r"(?call:prefix)")
    .handler("prefix", |text, pos| {
        if text[pos..].starts_with("foo") {
            HandlerResult::MatchOverride(3, "FOO".to_string())
        } else {
            HandlerResult::NoMatch
        }
    })
    .build()
    .unwrap();

let m = re.find("foobar").unwrap();
assert_eq!(m.as_str(), "foo");
}

Capture Override

Handlers can override the captured text while still matching the original:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

let re = FuzzyRegexBuilder::new(r"(prefix(?call:handler)suffix)")
    .handler("handler", |text, pos| {
        if text[pos..].starts_with("XYZ") {
            // Match 3 bytes but capture as lowercase
            HandlerResult::MatchOverride(3, "xyz".to_string())
        } else {
            HandlerResult::NoMatch
        }
    })
    .build()
    .unwrap();

let caps = re.captures("prefixXYZsuffix").unwrap();
// Captured text is "xyz" (the override), not "XYZ"
assert_eq!(caps.get(1).unwrap().as_str(), "prefixxyzsuffix");
}

Matching Escaped Characters

Match strings with escaped quotes:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

let re = FuzzyRegexBuilder::new(r#""((?call:unescape)|.)*""#)
    .handler("unescape", |text, pos| {
        if pos + 1 < text.len() && text.as_bytes()[pos] == b'\\' {
            HandlerResult::MatchOverride(2, String::new()) // consume \ and next char
        } else {
            HandlerResult::NoMatch
        }
    })
    .build()
    .unwrap();

let m = re.find(r#""hello \"world\"""#).unwrap();
assert_eq!(m.as_str(), "\"hello \\\"world\\\"\"");

let caps = re.captures(r#""hello \"world\"""#).unwrap();
// The handler_overrides contain the override information
println!("{:?}", caps.handler_overrides());
}

Escape Sequences

Match common escape sequences like \n, \t, \\:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

let re = FuzzyRegexBuilder::new(r"(?call:escape)+")
    .handler("escape", |text, pos| {
        if pos + 1 < text.len() && text.as_bytes()[pos] == b'\\' {
            let next = text.as_bytes()[pos + 1];
            if matches!(next, b'n' | b't' | b'r' | b'\\' | b'"' | b'\'') {
                return HandlerResult::MatchOverride(2, String::new());
            }
        }
        HandlerResult::NoMatch
    })
    .build()
    .unwrap();

re.find(r"hello\nworld"); // matches "\n"
re.find(r"a\tb");         // matches "\t"
}

Context-Sensitive Matching

Match patterns that depend on context:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

// Match "id:123" only when preceded by "user:"
let re = FuzzyRegexBuilder::new(r"user:(?call:id)")
    .handler("id", |text, pos| {
        if text[pos..].starts_with("id:") {
            let rest = &text[pos + 3..];
            let mut count = 0;
            for (_, c) in rest.char_indices() {
                if c.is_ascii_digit() {
                    count += 1;
                } else {
                    break;
                }
            }
            if count > 0 {
                return HandlerResult::MatchOverride(3 + count, String::new());
            }
        }
        HandlerResult::NoMatch
    })
    .build()
    .unwrap();

re.find("user:id:123"); // matches "user:id:123"
re.find("user:name:foo"); // no match (not "id:")
re.find("id:999"); // no match (no "user:" prefix)
}

Unicode Text Transformation

Handle non-ASCII text (note: byte count matters for UTF-8):

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

let re = FuzzyRegexBuilder::new(r#""(?call:translate)""#)
    .handler("translate", |text, pos| {
        let remaining = &text[pos..];
        // "привет" is 6 Cyrillic chars = 12 bytes in UTF-8
        if remaining.starts_with("привет") {
            HandlerResult::MatchOverride(12, "HELLO".to_string())
        } else {
            HandlerResult::NoMatch
        }
    })
    .build()
    .unwrap();

let caps = re.captures("\"привет\"").unwrap();
// The captured text shows the override
assert_eq!(caps.get(0).unwrap().as_str(), "\"HELLO\"");

// Handler overrides track (start_byte, end_byte, override_text)
assert_eq!(caps.handler_overrides(), &[(1, 13, "HELLO")]);
}

Fuzzy Matching with Handlers

Combine handlers with fuzzy matching for powerful patterns:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

// Fuzzy lookbehind + handler for complex validation
let re = FuzzyRegexBuilder::new(r"(?<=(?:hello){e<=2}) (?call:translate)")
    .handler("translate", |text, pos| {
        let remaining = &text[pos..];
        // "привет" = 12 bytes
        if remaining.starts_with("привет") {
            HandlerResult::MatchOverride(12, "HELLO".to_string())
        } else {
            HandlerResult::NoMatch
        }
    })
    .build()
    .unwrap();

// Matches with fuzzy lookbehind
let m = re.find("hello привет").unwrap();
assert_eq!(m.as_str(), " привет");

// Also matches with fuzzy lookbehind (1 extra 'l')
let m = re.find("helllo привет").unwrap();
assert_eq!(m.as_str(), " привет");
}

Multiple Handlers

Use multiple handlers in the same pattern:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

let re = FuzzyRegexBuilder::new(r"(?call:foo).*(?call:bar)")
    .handler("foo", |text, pos| {
        if text[pos..].starts_with("START") {
            HandlerResult::MatchOverride(5, "start".to_string())
        } else {
            HandlerResult::NoMatch
        }
    })
    .handler("bar", |text, pos| {
        if text[pos..].starts_with("END") {
            HandlerResult::MatchOverride(3, "end".to_string())
        } else {
            HandlerResult::NoMatch
        }
    })
    .build()
    .unwrap();

let caps = re.captures("START middle END").unwrap();
assert_eq!(caps.get(0).unwrap().as_str(), "START middle END");
}

Validation Handlers

Use handlers to implement complex validation rules:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

// Match email-like patterns but validate the domain
let re = FuzzyRegexBuilder::new(r"\w+@(?call:domain)")
    .handler("domain", |text, pos| {
        // Find the end of the domain (until whitespace or end)
        let remaining = &text[pos..];
        let mut end = 0;
        for (i, c) in remaining.char_indices() {
            if c.is_whitespace() {
                break;
            }
            end = i + c.len_utf8();
        }
        if end > 0 {
            let domain = &remaining[..end];
            // Only allow .com, .org, .net domains
            if domain.ends_with(".com") || domain.ends_with(".org") || domain.ends_with(".net") {
                return HandlerResult::MatchOverride(end, String::new());
            }
        }
        HandlerResult::NoMatch
    })
    .build()
    .unwrap();

re.find("user@example.com"); // matches
re.find("user@test.org");   // matches
re.find("user@fake.io");    // no match (invalid TLD)
}

Performance Notes

  • Handlers are called during NFA simulation, which may be slower than optimized paths
  • For best performance, keep handler logic simple
  • Consider using handlers only when necessary; standard regex is faster for expressible patterns
  • When using MatchOverride, the override text is stored separately and applied during capture construction

Limitations

  • Handlers cannot perform lookahead/lookbehind themselves
  • Handler matches are exact (not fuzzy) - they either match or don’t
  • Captures inside handlers are limited
  • Position is in bytes, not characters - account for UTF-8 encoding

Compatibility Layer

Migrate from other fuzzy matching libraries.

fuzzy-aho-corasick

This is a drop-in replacement for fuzzy-aho-corasick:

#![allow(unused)]
fn main() {
use fuzzy_regex::compat::fac::FuzzyAhoCorasickBuilder;
use fuzzy_regex::types::FuzzyLimits;

let searcher = FuzzyAhoCorasickBuilder::new()
    .fuzzy(FuzzyLimits::new().edits(1))
    .build(["hello", "world"])
    .unwrap();

for m in searcher.find_iter("helo wrld") {
    println!("Pattern {} matched at {}", m.pattern_index(), m.start());
}
}

Architecture Overview

How fuzzy-regex works internally.

High-Level Architecture

┌─────────────────────────────────────────────┐
│              FuzzyRegex                      │
│         (Public API Layer)                   │
│                                              │
│  ┌────────────────────────────────────────┐  │
│  │        Fast Path Optimizations         │  │
│  │  • Pure greedy .* → instant match     │  │
│  │  • .*SUFFIX → reverse search          │  │
│  │  • Word boundaries → direct search    │  │
│  └────────────────────────────────────────┘  │
└──────────────────┬──────────────────────────┘
                  │
         ┌────────┴──────────┐
         ▼                     ▼
┌───────────────┐   ┌─────────────────┐
│     DFA       │   │   NFA Engine   │
│  (Fast Path)  │   │  (Full Engine) │
│ + Capturing   │   │                │
│    Groups     │   │                │
└───────┬───────┘   └────────┬────────┘
        │                    │
        └────────┬───────────┘
                 ▼
        ┌─────────────────┐
        │  Fuzzy Bridge  │
        └────────┬────────┘
                 │
      ┌───────────┴───────────┐
      ▼                       ▼
┌──────────┐         ┌──────────┐
│  Bitap   │         │   D-L    │
│ Matcher  │         │   NFA    │
└──────────┘         └──────────┘

Components

1. Parser

  • Tokenizes regex patterns
  • Builds Abstract Syntax Tree (AST)
  • Handles fuzzy syntax extensions

2. Compiler

  • Converts AST to Intermediate Representation (IR)
  • Optimizes patterns
  • Handles fuzzy matching compilation

3. Fast Path Optimizations

Before going to the main engines, FuzzyRegex checks for special patterns:

  • Pure greedy dot-star: .*, ^.*$, .*$ return instantly
  • Greedy prefix with suffix: .*test uses reverse search (O(n) vs O(n²))
  • Word boundaries: Direct literal search with boundary checks
  • DFA compatibility: Patterns with capturing groups now use DFA

4. Matching Engines

DFA (Deterministic Finite Automaton)

  • Fast path for exact/non-fuzzy patterns
  • No backtracking
  • Supports capturing groups
  • Limited feature support

NFA (Non-deterministic Finite Automaton)

  • Full regex feature support
  • Backtracking engine
  • Supports fuzzy matching

Bitap Algorithm

  • Optimized for short patterns (≤64 chars)
  • O(n×k) time complexity
  • SIMD-accelerated

Damerau-Levenshtein NFA

  • Fuzzy matching via automata
  • Supports all edit types
  • Used for longer patterns

5. Fuzzy Bridge

  • Connects NFA to fuzzy matchers
  • Extracts literals for pre-filtering
  • Coordinates multiple matchers

Bitap Algorithm

Fast fuzzy matching algorithm for short patterns.

Overview

The Bitap algorithm (also known as the Shift-Or algorithm) is used for fuzzy matching of short patterns (≤64 characters). It achieves O(n×k) time complexity where n is text length and k is maximum edits.

How It Works

Bitap uses bitmasks to track which positions can match:

  1. Initialize: Create a bitmask for each character in the pattern
  2. Scan: For each text character, update the bitmask
  3. Match: When bitmask indicates a match, extract the match

Bitap State

#![allow(unused)]
fn main() {
// Simplified bitap state representation
struct BitapState {
    // Bits representing current match positions
    // For pattern "hello":
    // Position:    5   4   3   2   1   0
    // Bits:        0   0   0   0   0   1  = "h"
    //              0   0   0   0   1   0  = "e"
    //              ... continue ...
}
}

Key Properties

Pattern Length Limit

  • Maximum 64 characters (uses 64-bit integers)
  • Longer patterns use Damerau-Levenshtein NFA

Edit Distance

  • Supports insertions, deletions, substitutions
  • Transpositions handled specially

SIMD Acceleration

Modern implementations use SIMD for parallelism:

#![allow(unused)]
fn main() {
// SIMD processes multiple characters at once
// 256-bit AVX2: process 32 bytes per iteration
// 512-bit AVX512: process 64 bytes per iteration
}

Performance

Pattern LengthText SizeThroughput
14 chars200KB~1.5 Gbps
14 chars1MB~1.4 Gbps
14 chars, k=1200KB~2.0 Gbps
14 chars, k=3200KB~700 Mbps

When Bitap is Used

  • Pattern length ≤ 64 characters
  • Simple fuzzy matching (no complex regex features)
  • Exact matching benefits from optimizations

Implementation Details

State Machine

#![allow(unused)]
fn main() {
// Each iteration:
// 1. Shift state right by 1
// 2. OR with pattern mask for current char
// 3. Check for match (zero in certain bits)
}

Handling Errors

Bitap extends to handle errors by maintaining multiple state masks (one per allowed error):

#![allow(unused)]
fn main() {
// For k errors, maintain k+1 states
// state[0] = exact match
// state[1] = 1 error
// state[k] = k errors
}

Damerau-Levenshtein NFA

Fuzzy matching via finite automata for longer patterns.

Overview

For patterns longer than 64 characters or when complex regex features are needed, fuzzy-regex uses a Damerau-Levenshtein NFA. This is an extension of the classic Damerau-Levenshtein automaton.

Damerau-Levenshtein Automaton

The Damerau-Levenshtein automaton for a pattern P and max distance k accepts all strings within k edits of P.

State Representation

#![allow(unused)]
fn main() {
// Each state represents a position in the pattern
// plus the number of errors used so far
struct NfaState {
    pattern_position: usize,  // Position in pattern
    errors_used: usize,       // Errors consumed
}
}

Transitions

For each input character, the NFA can transition to:

  1. Match: Next character in pattern (0 errors)
  2. Insertion: Stay at current position (1 error)
  3. Deletion: Skip pattern character (1 error)
  4. Substitution: Move to next with 1 error
  5. Transposition: Swap next two characters (special case)

Building the NFA

#![allow(unused)]
fn main() {
// Pattern: "abc" with k=1
// States: (0,0) → (1,0) → (2,0) → (3,0)  [exact path]
//         (0,0) → (1,1) → ...             [with errors]
}

Construction Algorithm

  1. Create initial state at position 0
  2. For each pattern character, create states for:
    • Exact match (no error)
    • Insertion (adds error)
    • Deletion (adds error)
    • Substitution (adds error)
  3. Connect states with transitions
  4. Mark accepting states

Performance Characteristics

AspectDamerau-Levenshtein NFA
Time ComplexityO(n × k × m)
Space ComplexityO(k × m)
Pattern LengthUnlimited
FeaturesFull regex

Where:

  • n = text length
  • k = max edits
  • m = pattern length

Comparison with Bitap

FeatureBitapDamerau-Levenshtein NFA
Max Pattern Length64Unlimited
Time ComplexityO(n×k)O(n×k×m)
SIMD SupportYesLimited
Regex FeaturesLimitedFull
Memory UsageLowHigher

When Used

  • Pattern length > 64 characters
  • Complex regex features (lookahead, backreferences)
  • Alternation with fuzzy segments

Fuzzy Bridge

Connecting regex engine with fuzzy matching components.

Purpose

The Fuzzy Bridge coordinates between the NFA-based regex engine and specialized fuzzy matchers (Bitap, Damerau-Levenshtein NFA).

Responsibilities

  1. Literal Extraction: Extract exact-match literals from patterns for pre-filtering
  2. Search Coordination: Order and combine multiple matchers
  3. Result Merging: Combine and rank results from different matchers

Architecture

Regex Pattern:  hello (?:world){e<=1}
                    │
                    ▼
           ┌────────────────┐
           │ Fuzzy Bridge   │
           └───────┬────────┘
                   │
       ┌───────────┴───────────┐
       ▼                       ▼
┌──────────────┐      ┌──────────────┐
│    DFA       │      │   Fuzzy      │
│  (exact)     │      │   Matcher    │
│  "hello"     │      │ (Bitap/NFA)  │
└──────────────┘      │ "world"{e<=1}│
                      └──────────────┘

Literal Extraction

#![allow(unused)]
fn main() {
// Pattern: "prefix (?:target){e<=1} suffix"
// Extracted literals: ["prefix", "suffix"]
}

Extraction Strategy

  1. Identify fuzzy segments in pattern
  2. Extract surrounding exact literals
  3. Use literals for:
    • Pre-filtering text positions
    • Quick rejection of non-matching areas

Search Flow

Simple Fuzzy Pattern

For (?:hello){e<=1}:

  1. Use Bitap directly (pattern ≤64 chars)
  2. Or build Damerau-Levenshtein NFA (longer patterns)

Mixed Pattern

For exact (?:fuzzy){e<=1}:

  1. Use DFA to find “exact” quickly
  2. For each “exact” match, check surrounding for fuzzy match

Complex Pattern

For (?:a){e<=1}(?:b){e<=1}:

  1. Find all fuzzy matches for “a”
  2. Find all fuzzy matches for “b”
  3. Check if they can be combined

Optimization Techniques

Prefiltering

Use extracted literals to quickly skip non-matching regions:

#![allow(unused)]
fn main() {
// Text: "blah blah prefix target suffix blah"
// Literal "prefix" found at position X
// Only check fuzzy "target" around position X
}

Early Termination

Stop searching when match found (for find/first):

#![allow(unused)]
fn main() {
// First match found, don't continue
}

Caching

Cache compiled matchers for reuse:

#![allow(unused)]
fn main() {
// Bitap matcher cached for pattern
}

DFA Optimization

Fast path for exact and non-fuzzy patterns.

Deterministic Finite Automaton

The DFA provides the fastest matching path for patterns that don’t require fuzzy matching.

When DFA is Used

  • Exact match patterns (no fuzziness)
  • Simple character classes
  • Quantifiers without fuzzy segments

DFA Properties

PropertyValue
Time ComplexityO(n)
Space ComplexityO(states)
BacktrackingNone
MemoryDeterministic

Implementation

State Machine

#![allow(unused)]
fn main() {
// DFA state contains:
// - Transitions for each possible input
// - Whether state is accepting
// - Associated match information
}

Simulation

#![allow(unused)]
fn main() {
// For each input character:
// 1. Look up next state from current state
// 2. If no transition, stop (no match)
// 3. If accepting state, record match position
// 4. Continue until end of input
}

Fast Path Integration

Pattern: hello{e<=1}
              │
              ▼
    ┌─────────────────┐
    │ Has Fuzziness?  │
    └────────┬────────┘
             │
      ┌──────┴──────┐
      ▼             ▼
   Yes            No
     │             │
     ▼             ▼
   NFA/          DFA
   Bitap

Limitations

The DFA cannot handle:

  • Fuzzy matching
  • Lookahead/lookbehind
  • Backreferences
  • Capture groups (for position info)

For these, the NFA engine is used.

Optimization Techniques

State Minimization

Reduce number of states:

#![allow(unused)]
fn main() {
// Before: 100 states
// After minimization: 50 states
}

Lazy Evaluation

Build DFA states on-demand:

#![allow(unused)]
fn main() {
// Only build states that are actually traversed
}

Caching

Cache DFA for reuse:

#![allow(unused)]
fn main() {
// Reuse compiled DFA across searches
}

All-Matches Algorithms

The DFA supports multiple algorithms for finding all matches.

Three-Pass (Default)

The standard approach: find a match, advance, repeat.

#![allow(unused)]
fn main() {
let matches = dfa.find_all(text);
// Time: O(n²) worst case for pathological patterns
}

Two-Pass Algorithm

Uses a reverse prefilter to find candidates efficiently:

  1. Pass 1: Scan right-to-left, finding candidate positions
  2. Pass 2: Verify matches at each candidate
#![allow(unused)]
fn main() {
let matches = dfa.find_all_two_pass(text);
// Prefilter reduces candidates in Pass 1
// Still O(n²) worst case in Pass 2
}

Hardened Mode (O(n))

Tracks all active DFA states simultaneously:

  1. Maintain set of (state, start_position) pairs
  2. Process each character once, updating all states
  3. When no continuation possible, emit match
  4. Continue from next position
#![allow(unused)]
fn main() {
let matches = dfa.find_all_hardened(text);
// Guaranteed O(n) - each character processed once
}

Prefilter Types

TypeDescription
SingleBytememrchr for one byte
TwoBytesmemrchr for two bytes
ThreeBytesmemrchr for three bytes
TeddySIMD-accelerated multi-byte

When to Use

ScenarioRecommended
Few matches, well-behaved patternDefault
Prefilter effectiveTwo-pass
Pathological patternsHardened

Streaming Implementation

Processing data incrementally without loading everything into memory.

Streaming Architecture

Input Data
    │
    ▼
┌──────────────────┐
│  StreamingMatcher │
│  ┌──────────────┐ │
│  │ Input Buffer │ │
│  └──────────────┘ │
│         │         │
│         ▼         │
│  ┌──────────────┐ │
│  │   Matcher    │ │
│  └──────────────┘ │
└────────┬─────────┘
         │
         ▼
    Match Results

Key Components

Input Buffer

Stores unprocessed data between feed calls:

#![allow(unused)]
fn main() {
struct StreamingMatcher {
    buffer: Vec<u8>,      // Unprocessed bytes
    position: usize,      // Current position in stream
    matcher: Matcher,     // Underlying matcher
}
}

Position Tracking

Tracks absolute position in the stream:

#![allow(unused)]
fn main() {
let mut stream = re.stream();

stream.feed(b"hello");      // position = 5
stream.feed(b" world");     // position = 11

assert_eq!(stream.position(), 11);
}

Cross-Boundary Matching

Matches can span chunk boundaries:

#![allow(unused)]
fn main() {
let mut stream = re.stream();

// Chunk 1: no complete match
stream.feed(b"xxx ");  

// Chunk 2: completes the match
let matches = stream.feed(b"hello").collect();
// Match found spanning from chunk 1
}

Buffer Management

#![allow(unused)]
fn main() {
// Keep some overlap between chunks
// for cross-boundary matches

let mut stream = re.stream();

// Strategy: Keep last (pattern_len + max_errors) bytes
// This ensures cross-boundary matches are found
}

Reader Integration

Process any Read source:

#![allow(unused)]
fn main() {
use std::io::BufReader;
use std::fs::File;

let file = File::open("large_file.txt")?;
let reader = BufReader::new(file);

for m in re.stream().with_chunk_size(8192).search_reader(reader) {
    println!("Match at {}", m.start());
}
}

Byte-Level Streaming

Feed

#![allow(unused)]
fn main() {
stream.feed(b"data chunk");
}

Finish

#![allow(unused)]
fn main() {
// After all data fed, check for remaining matches
stream.finish();
}

Reset

#![allow(unused)]
fn main() {
// Start over with fresh state
stream.reset();
}

Performance Considerations Size

Chunk

Chunk SizeUse Case
Small (1KB)Low latency, interactive
Medium (8KB)Balanced
Large (64KB+)High throughput

Memory Usage

Streaming uses O(max_pattern_length + buffer) memory regardless of input size.

Implementation Details

State Management

#![allow(unused)]
fn main() {
// Between feed calls:
// 1. Save current state
// 2. Process new chunk
// 3. Keep partial state for next call

struct State {
    nfa_state: Option<NfaState>,
    bitap_state: u64,
    // etc.
}
}

Partial Results

Report matches as they’re found:

#![allow(unused)]
fn main() {
// Yield matches immediately as they're found
// Don't wait for complete input
}

Algorithm Selection

How fuzzy-regex chooses the matching algorithm.

Decision Tree

Pattern Analysis
      │
      ▼
┌──────────────────┐
│ Is exact match?  │
└────────┬─────────┘
          │
      ┌────┴────┐
      ▼         ▼
     Yes       No
       │         │
       ▼         ▼
    DFA    ┌────────────────┐
           │ Pure greedy .*? │
           └────────┬────────┘
                    │
               ┌────┴────┐
               ▼         ▼
              Yes       No
               │         │
               ▼         ▼
         Instant      ┌────────────────┐
          Match      │ Pattern ≤64?   │
                     └────────┬────────┘
                              │
                         ┌────┴────┐
                         ▼         ▼
                        Yes       No
                         │         │
                         ▼         ▼
                       Bitap    Damerau-Levenshtein
                                  NFA

Algorithm Comparison

AlgorithmUse CaseComplexityFeatures
DFAExact patterns, capturing groupsO(n)Limited
Instant MatchPure greedy .*O(1)Limited
BitapShort fuzzy (≤64 chars)O(n×k)Most
Damerau-Levenshtein NFALong fuzzy patternsO(n×k×m)Full

Automatic Selection

The library automatically selects based on:

  1. Pattern length: Bitap for ≤64 chars
  2. Fuzzy complexity: Cost-based vs simple
  3. Regex features: DFA can’t do lookahead
  4. Streaming: Different code path
  5. Greedy patterns: Instant match for .*, ^.*$, .*$
  6. Greedy suffix patterns: .*SUFFIX uses reverse search

Special Optimizations

Pure Greedy Dot-Star

Patterns like .*, ^.*$, .*$ always match. The engine detects these and returns instantly without scanning:

#![allow(unused)]
fn main() {
let re = FuzzyRegex::new(".*").unwrap();
// Returns match immediately - no text scanning
}

Greedy Prefix with Suffix

Patterns like .*test or .*test~2 use reverse search to avoid O(n²) behavior:

#![allow(unused)]
fn main() {
let re = FuzzyRegex::new(".*test").unwrap();
// Finds "test" from the right, then .* matches everything before it
// O(n) instead of O(n²)
}

DFA with Capturing Groups

DFA now works with capturing groups like (?m)^(.*)test$:

#![allow(unused)]
fn main() {
let re = FuzzyRegex::new("(?m)^(.*)test$").unwrap();
// Uses DFA - much faster than NFA
}

All-Matches Algorithms

When finding all matches in text (find_iter, find_all), fuzzy-regex supports multiple algorithms optimized for different patterns.

The Problem

Some patterns produce many overlapping matches, causing naive “find, advance, repeat” to be O(n²):

#![allow(unused)]
fn main() {
// Pattern: .*a|b on text: "bbbbbbbb"
// Each 'b' matches individually → O(n) matches × O(n) scan = O(n²)
}

Available Algorithms

AlgorithmComplexityBest For
find_allO(n²) worst caseSimple patterns, few matches
find_all_two_passO(n²) worst caseWhen prefilter is effective
find_all_hardenedO(n) guaranteedPathological patterns

Two-Pass Algorithm

Uses a reverse prefilter to find candidate positions, then verifies matches:

  1. Pass 1: Scan right-to-left using prefilter (memchr, Teddy, etc.)
  2. Pass 2: For each candidate, find the match
#![allow(unused)]
fn main() {
let mut dfa = Dfa::new(".*a|b").unwrap();
let matches = dfa.find_all_two_pass("bbbbbb");
}

Hardened Mode (O(n))

Tracks all active DFA states simultaneously as we scan left-to-right:

#![allow(unused)]
fn main() {
let mut dfa = Dfa::new(".*a|b").unwrap();
let matches = dfa.find_all_hardened("bbbbbb");
}

How it works:

  1. Start with one state at current position
  2. Process all characters, tracking active states
  3. When no state can continue, emit the leftmost match
  4. Move to next position, repeat

Benchmark: Pathological Pattern

Pattern .*a|b on text of all ’b’s (worst case):

Text Sizefind_alltwo_passhardenedSpeedup
1,000 bytes1.08s1.10s69ms15x
5,000 bytes5.47s5.43s69ms79x
10,000 bytes10.76s10.85s69ms156x

When to Use Each

#![allow(unused)]
fn main() {
// Default: smart selection based on pattern
let matches = dfa.find_all(text);

// Explicit two-pass: good when prefilter is effective
let matches = dfa.find_all_two_pass(text);

// Explicit hardened: critical for pathological patterns
let matches = dfa.find_all_hardened(text);
}

Prefilter Types

The two-pass algorithm uses prefilters optimized for different patterns:

PrefilterUse CaseMethod
SingleByteSingle literalmemrchr
TwoBytesTwo literalsmemrchr for both
ThreeBytesThree literalsmemrchr for all
TeddyMany alternativesSIMD range matching

Manual Override

Not currently exposed, but internal selection can be inspected:

#![allow(unused)]
fn main() {
// Check which engine was used
let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
if re.supports_streaming() {
    // Using Bitap
}
}

SIMD Optimizations

Using vector instructions for faster matching.

SIMD Overview

SIMD (Single Instruction, Multiple Data) allows processing multiple bytes simultaneously.

SIMD WidthBytes/IterationArchitecture
128-bit16 bytesSSE4.1, NEON
256-bit32 bytesAVX2
512-bit64 bytesAVX512

SIMD Components

Character Range Search (RevSearchRanges)

Vectorized reverse search for character ranges like [a-z]:

  • x86_64: Uses SSE/AVX2 for 16/32-byte iteration
  • ARM (NEON): Uses 16-byte vectors with vmaxvq_u8
  • Uses neon_movemask to extract bit positions
#![allow(unused)]
fn main() {
// Searches right-to-left for characters in range
let positions = rev_search_ranges.find_last(haystack);
}

SIMD-accelerated multi-pattern searching:

  • Processes 16 bytes per iteration (32 on AVX2)
  • Checks 4 character ranges simultaneously
  • Uses bitwise operations to combine matches
  • Efficient for patterns with multiple byte alternatives
#![allow(unused)]
fn main() {
// Searches for any of multiple bytes
let position = teddy.find_first(haystack);
}

Bitap SIMD

The Bitap algorithm is highly parallelizable with SIMD:

#![allow(unused)]
fn main() {
// Without SIMD: 1 byte per iteration
// With AVX2: 32 bytes per iteration
// ~32x speedup potential
}

Implementation

#![allow(unused)]
fn main() {
// AVX2 implementation
unsafe {
    // Load 32 bytes
    let data = _mm256_loadu_si256(ptr.as_ptr());
    
    // Apply bitap operations to all 32 simultaneously
    // ...
}
}

Performance Impact

ConfigurationThroughput
Scalar (no SIMD)~50 MB/s
SIMD (AVX2)~180 MB/s
Speedup~3.6x

Enabling SIMD

SIMD is enabled by default. To disable:

[dependencies]
fuzzy-regex = { version = "0.1", default-features = false }

Platform Support

x86_64

  • SSE4.1: Minimum for SIMD path
  • AVX2: Default, 32-byte vectors
  • AVX512: Optional, 64-byte vectors

ARM (Apple Silicon)

  • NEON: 16-byte vectors
  • Auto-detected: Runtime detection for M1/M2/M3

Detection

Runtime detection automatically uses best available:

#![allow(unused)]
fn main() {
// At compile time: #[target_feature(enable = "avx2")]
// At runtime: CPU feature detection
}

Requirements

SIMD requires:

  1. CPU support (SSE4.1+ or NEON)
  2. Compiler with SIMD intrinsics
  3. Pattern length suitable for Bitap

Pattern [a-z]+ on 100KB text:

ImplementationTime
Scalar45.2 μs
NEON6.8 μs
AVX24.1 μs
Speedup6-10x

Performance Tips

Optimize fuzzy-regex for your use case.

Pattern Design

1. Use Specific Edit Limits

fn main() {
    // Good: Specific limit
    let _ = fuzzy_regex::FuzzyRegex::new("(?:hello){e<=1}").unwrap();

    // Less efficient: Higher limit
    let _ = fuzzy_regex::FuzzyRegex::new("(?:hello){e<=5}").unwrap();
    
    println!("Done");
}

Lower edit limits = faster matching.

2. Prefer Shorter Patterns

fn main() {
    // Bitap (fast): ≤64 chars
    let _ = fuzzy_regex::FuzzyRegex::new("(?:short){e<=1}").unwrap();

    // NFA (slower): >64 chars
    let _ = fuzzy_regex::FuzzyRegex::new("(?:very_long_pattern_that_exceeds_sixty_four_characters){e<=1}").unwrap();
    
    println!("Done");
}

3. Extract Exact Parts

fn main() {
    // Good: Exact prefix and suffix help prefilter
    let _ = fuzzy_regex::FuzzyRegex::new("exact_prefix (?:fuzzy){e<=1} exact_suffix").unwrap();

    // Slower: Entirely fuzzy
    let _ = fuzzy_regex::FuzzyRegex::new("(?:entirely_fuzzy){e<=1}").unwrap();
    
    println!("Done");
}

4. Use Greedy Suffix Patterns

fn main() {
    // Good: .*SUFFIX is optimized with reverse search
    let _ = fuzzy_regex::FuzzyRegex::new(".*test").unwrap();
    let _ = fuzzy_regex::FuzzyRegex::new(".*test~1").unwrap();
    
    // Also works with anchors
    let _ = fuzzy_regex::FuzzyRegex::new("^.*test$").unwrap();
    
    println!("Done");
}

Patterns like .*test automatically use reverse search to find the suffix first, then match everything before it. This is O(n) instead of O(n²).

Builder Options

1. Set Similarity Threshold

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    // Skip low-quality matches early
    let _ = FuzzyRegexBuilder::new("(?:hello){e<=2}")
        .similarity(0.8)
        .build();
    
    println!("Done");
}

2. Use Case Insensitive at Builder

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    // More efficient than inline (?i)
    let _ = FuzzyRegexBuilder::new("(?:hello)")
        .case_insensitive(true)
        .build();
    
    println!("Done");
}

API Usage

1. Use Streaming for Large Data

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    
    // Good: Process in chunks
    let mut stream = re.stream();
    let data = b"hello world";
    for chunk in data.chunks(8) {
        // Process chunk
    }

    // Bad: Load all into memory
    let large_text = "hello world";
    let _matches: Vec<_> = re.find_iter(&large_text).collect();
}

2. Use find() for First Match

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    let text = "hello world";

    // Good: Stop after first match
    if let Some(m) = re.find(text) {
        println!("Found: {}", m.as_str());
    }

    // Unnecessary: Find all when only first needed
    let _all: Vec<_> = re.find_iter(text).collect();
}

3. Check supports_streaming()

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    
    if re.supports_streaming() {
        // Use streaming API for best performance
        let mut stream = re.stream();
        println!("Streaming supported");
    }
}

Build Configuration

1. Release Mode

cargo build --release

2. LTO

[profile.release]
lto = true
codegen-units = 1

3. SIMD

Enabled by default. Ensure target CPU supports it.

Common Pitfalls

IssueSolution
Slow with high editsLower edit limit
High memory usageUse streaming
Slow on long textUse exact prefix
Slow compilationEnable LTO

Pathological Patterns

Some regex patterns can cause O(n²) behavior in naive implementations:

#![allow(unused)]
fn main() {
// Pattern: .*a|b on text of all 'b's
// Each 'b' matches individually
// Naive: O(n) matches × O(n) scan = O(n²)
}

When It Happens

  • Alternation with wildcards: .*a|b, (a|b)+
  • Overlapping matches: Many ways to match the same text
  • Backtracking patterns: Complex alternation

Solution: Hardened Mode

Use find_all_hardened() for O(n) guaranteed performance:

#![allow(unused)]
fn main() {
use fuzzy_regex::FuzzyRegex;

let re = FuzzyRegex::new(".*a|b").unwrap();
let text = "bbbbbbbbbbbbbbbb";

// Hardened mode: O(n) guaranteed
let matches = re.find_all_hardened(text);
}

Performance Comparison

Text SizeStandardHardened
1,000 bytes1.08s69ms
10,000 bytes10.76s69ms

The hardened mode maintains constant time regardless of text size.

Trade-offs

Hardened mode may be slightly slower for well-behaved patterns (where O(n²) doesn’t occur), but it’s the safest choice when:

  • Pattern behavior is unknown
  • Text comes from untrusted sources
  • Worst-case performance is critical

Pattern Syntax Reference

Complete reference for all pattern syntax.

Special Characters

CharacterDescription
.Any character except newline
^Start of string
$End of string
``
\Escape
(, )Groups
[, ]Character class
{, }Quantifier or fuzziness

Character Classes

SyntaxDescription
[abc]a, b, or c
[^abc]Not a, b, or c
[a-z]a to z
[A-Za-z]ASCII letters
[0-9]Digits
\dDigit
\DNon-digit
\wWord character
\WNon-word
\sWhitespace
\SNon-whitespace

Quantifiers

SyntaxDescription
*0 or more
+1 or more
?0 or 1
{n}Exactly n
{n,}n or more
{n,m}n to m
*?Lazy *
+?Lazy +
??Lazy ?

Fuzziness Markers

SyntaxDescription
{e<=N}Max N edits
{i<=N}Max N insertions
{d<=N}Max N deletions
{s<=N}Max N substitutions
{t<=N}Max N transpositions
{c<=N}Max total cost N
{Ni+Md...<=N}Weighted costs
{e<=N:[class]}Restricted edits
~NShorthand for {e<=N}

Groups

SyntaxDescription
(...)Capture group
(?:...)Non-capture
(?<name>...)Named capture
(?=...)Lookahead
(?!...)Negative lookahead
(?<=...)Lookbehind
(?<!...)Negative lookbehind
(?>...)Atomic group

Flags

SyntaxDescription
(?i)Case insensitive
(?m)Multi-line
(?s)Dot-all
(?x)Verbose
(?U)Ungreedy
(?b)Best match
(?e)Enhance match
(?p)POSIX mode

Error Messages

Understanding and handling errors.

Pattern Errors

Invalid Syntax

Error: regex parse error

Cause: Malformed regex pattern.

Fix: Check for unclosed groups, invalid quantifiers, etc.

Unsupported Feature

Error: Feature not supported: backreferences

Cause: Pattern uses feature not implemented.

Fix: Simplify pattern or restructure.

Empty Pattern

Error: pattern cannot be empty

Cause: Empty regex not allowed.

Fix: Use non-empty pattern.

Compilation Errors

Fuzzy Limit Out of Range

Error: edit limit too high

Cause: Edit limit exceeds maximum.

Fix: Reduce limit.

Invalid Character Class

Error: invalid character class

Cause: Malformed character class.

Fix: Check class syntax.

Runtime Errors

Timeout

Error: timeout after ...

Cause: Matching took too long.

Fix: Simplify pattern, reduce edit limit.

No Match

Not an error - use find() returning Option.

#![allow(unused)]
fn main() {
if let Some(m) = re.find(text) {
    // Match found
} else {
    // No match
}
}

Migration Guide

Migrating from other libraries to fuzzy-regex.

From regex (standard)

fuzzy-regex extends the standard regex crate with fuzzy matching:

fn main() {
    // Standard regex
    let re = regex::new("hello").unwrap();

    // fuzzy-regex
    let re = fuzzy_regex::FuzzyRegex::new("hello").unwrap();

    // Add fuzziness
    let re = fuzzy_regex::FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    
    println!("Created");
}

From fuzzy-aho-corasick

See Compatibility Layer for a drop-in replacement.

From mrab-regex

The fuzzy syntax is compatible:

fn main() {
    // mrab-regex
    let _re = regex::new(r"(?i)(?:hello){e<=1}");

    // fuzzy-regex (same syntax)
    let re = fuzzy_regex::FuzzyRegex::new(r"(?i)(?:hello){e<=1}").unwrap();
    
    println!("Created");
}