Introduction

fuzzy-regex is a high-performance fuzzy regular expression engine written in Rust. It combines traditional regex constructs with approximate string matching using Damerau-Levenshtein automata and the Bitap algorithm.

Key Features

Fuzzy Matching: Match strings with configurable edit distance tolerance (insertions, deletions, substitutions, transpositions)
Full Regex Support: Character classes, quantifiers, groups, alternation, anchors, lookahead/lookbehind
Per-Segment Fuzziness: Control fuzziness for individual parts of a pattern
Shared Edit Budgets: Constrain total edits across multi-piece expressions using group-level fuzziness
Capture Groups: Named and numbered capture groups with fuzzy matching
Similarity Scoring: Get match quality scores (0.0 - 1.0)
Streaming API: Process large files and network streams incrementally
High Performance: Bitap algorithm for patterns ≤64 chars, SIMD optimizations
Unicode Support: Full Unicode support including case-insensitive matching

Why Fuzzy Regex?

Traditional regex requires exact matches. Fuzzy regex allows for “close enough” matches:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "hello" with up to 1 edit
    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();

    println!("{}", re.is_match("hello")); // Exact match
    println!("{}", re.is_match("hallo")); // 1 substitution
    println!("{}", re.is_match("helo")); // 1 deletion  
    println!("{}", re.is_match("helllo")); // 1 insertion
    println!("{}", re.is_match("hlelo")); // 1 transposition
}

Performance

fuzzy-regex is designed for high performance:

Bitap Algorithm: O(n×k) time complexity for patterns ≤64 characters
SIMD Optimizations: ~2-10x speedup on supported platforms (AVX2, NEON)
find_iter Fast Paths: Specialized Aho-Corasick and literal search for iterating matches
Hardened Mode: Guaranteed O(n) worst-case for pathological patterns
Streaming: Process gigabytes of data without loading into memory

Typical throughput: 1.4-2.0 Gbps for streaming fuzzy matching on modern hardware.

Pathological Pattern Protection

Some regex patterns can cause O(n²) behavior in naive implementations. fuzzy-regex includes hardened mode that guarantees O(n) performance even for patterns like .*a|b:

#![allow(unused)]
fn main() {
let re = FuzzyRegex::new(".*a|b").unwrap();
// Guaranteed O(n) even on pathological patterns
let matches = re.find_all_hardened(text);
}

Ecosystem

MSRV: Rust 1.88+
Features: SIMD (default), mimalloc (optional)
Dependencies: Zero runtime dependencies

Installation

Add fuzzy-regex to your Cargo.toml:

[dependencies]
fuzzy-regex = "0.1"

Feature Flags

simd (default): Enable SIMD optimizations for faster matching on x86_64 and ARM
mimalloc: Use mimalloc allocator for better performance in memory-intensive workloads

[dependencies]
fuzzy-regex = { version = "0.1", default-features = false }
# Or with mimalloc
fuzzy-regex = { version = "0.1", features = ["simd", "mimalloc"] }

MSRV

The minimum supported Rust version is 1.88.

Quick Verification

use fuzzy_regex::FuzzyRegex;

fn main() {
    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    println!("{}", re.is_match("hallo")); // true
}

Quick Start

Basic Fuzzy Matching

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Allow up to 2 edits
    let re = FuzzyRegex::new("(?:hello){e<=2}").unwrap();

    assert!(re.is_match("hello"));   // Exact match
    assert!(re.is_match("helo"));    // 1 deletion
    assert!(re.is_match("helllo"));  // 1 insertion
    assert!(re.is_match("hallo"));   // 1 substitution
    assert!(re.is_match("hlelo"));   // 1 transposition
}

Using the Builder

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:teh){e<=1}")
        .similarity(0.7)         // Minimum similarity score
        .case_insensitive(true)  // Case-insensitive matching
        .build()
        .unwrap();

    let m = re.find("I saw teh cat").unwrap();
    assert_eq!(m.as_str(), "teh");
    assert!(m.similarity() >= 0.7);
}

Finding Matches

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new(r"(?:\w+){e<=1}").unwrap();

    // Find first match
    let m = re.find("This is a tset").unwrap();
    assert_eq!(m.as_str(), "tset");

    // Find all matches
    let matches: Vec<_> = re.find_iter("test tset tsat").collect();
    assert_eq!(matches.len(), 3);
}

Streaming Large Data

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let mut stream = re.stream();

    // Process in chunks
    for m in stream.feed(b"some hay and niddle here") {
        println!("Found: {} at {}", m.as_str(), m.start());
    }

    // Check position
    assert_eq!(stream.position(), 24);
}

Fuzzy Matching Basics

Fuzzy matching allows matching strings that are “close” to the pattern, not just exact matches. The closeness is measured using edit distance.

Edit Distance

Edit distance is the minimum number of character-level operations needed to transform one string into another:

Insertion: Add a character (cost: 1)
Deletion: Remove a character (cost: 1)
Substitution: Replace one character with another (cost: 1)
Transposition: Swap two adjacent characters (cost: 1)

Example: “hello” → “hallo” (1 substitution)

Fuzziness Markers

Apply fuzziness to a pattern segment using {...}:

Syntax	Description
`(?:text){e<=N}`	Allow up to N total edits
`(?:text){i<=N}`	Allow up to N insertions
`(?:text){d<=N}`	Allow up to N deletions
`(?:text){s<=N}`	Allow up to N substitutions
`(?:text){t<=N}`	Allow up to N transpositions

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Allow up to 2 total edits
    let re1 = FuzzyRegex::new("(?:hello){e<=2}").unwrap();

    // Allow specific edit types
    let re2 = FuzzyRegex::new("(?:hello){i<=1,d<=1}").unwrap();

    // Allow substitutions only
    let re3 = FuzzyRegex::new("(?:hello){s<=2}").unwrap();
    
    println!("re1: {:?}", re1.is_match("hello"));
    println!("re2: {:?}", re2.is_match("helo"));
    println!("re3: {:?}", re3.is_match("hallo"));
}

Shorthand Syntax

Use ~N as shorthand for {e<=N}:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // These are equivalent:
    let re1 = FuzzyRegex::new("(?:hello)~2").unwrap();
    let re2 = FuzzyRegex::new("(?:hello){e<=2}").unwrap();

    // Exact match with ~0
    let re3 = FuzzyRegex::new("(?:hello)~0").unwrap();
    
    println!("re1 == re2: {}", re1.is_match("hello") == re2.is_match("hello"));
    println!("re3 exact: {}", re3.is_match("hello"));
}

Unlimited Errors

Omit the number to allow unlimited edits:

fn main() {
    // Allow unlimited substitutions
    let re = FuzzyRegex::new("(?:hello){s}").unwrap();

    // Allow any number of errors
    let re2 = FuzzyRegex::new("(?:hello){e}").unwrap();
    
    println!("{}", re.is_match("hallo"));
    println!("{}", re2.is_match("xyz"));
}

Edit Distance

The edit distance (Damerau-Levenshtein distance) measures how many operations it takes to transform the pattern into the matched text.

Edit Types

Insertion

Adding a character to the text:

Pattern: "cat"
Text:    "cart" 
Edit:    Insert 'r' after 'c'
Cost:    1

Deletion

Removing a character from the text:

Pattern: "cat"  
Text:    "ct"
Edit:    Delete 'a'
Cost:    1

Substitution

Replacing one character with another:

Pattern: "cat"
Text:    "bat"
Edit:    Replace 'c' with 'b'
Cost:    1

Transposition

Swapping two adjacent characters:

Pattern: "cat"
Text:    "act"
Edit:    Swap 'c' and 'a'
Cost:    1

Accessing Edit Information

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=2}").unwrap();
    let m = re.find("hallo").unwrap();

    println!("Similarity: {:.2}", m.similarity());
    println!("Edits: {}", m.len()); // Match length

    // Edit counts available in the engine
    // Use engine::EditCounts for detailed info
}

Similarity Score

The similarity score ranges from 0.0 to 1.0, where 1.0 is an exact match:

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=2}")
        .similarity(0.8) // Minimum similarity threshold
        .build()
        .unwrap();

    // Exact match: similarity = 1.0
    // 1 edit: similarity around 0.8-0.9
    // 2 edits: similarity around 0.6-0.8
    println!("{}", re.is_match("hello"));
    println!("{}", re.is_match("hallo"));
}

Fuzziness Markers

Detailed syntax for controlling fuzzy matching.

Basic Markers

Marker	Description	Example
`{e<=N}`	Total edits ≤ N	`{e<=2}`
`{i<=N}`	Insertions ≤ N	`{i<=1}`
`{d<=N}`	Deletions ≤ N	`{d<=1}`
`{s<=N}`	Substitutions ≤ N	`{s<=1}`
`{t<=N}`	Transpositions ≤ N	`{t<=1}`

Combining Markers

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Allow 1 insertion AND 1 deletion
    let re1 = FuzzyRegex::new("(?:hello){i<=1,d<=1}").unwrap();

    // Allow up to 2 substitutions OR up to 1 deletion (combined constraint)
    let re2 = FuzzyRegex::new("(?:hello){s<=2,d<=1}").unwrap();

    // Each constraint is independent
    // The match must satisfy ALL specified constraints
    println!("re1 matches 'helo': {}", re1.is_match("helo"));
    println!("re2 matches 'hallo': {}", re2.is_match("hallo"));
}

Character Class Restrictions

Restrict which characters can be used for edits:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Only allow substitutions with lowercase letters
    let re1 = FuzzyRegex::new(r"(?:hello){s<=1:[a-z]}").unwrap();
    assert!(re1.is_match("hallo"));  // 'a' is in [a-z]
    assert!(!re1.is_match("h3llo")); // '3' is not in [a-z]

    // Only allow insertions of digits
    let re2 = FuzzyRegex::new(r"(?:hello){i<=1:[0-9]}").unwrap();

    // Only allow substitutions with whitespace
    let re3 = FuzzyRegex::new(r"(?:hello){s<=1:\s}").unwrap();
}

Min/Max Error Ranges

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Require at least 1 edit, allow up to 2
    let re = FuzzyRegex::new("(?:hello){e>=1,e<=2}").unwrap();

    // Minimum errors with shorthand
    let re2 = FuzzyRegex::new("(?:hello){1e<=2}").unwrap();
    
    println!("re matches 'hello': {}", re.is_match("hello"));
    println!("re matches 'hallo': {}", re.is_match("hallo"));
}

Editing Classes

Apply fuzziness to specific character classes:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Apply different limits to different parts
    let re = FuzzyRegex::new("(?:hello){e<=1} world{e<=1}").unwrap();
    println!("{}", re.is_match("helo worled"));
}

Shared Group Budget

When a non-capturing group has its own fuzziness, the edit budget is shared across all pieces inside it:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Shared budget: total edits across "hello" and "world" combined ≤ 2
    let re = FuzzyRegex::new("(?:hello world){e<=2}").unwrap();
    // "hello" and "world" draw from the same pool of 2 edits

    assert!(re.is_match("hello world"));   // 0 edits
    assert!(re.is_match("helo world"));    // 1 deletion (hello) → 1 remaining
    assert!(!re.is_match("helo worl"));    // 1 deletion each → 2 total, still OK
    assert!(!re.is_match("hlo wrld"));     // 3 deletions → exceeds budget

    // Per-type limits also shared across the group
    let re2 = FuzzyRegex::new("(?:hello world){i<=1,d<=1}").unwrap();
    // Maximum 1 insertion AND 1 deletion across all pieces
    
    assert!(re2.is_match("helo world"));   // 1 deletion
    assert!(!re2.is_match("helo worl"));   // 2 deletions → exceeds budget
}

This ensures that one piece doesn’t exhaust the shared budget, giving predictable results for multi-segment fuzzy patterns.

Cost-Based Matching

Assign different costs to edit operations for fine-grained control.

Simple Cost Constraint

Use {c<=N} to limit total cost (all operations cost 1):

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Total cost ≤ 2 (any combination of edits, each costing 1)
    let re = FuzzyRegex::new("(?:hello){c<=2}").unwrap();

    assert!(re.is_match("hello")); // 0 edits, cost=0
    assert!(re.is_match("hallo")); // 1 sub, cost=1
    assert!(re.is_match("helo"));  // 1 del, cost=1
    assert!(re.is_match("hhello")); // 1 ins, cost=1
    assert!(re.is_match("hallo")); // 1 sub, cost=1
}

Weighted Costs

Assign different costs to different edit types:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Insertions cost 2, others cost 1, total ≤ 3
    let re = FuzzyRegex::new("(?:ab){2i+1d+1s+1t<=3}").unwrap();

    assert!(re.is_match("abc")); // 1 insertion (cost=2)
    assert!(re.is_match("a"));   // 1 deletion (cost=1)
    assert!(re.is_match("ba"));  // 1 transposition (cost=1)
    assert!(!re.is_match("aabc")); // 2 insertions (cost=4) - exceeds limit
}

Cost Syntax

Syntax	Description
`{c<=N}`	Total cost ≤ N
`{c<N}`	Total cost < N
`{Ni+Md+St+Tt<=N}`	Custom costs for each type
`{0i+...}`	Free insertions
`{Ni...}`	Insertions cost N

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // All operations cost 1: {1i+1d+1s+1t<=N} = {c<=N}
    let re1 = FuzzyRegex::new("(?:test){c<=2}").unwrap();

    // Insertions are expensive (typing errors less common)
    let re2 = FuzzyRegex::new("(?:test){3i+1d+1s<=3}").unwrap();

    // Insertions are free (OCR errors)
    let re3 = FuzzyRegex::new("(?:test){0i+1d+1s<=2}").unwrap();
    
    println!("re1: {}", re1.is_match("test"));
    println!("re2: {}", re2.is_match("test"));
    println!("re3: {}", re3.is_match("tset"));
}

Use Cases

Typing errors: Substitutions/deletions more common than insertions
OCR errors: Insertions more common (extra characters)
Genetic sequences: Different scoring matrices

Pattern Syntax

fuzzy-regex supports standard regex syntax plus fuzzy matching.

Quick Reference

Syntax	Description
`a`	Literal character
`.`	Any character except newline
`\d`	Digit [0-9]
`\D`	Non-digit
`\w`	Word character [a-zA-Z0-9_]
`\W`	Non-word character
`\s`	Whitespace
`\S`	Non-whitespace
`[abc]`	Character class
`[^abc]`	Negated class
`[a-z]`	Range
`^`	Start of string
`$`	End of string
`\b`	Word boundary
`\B`	Non-word boundary

Escaping

Special characters that need escaping: .^$*+?{}[]\\|()

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match literal dot
    let re = FuzzyRegex::new(r"file\.txt").unwrap();

    // Match literal backslash
    let re2 = FuzzyRegex::new(r"path\\to\\file").unwrap();
    
    println!("dot match: {}", re.is_match("file.txt"));
    println!("backslash match: {}", re2.is_match(r"path\to\file"));
}

Character Classes

Define sets of characters to match.

Basic Classes

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match any vowel
    let re1 = FuzzyRegex::new("[aeiou]").unwrap();

    // Match any consonant
    let re2 = FuzzyRegex::new("[^aeiou]").unwrap();

    // Match any lowercase letter
    let re3 = FuzzyRegex::new("[a-z]").unwrap();

    // Match any alphanumeric
    let re4 = FuzzyRegex::new("[a-zA-Z0-9]").unwrap();
    
    println!("vowel: {}", re1.is_match("a"));
    println!("consonant: {}", re2.is_match("b"));
}

Predefined Classes

By default, \w, \d, \s match ASCII characters only:

Escape	ASCII Only	Unicode (with `(?u)`
`\d`	`[0-9]`	Unicode digits
`\D`	`[^0-9]`	Non-digit
`\w`	`[a-zA-Z0-9_]`	Unicode word chars
`\W`	`[^a-zA-Z0-9_]`	Non-word
`\s`	`[ \t\n\r\f]`	Unicode whitespace
`\S`	`[^ \t\n\r\f]`	Non-whitespace

Unicode Mode

Enable Unicode character classes with (?u) flag:

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    // Via inline flag
    let re1 = FuzzyRegex::new(r"(?u)\w+").unwrap();
    assert!(re1.is_match("привет")); // Cyrillic matched

    // Via builder
    let re2 = FuzzyRegexBuilder::new(r"\w+")
        .unicode(true)
        .build()
        .unwrap();
}

Unicode Property Escapes

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Unicode letters
    let re1 = FuzzyRegex::new(r"\p{L}").unwrap();

    // Unicode digits
    let re2 = FuzzyRegex::new(r"\p{N}").unwrap();

    // Emoji
    let re3 = FuzzyRegex::new(r"\p{Emoji}").unwrap();
    
    println!("letter: {}", re1.is_match("a"));
    println!("digit: {}", re2.is_match("١")); // Arabic-Indic digit
}

Nested Classes

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // POSIX-like classes (inside character class)
    let re = FuzzyRegex::new("[[:alpha:]]").unwrap();
    println!("alpha: {}", re.is_match("a"));
}

Quantifiers

Control how many times to match a pattern.

Basic Quantifiers

Syntax	Description	Example
`*`	Zero or more	`ab*` matches “a”, “ab”, “abb”
`+`	One or more	`ab+` matches “ab”, “abb”
`?`	Zero or one	`ab?` matches “a”, “ab”
`{n}`	Exactly n	`a{3}` matches “aaa”
`{n,}`	At least n	`a{2,}` matches “aa”, “aaa”
`{n,m}`	Between n and m	`a{2,4}` matches “aa”, “aaa”, “aaaa”

Lazy Quantifiers

Add ? to make quantifiers lazy (match as few as possible):

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Greedy: matches as much as possible
    let re1 = FuzzyRegex::new(r"<.+>").unwrap();

    // Lazy: matches as little as possible  
    let re2 = FuzzyRegex::new(r"<.+?>").unwrap();

    let text = "<tag>more</tag>";
    assert!(re1.find(text).unwrap().as_str() == "<tag>more</tag>");
    assert!(re2.find(text).unwrap().as_str() == "<tag>");
}

Possessive Quantifiers

Prevent backtracking (useful for performance):

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // a*+ doesn't backtrack
    let re = FuzzyRegex::new("a*+b").unwrap();
    println!("{}", re.is_match("ab"));
}

Quantifiers with Fuzzy Matching

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "ab" with up to 1 error, repeated 1-3 times
    let re = FuzzyRegex::new("(?:ab){e<=1}{1,3}").unwrap();
    println!("{}", re.is_match("ab"));
}

Groups and Alternation

Group patterns and match alternatives.

Capture Groups

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new(r"(?<user>\w+)@(?<domain>\w+\.\w+)").unwrap();
    let caps = re.captures("john@example.com").unwrap();

    assert_eq!(caps.name("user").unwrap().as_str(), "john");
    assert_eq!(caps.name("domain").unwrap().as_str(), "example.com");
}

Numbered Groups

fn main() {
    let re = FuzzyRegex::new(r"(\w+)@(\w+)").unwrap();
    let caps = re.captures("test@example").unwrap();

    assert_eq!(caps.get(1).unwrap().as_str(), "test");
    assert_eq!(caps.get(2).unwrap().as_str(), "example");
}

Non-Capturing Groups

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // (?:...) doesn't create a capture group
    let re = FuzzyRegex::new("(?:http|https)://").unwrap();
    println!("{}", re.is_match("http://example.com"));
}

Alternation

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match either option
    let re = FuzzyRegex::new("(foo|bar)").unwrap();

    assert!(re.is_match("foo"));
    assert!(re.is_match("bar"));
    assert!(!re.is_match("baz"));
}

Nested Groups

fn main() {
    let re = FuzzyRegex::new(r"((ab)(cd))").unwrap();
    let caps = re.captures("abcd").unwrap();

    assert_eq!(caps.get(0).unwrap().as_str(), "abcd"); // Full match
    assert_eq!(caps.get(1).unwrap().as_str(), "abcd"); // Outer group
    assert_eq!(caps.get(2).unwrap().as_str(), "ab");   // First inner
    assert_eq!(caps.get(3).unwrap().as_str(), "cd");   // Second inner
}

Fuzzy Groups

Apply fuzziness to groups:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("((?:hello){e<=1})").unwrap();
    println!("{}", re.is_match("helo"));
}

Shared Edit Budgets in Non-Capturing Groups

When a non-capturing group has its own fuzziness, the edit budget is shared across all pieces inside:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Shared budget: all pieces inside draw from the same {e<=2} pool
    let re = FuzzyRegex::new("(?:(?:hello) (?:world)){e<=2}").unwrap();

    assert!(re.is_match("hello world"));   // 0 edits
    assert!(re.is_match("helo world"));    // 1 deletion (hello side)
    assert!(re.is_match("helo worl"));     // 1 deletion each (2 total, within budget)
    assert!(!re.is_match("hlo wrld"));     // 3 deletions (exceeds budget)
}

Per-type limits ({i<=N,d<=N,s<=N}) also work with shared budgets:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // At most 1 insertion AND 1 deletion across ALL pieces combined
    let re = FuzzyRegex::new("(?:hello world){i<=1,d<=1}").unwrap();

    assert!(re.is_match("helo world"));   // 1 deletion
    assert!(!re.is_match("helo worl"));   // 2 deletions (exceeds d<=1)
}

How It Works

Without a group budget, each (?:...){e<=N} has its own independent edit allowance. Wrapping them in a non-capturing group with fuzziness merges them into a shared pool:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Independent: each piece gets its own {e<=2} → 4 total possible
    let re1 = FuzzyRegex::new("(?:hello){e<=2} (?:world){e<=2}").unwrap();
    assert!(re1.is_match("hlo wrld"));  // 3 deletions across both

    // Shared: both pieces share one {e<=2} budget
    let re2 = FuzzyRegex::new("(?:(?:hello){e<=2} (?:world){e<=2}){e<=2}").unwrap();
    assert!(!re2.is_match("hlo wrld")); // 3 deletions exceeds shared budget
}

This is useful for constraining total fuzziness across a multi-word phrase or compound pattern.

Anchors and Boundaries

Match positions in the string.

String Anchors

Syntax	Description
`^`	Start of string
`$`	End of string

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match at start
    let re1 = FuzzyRegex::new("^hello").unwrap();
    assert!(re1.is_match("hello world"));
    assert!(!re1.is_match("say hello"));

    // Match at end
    let re2 = FuzzyRegex::new("hello$").unwrap();
    assert!(re2.is_match("say hello"));
    assert!(!re2.is_match("hello world"));

    // Match entire string
    let re3 = FuzzyRegex::new("^hello$").unwrap();
    assert!(re3.is_match("hello"));
    assert!(!re3.is_match("hello world"));
}

Word Boundaries

Syntax	Description
`\b`	Word boundary
`\B`	Non-word boundary

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match whole word "cat"
    let re1 = FuzzyRegex::new(r"\bcat\b").unwrap();
    assert!(re1.is_match("cat"));
    assert!(re1.is_match("the cat sat"));
    assert!(!re1.is_match("category"));

    // Match "cat" not at word boundary
    let re2 = FuzzyRegex::new(r"\Bcat\B").unwrap();
    assert!(re2.is_match("category"));
    assert!(!re2.is_match("cat"));
}

Multi-line Mode

With (?m), ^ and $ match line boundaries:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?m)^hello$").unwrap();
    let text = "hello\nhello\nhello";
    let matches: Vec<_> = re.find_iter(text).collect();
    assert_eq!(matches.len(), 3);
}

Lookahead and Lookbehind

Assert what comes before or after the match without including it.

Lookahead

Positive Lookahead `(?=...)`

Match only if followed by:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "hello" only if followed by " world"
    let re = FuzzyRegex::new("hello(?= world)").unwrap();

    assert!(re.is_match("hello world"));  // Matches "hello"
    assert!(!re.is_match("hello there")); // No match
}

Negative Lookahead `(?!...)`

Match only if NOT followed by:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "hello" only if NOT followed by " world"
    let re = FuzzyRegex::new("hello(?! world)").unwrap();

    assert!(re.is_match("hello there")); // Matches "hello"
    assert!(!re.is_match("hello world")); // No match
}

Lookbehind

Positive Lookbehind `(?<=...)`

Match only if preceded by:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "world" only if preceded by "hello "
    let re = FuzzyRegex::new("(?<=hello )world").unwrap();

    assert!(re.is_match("hello world"));
    assert!(!re.is_match("bye world"));

    // Get match position
    let m = re.find("say hello world here").unwrap();
    assert_eq!(m.start(), 9); // After "hello "
}

Negative Lookbehind `(?<!...)`

Match only if NOT preceded by:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "world" only if NOT preceded by "hello "
    let re = FuzzyRegex::new("(?<!hello )world").unwrap();

    assert!(re.is_match("bye world"));
    assert!(!re.is_match("hello world"));
}

Fuzzy Lookbehind

Lookbehind can include fuzzy matching:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "world" preceded by "hello" with up to 1 error
    let re = FuzzyRegex::new("(?<=(?:hello){e<=1})world").unwrap();

    assert!(re.is_match("hello world")); // Exact
    assert!(re.is_match("hallo world")); // 1 substitution
    assert!(!re.is_match("goodbye world")); // Too different
}

Fuzzy Lookbehind Examples

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Allow insertions in lookbehind
    let re = FuzzyRegex::new("(?<=(?:hello){i<=2})world").unwrap();
    assert!(re.is_match("helllo world")); // 2 insertions (ll)
    assert!(re.is_match("hellllo world")); // 3 insertions - no match

    // Allow deletions in lookbehind
    let re = FuzzyRegex::new("(?<=(?:hello){d<=1})world").unwrap();
    assert!(re.is_match("helo world")); // 1 deletion (one 'l')
    assert!(!re.is_match("heo world")); // 2 deletions - no match

    // Combined: substitutions and insertions
    let re = FuzzyRegex::new("(?<=(?:hello){s<=1,i<=1})world").unwrap();
    assert!(re.is_match("hallo world")); // 1 substitution
    assert!(re.is_match("helloh world")); // 1 insertion
    assert!(re.is_match("hallo world")); // matches, prefers lower edits
}

Case Insensitive with Fuzzy

Combine case-insensitive matching with fuzzy for robust matching:

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    // Case-insensitive + fuzzy - case differences don't count as edits
    let re = FuzzyRegexBuilder::new("(?i)(?:hello){e<=1}")
        .build()
        .unwrap();

    assert!(re.is_match("hello")); // 0 edits
    assert!(re.is_match("HELLO")); // 0 edits (case difference is free!)
    assert!(re.is_match("hallo")); // 1 edit (actual substitution)

    // Builder's case_insensitive flag works the same way
    let re2 = FuzzyRegexBuilder::new("(?:hello){e<=1}")
        .case_insensitive(true)
        .build()
        .unwrap();

    assert!(re2.is_match("HELLO")); // 0 edits
    assert!(re2.is_match("HeLLo")); // 0 edits (all case differences)

    // Get edit count to verify
    let m = re.find("HELLO").unwrap();
    assert_eq!(m.total_edits(), 0); // No edits charged for case!
}

Fuzzy Lookahead

Fuzzy matching also works with lookahead:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Match "hello" followed by "world" with up to 1 error
    let re = FuzzyRegex::new("hello(?= {e<=1}world)").unwrap();

    assert!(re.is_match("hello world")); // Exact
    assert!(!re.is_match("hello world!")); // ! is not space, lookahead fails
}

Variable-Length Lookbehind

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Variable-length lookbehind (alternation)
    let re = FuzzyRegex::new("(?<=(?:hello|hi) )world").unwrap();
    println!("{}", re.is_match("hi world"));
}

Atomic Groups and Possessive Quantifiers

Advanced features for performance and pattern control.

Atomic Groups `(?>...)`

Atomic groups prevent backtracking once matched:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Without atomic group: can match "abc" then backtrack
    let re1 = FuzzyRegex::new("(?:ab|abc)").unwrap();

    // Atomic: once "abc" matches, no backtracking
    let re2 = FuzzyRegex::new("(?>ab|abc)").unwrap();
    
    println!("re1: {}", re1.is_match("abc"));
    println!("re2: {}", re2.is_match("abc"));
}

Possessive Quantifiers

Like atomic groups for quantifiers:

Syntax	Description
`*+`	Possessive zero-or-more
`++`	Possessive one-or-more
`?+`	Possessive zero-or-one

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Normal: backtracks to find match
    let re1 = FuzzyRegex::new("a*b").unwrap();

    // Possessive: doesn't backtrack
    let re2 = FuzzyRegex::new("a*+b").unwrap();
    
    println!("re1: {}", re1.is_match("ab"));
    println!("re2: {}", re2.is_match("ab"));
}

When to Use

Atomic groups and possessive quantifiers are useful when:

Performance: Prevent exponential backtracking
Determinism: Control match behavior explicitly
Greedy matching: When you want maximum match without backtracking

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Pattern that could cause backtracking issues:
    // Without possessive: "aaaaaaaa" + "a*" + "b" can be slow
    let re = FuzzyRegex::new("(?:a++)b").unwrap();
    println!("{}", re.is_match("ab"));
}

Match Reset `\K`

Reset the start of the match:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // \K resets match start - keeps "prefix" out of match
    let re = FuzzyRegex::new(r"prefix\Kworld").unwrap();
    let m = re.find("prefixworld").unwrap();
    assert_eq!(m.as_str(), "world");
    assert_eq!(m.start(), 6); // Starts at "world", not "prefix"
}

API Reference

Overview of the main API types and methods.

Main Types

Type	Description
`FuzzyRegex`	Compiled regex pattern
`FuzzyRegexBuilder`	Builder for customizing regex
`Match`	A single match result
`Captures`	Match with capture groups
`StreamingMatcher`	Stateful matcher for streaming
`StreamingMatch`	Match from streaming search

Quick Method Reference

#![allow(unused)]
fn main() {
use fuzzy_regex::FuzzyRegex;

// Construction
let re = FuzzyRegex::new("pattern").unwrap();

// Searching
re.is_match(text)           // bool - does it match?
re.find(text)               // Option<Match>
re.find_iter(text)          // Iterator<Match>
re.find_rev(text)           // Option<Match> - rightmost
re.captures(text)           // Option<Captures>

// Replacing
re.replace(text, "x")       // String
re.replace_all(text, "x")   // String
re.split(text)              // Split

// Streaming
re.stream()                 // StreamingMatcher
re.find_bytes(bytes)        // Option<StreamingMatch>
}

FuzzyRegex

The main compiled regex type.

Construction

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Simple construction
    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();

    // With error handling
    let re = match FuzzyRegex::new("(?:hello){e<=1}") {
        Ok(r) => r,
        Err(e) => panic!("Invalid pattern: {}", e),
    };
    
    println!("Created regex: {}", re.is_match("hello"));
}

Search Methods

`is_match`

fn main() {
    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();

    println!("{}", re.is_match("hello")); // true
    println!("{}", re.is_match("hallo")); // true
    println!("{}", re.is_match("world")); // false
}

`find`

fn main() {
    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();

    if let Some(m) = re.find("say hello world") {
        println!("Found '{}' at {}-{}", m.as_str(), m.start(), m.end());
        println!("Similarity: {:.2}", m.similarity());
    }
}

`find_iter`

fn main() {
    let re = FuzzyRegex::new(r"(?:\w+){e<=1}").unwrap();

    for m in re.find_iter("test tset tsat") {
        println!("Match: '{}'", m.as_str());
    }
}

`find_rev`

Find rightmost match:

fn main() {
    let re = FuzzyRegex::new("(?:foo)").unwrap();
    let m = re.find_rev("foo foo foo").unwrap();
    assert_eq!(m.start(), 8);
}

Replace Methods

fn main() {
    let re = FuzzyRegex::new("world").unwrap();

    // Replace first match
    let result = re.replace("hello world", "rust");
    assert_eq!(result, "hello rust");

    // Replace all matches
    let result = re.replace_all("hello world world", "rust");
    assert_eq!(result, "hello rust rust");
}

Other Methods

fn main() {
    let re = FuzzyRegex::new(r"(?<word>\w+)").unwrap();

    // Capture groups
    let caps = re.captures("hello").unwrap();
    let word = caps.name("word").unwrap();

    // Split
    let parts: Vec<_> = re.split("a1b2c").collect();

    // Check pattern info
    println!("Capture count: {}", re.capture_count());
}

FuzzyRegexBuilder

Builder for customizing regex construction.

Basic Usage

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello)")
        .build()
        .unwrap();
    
    println!("Created");
}

Builder Options

Similarity Threshold

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=2}")
        .similarity(0.8)  // Minimum similarity 0.0-1.0
        .build()
        .unwrap();
    
    println!("Created");
}

Case Insensitivity

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello)")
        .case_insensitive(true)
        .build()
        .unwrap();

    assert!(re.is_match("HELLO"));
    assert!(re.is_match("Hello"));
}

Multi-line Mode

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("^hello$")
        .multi_line(true)  // ^ and $ match line boundaries
        .build()
        .unwrap();
    
    println!("Created");
}

Dot-all Mode

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("a.b")
        .dot_all(true)  // . matches newlines
        .build()
        .unwrap();
    
    println!("Created");
}

Partial Matching

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=1}")
        .partial(true)  // Matches at end of text are partial
        .build()
        .unwrap();

    let m = re.find("hello").unwrap();
    assert!(m.partial()); // Match reaches end of text
}

Timeout

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;
    use std::time::Duration;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=5}")
        .timeout(Duration::from_millis(100))
        .build()
        .unwrap();
    
    println!("Created");
}

Chaining Options

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=1}")
        .case_insensitive(true)
        .similarity(0.7)
        .partial(true)
        .multi_line(true)
        .build()
        .unwrap();
    
    println!("Created");
}

Match Results

Methods available on match results.

Basic Methods

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    let m = re.find("say hello world").unwrap();

    // Get matched string
    println!("{}", m.as_str());           // &str: "hello"
    println!("{}", m.start());            // usize: 4 (byte position)
    println!("{}", m.end());              // usize: 9
    println!("{}", m.len());              // usize: 5
    println!("{}", m.is_empty());         // bool: false
}

Similarity

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    let m = re.find("say hello world").unwrap();

    // Similarity score 0.0 - 1.0
    println!("{}", m.similarity());       // f32: 1.0 = exact, lower = more errors
}

Partial Match

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    let m = re.find("say hello").unwrap();

    // Check if match reaches end of input (for partial matching)
    println!("{}", m.partial());         // bool: true if match at end
}

Match as Different Types

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    let m = re.find("hello world").unwrap();

    // Get raw bytes
    let bytes = m.as_bytes();

    // Get range
    let range = m.range(); // Range<usize>
    println!("Range: {:?}", range);
}

Iterating Matches

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new(r"\w+").unwrap();

    for m in re.find_iter("hello world") {
        println!("{}", m.as_str());
    }
}

Match Iteration with Positions

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:foo){e<=1}").unwrap();

    for m in re.find_iter("foo fxo fxoo") {
        println!("'{}' at [{}, {})", m.as_str(), m.start(), m.end());
    }
}

Streaming API

Process large data incrementally without loading into memory.

Creating a Stream

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let mut stream = re.stream();
    println!("Stream created");
}

Feeding Data

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let mut stream = re.stream();

    // Feed data in chunks
    for m in stream.feed(b"some hay and niddle here") {
        println!("Found '{}' at {}", m.as_str(), m.start());
    }
}

Position Tracking

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let mut stream = re.stream();

    stream.feed(b"hello");
    assert_eq!(stream.position(), 5);

    stream.feed(b" world");
    assert_eq!(stream.position(), 11);
}

Cross-Boundary Matches

Matches can span chunk boundaries:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let mut stream = re.stream();

    // First chunk - no match yet
    let _ = stream.feed(b"xxx ");

    // Second chunk - match spanning boundary
    let matches: Vec<_> = stream.feed(b"hello").collect();
    println!("Matches: {}", matches.len());
}

Resetting

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let mut stream = re.stream();

    stream.feed(b"hello world");
    stream.reset();

    assert_eq!(stream.position(), 0);
}

Finish

Check for remaining matches at end of stream:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let mut stream = re.stream();
    stream.feed(b"test hel");

    // Check remaining buffer
    let final_match = stream.finish();
}

Reader Integration

Process files or other readers:

fn main() {
    use fuzzy_regex::FuzzyRegex;
    use std::io::BufReader;
    use std::fs::File;

    let re = FuzzyRegex::new("(?:needle){e<=1}").unwrap();
    let file = File::open("Cargo.toml").unwrap();

    for m in re.stream().search_reader(BufReader::new(file)) {
        println!("Match at byte {}", m.start());
    }
}

Byte-Level API

Search byte slices directly:

fn main() {
    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();

    if let Some(m) = re.find_bytes(b"hello world") {
        println!("Found at {}-{}", m.start(), m.end());
    }

    // Iterate over bytes
    let matches: Vec<_> = re.find_iter_bytes(b"hello hullo").collect();
    println!("Matches: {}", matches.len());
}

Advanced Features

Additional capabilities beyond basic fuzzy matching.

In This Section

This section covers:

Capture Groups - Extract matched portions
Partial Matching - Match incomplete input
Word Lists - Named reference patterns
Custom Handlers - Invoke custom logic during matching
Compatibility Layer - Migrate from other libraries

Capture Groups

Extract specific portions of matches.

Named Capture Groups

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new(r"(?<user>\w+)@(?<domain>\w+\.\w+)").unwrap();
    let caps = re.captures("john@example.com").unwrap();

    assert_eq!(caps.name("user").unwrap().as_str(), "john");
    assert_eq!(caps.name("domain").unwrap().as_str(), "example.com");
}

Numbered Capture Groups

fn main() {
    let re = FuzzyRegex::new(r"(\w+)@(\w+)").unwrap();
    let caps = re.captures("test@example").unwrap();

    assert_eq!(caps.get(1).unwrap().as_str(), "test");
    assert_eq!(caps.get(2).unwrap().as_str(), "example");
    assert_eq!(caps.get(0).unwrap().as_str(), "test@example"); // Full match
}

Nested Groups

fn main() {
    let re = FuzzyRegex::new(r"((?P<outer>\w+)-(?P<inner>\w+))").unwrap();
    let caps = re.captures("hello-world").unwrap();

    assert_eq!(caps.get(0).unwrap().as_str(), "hello-world");
    assert_eq!(caps.get(1).unwrap().as_str(), "hello-world");
    assert_eq!(caps.name("outer").unwrap().as_str(), "hello");
    assert_eq!(caps.name("inner").unwrap().as_str(), "world");
}

Capture Groups with Fuzzy Matching

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new(r"(?<word>\w+){e<=1}").unwrap();
    let caps = re.captures("tset").unwrap();

    assert_eq!(caps.name("word").unwrap().as_str(), "tset");
}

Fuzzy Edits Information

When using fuzzy matching, you can get detailed edit information:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new(r"(?:hello){e<=2}").unwrap();
    let m = re.find("helllo").unwrap();

    // Get total edits
    println!("Total edits: {}", m.total_edits()); // 1 (1 insertion)

    // Get detailed counts: (insertions, deletions, substitutions)
    let (ins, del, sub) = m.fuzzy_counts();
    println!("Insertions: {}, Deletions: {}, Substitutions: {}", ins, del, sub);

    // Get positions of changes
    let (ins_pos, del_pos, sub_pos) = m.fuzzy_changes();
    println!("Insertion positions: {:?}", ins_pos);
}

Handler Overrides in Captures

When using handlers with MatchOverride, the overrides are tracked in captures:

fn main() {
    use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

    let re = FuzzyRegexBuilder::new(r"(prefix(?call:handler)suffix)")
        .handler("handler", |text, pos| {
            if text[pos..].starts_with("XYZ") {
                HandlerResult::MatchOverride(3, "xyz".to_string())
            } else {
                HandlerResult::NoMatch
            }
        })
        .build()
        .unwrap();

    let caps = re.captures("prefixXYZsuffix").unwrap();

    // The captured text shows the override
    assert_eq!(caps.get(1).unwrap().as_str(), "prefixxyzsuffix");

    // Handler overrides track (start_byte, end_byte, override_text)
    let overrides = caps.handler_overrides();
    println!("Handler overrides: {:?}", overrides);
    // Output: [(6, 9, "xyz")]
}

Multiple Capture Groups with Fuzzy

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Two fuzzy groups with different limits
    let re = FuzzyRegex::new(r"(?<a>\w+){e<=1} (?<b>\w+){e<=1}").unwrap();
    let caps = re.captures("helo wrold").unwrap();

    assert_eq!(caps.name("a").unwrap().as_str(), "helo"); // 1 substitution
    assert_eq!(caps.name("b").unwrap().as_str(), "wrold"); // 1 substitution

    // Each group tracks its own edits
    // Note: Fuzzy matching applies to the overall pattern
}

Iterating Captures

fn main() {
    let re = FuzzyRegex::new(r"(\w+)").unwrap();
    let caps = re.captures("hello world").unwrap();

    for cap in caps.iter() {
        if let Some(m) = cap {
            println!("Group: '{}'", m.as_str());
        }
    }
}

Accessing All Groups

fn main() {
    let re = FuzzyRegex::new(r"(\w+)@(\w+)").unwrap();
    let caps = re.captures("a@b").unwrap();

    assert_eq!(caps.len(), 3); // 0 (full) + 2 groups

    // Names of capture groups
    for name in caps.iter_names() {
        if let Some(n) = name {
            println!("Group name: {}", n);
        }
    }
}

Partial Matching

Match incomplete or streaming input.

Enabling Partial Matching

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=1}")
        .partial(true)
        .build()
        .unwrap();
    
    println!("Created");
}

How It Works

When enabled, matches that reach the end of the input are marked as “partial”:

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=1}")
        .partial(true)
        .build()
        .unwrap();

    // Match at end of text - partial
    let m1 = re.find("hello").unwrap();
    assert!(m1.partial());

    // Match not at end - not partial
    let m2 = re.find("say hello").unwrap();
    assert!(!m2.partial());

    // Fuzzy match reaching end - also partial
    let m3 = re.find("hallo").unwrap();
    assert!(m3.partial());
}

Use Cases

Streaming Input

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:command){e<=1}")
        .partial(true)
        .build()
        .unwrap();

    let mut stream = re.stream();

    // Feed incomplete data
    for m in stream.feed(b"cmd") {
        if m.partial() {
            println!("Partial match: might be complete when more data arrives");
        }
    }
}

Progressive Matching

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=1}")
        .partial(true)
        .build()
        .unwrap();

    // Check if more input needed
    let m = re.find("incomplete");
    if m.map(|m| m.partial()).unwrap_or(false) {
        // Get more input
    }
}

Without Partial

By default (partial(false)), partial matches behave the same as full matches:

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    let re = FuzzyRegexBuilder::new("(?:hello){e<=1}")
        .partial(false) // default
        .build()
        .unwrap();

    let m = re.find("hello").unwrap();
    assert!(!m.partial()); // Always false
}

Word Lists

Define named lists of patterns for reuse.

Basic Usage

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let mut re = FuzzyRegex::new(r"\L<keywords>").unwrap();

    // Set the word list
    re.set_word_list(&["hello", "world", "test"]).unwrap();

    assert!(re.is_match("hello"));
    assert!(re.is_match("world"));
    assert!(!re.is_match("foo"));
}

With Fuzzy Matching

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let mut re = FuzzyRegex::new(r"\L<keywords>{e<=1}").unwrap();
    re.set_word_list(&["hello", "world"]).unwrap();

    assert!(re.is_match("hallo")); // "hello" with 1 error
    assert!(re.is_match("wrold")); // "world" with 1 error
}

Multiple Word Lists

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new(r"\L<names>|\L<places>").unwrap();
    // Note: Multiple lists require builder pattern
    println!("Created");
}

Use Cases

Keyword matching: Match against a list of keywords
Name matching: Match against database of names
Dictionary lookup: Match words from a dictionary

Custom Handlers

Custom handlers allow you to invoke arbitrary Rust code during regex matching. This is useful for patterns that are difficult or impossible to express with regular regex alone.

Syntax

Use (?call:handler_name) in your pattern to invoke a handler:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

let re = FuzzyRegexBuilder::new("(?call:myhandler)")
    .handler("myhandler", |text, pos| {
        // Your logic here
    })
    .build()
    .unwrap();
}

Handler Signature

Handlers have the signature:

#![allow(unused)]
fn main() {
Fn(&str, usize) -> HandlerResult
}

text: The entire input string being matched
pos: Current position in the text (byte index)
HandlerResult::MatchOverride(n, text): Handler matched, consume n bytes, optionally override captured text
HandlerResult::NoMatch: Handler didn’t match at this position

Important: The byte count must account for UTF-8 encoding. For multi-byte characters (like Cyrillic, emoji), use byte length, not character count.

Examples

Simple Matching

Match strings that start with a specific prefix:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

let re = FuzzyRegexBuilder::new(r"(?call:prefix)")
    .handler("prefix", |text, pos| {
        if text[pos..].starts_with("foo") {
            HandlerResult::MatchOverride(3, "FOO".to_string())
        } else {
            HandlerResult::NoMatch
        }
    })
    .build()
    .unwrap();

let m = re.find("foobar").unwrap();
assert_eq!(m.as_str(), "foo");
}

Capture Override

Handlers can override the captured text while still matching the original:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

let re = FuzzyRegexBuilder::new(r"(prefix(?call:handler)suffix)")
    .handler("handler", |text, pos| {
        if text[pos..].starts_with("XYZ") {
            // Match 3 bytes but capture as lowercase
            HandlerResult::MatchOverride(3, "xyz".to_string())
        } else {
            HandlerResult::NoMatch
        }
    })
    .build()
    .unwrap();

let caps = re.captures("prefixXYZsuffix").unwrap();
// Captured text is "xyz" (the override), not "XYZ"
assert_eq!(caps.get(1).unwrap().as_str(), "prefixxyzsuffix");
}

Matching Escaped Characters

Match strings with escaped quotes:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

let re = FuzzyRegexBuilder::new(r#""((?call:unescape)|.)*""#)
    .handler("unescape", |text, pos| {
        if pos + 1 < text.len() && text.as_bytes()[pos] == b'\\' {
            HandlerResult::MatchOverride(2, String::new()) // consume \ and next char
        } else {
            HandlerResult::NoMatch
        }
    })
    .build()
    .unwrap();

let m = re.find(r#""hello \"world\"""#).unwrap();
assert_eq!(m.as_str(), "\"hello \\\"world\\\"\"");

let caps = re.captures(r#""hello \"world\"""#).unwrap();
// The handler_overrides contain the override information
println!("{:?}", caps.handler_overrides());
}

Escape Sequences

Match common escape sequences like \n, \t, \\:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

let re = FuzzyRegexBuilder::new(r"(?call:escape)+")
    .handler("escape", |text, pos| {
        if pos + 1 < text.len() && text.as_bytes()[pos] == b'\\' {
            let next = text.as_bytes()[pos + 1];
            if matches!(next, b'n' | b't' | b'r' | b'\\' | b'"' | b'\'') {
                return HandlerResult::MatchOverride(2, String::new());
            }
        }
        HandlerResult::NoMatch
    })
    .build()
    .unwrap();

re.find(r"hello\nworld"); // matches "\n"
re.find(r"a\tb");         // matches "\t"
}

Context-Sensitive Matching

Match patterns that depend on context:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

// Match "id:123" only when preceded by "user:"
let re = FuzzyRegexBuilder::new(r"user:(?call:id)")
    .handler("id", |text, pos| {
        if text[pos..].starts_with("id:") {
            let rest = &text[pos + 3..];
            let mut count = 0;
            for (_, c) in rest.char_indices() {
                if c.is_ascii_digit() {
                    count += 1;
                } else {
                    break;
                }
            }
            if count > 0 {
                return HandlerResult::MatchOverride(3 + count, String::new());
            }
        }
        HandlerResult::NoMatch
    })
    .build()
    .unwrap();

re.find("user:id:123"); // matches "user:id:123"
re.find("user:name:foo"); // no match (not "id:")
re.find("id:999"); // no match (no "user:" prefix)
}

Unicode Text Transformation

Handle non-ASCII text (note: byte count matters for UTF-8):

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

let re = FuzzyRegexBuilder::new(r#""(?call:translate)""#)
    .handler("translate", |text, pos| {
        let remaining = &text[pos..];
        // "привет" is 6 Cyrillic chars = 12 bytes in UTF-8
        if remaining.starts_with("привет") {
            HandlerResult::MatchOverride(12, "HELLO".to_string())
        } else {
            HandlerResult::NoMatch
        }
    })
    .build()
    .unwrap();

let caps = re.captures("\"привет\"").unwrap();
// The captured text shows the override
assert_eq!(caps.get(0).unwrap().as_str(), "\"HELLO\"");

// Handler overrides track (start_byte, end_byte, override_text)
assert_eq!(caps.handler_overrides(), &[(1, 13, "HELLO")]);
}

Fuzzy Matching with Handlers

Combine handlers with fuzzy matching for powerful patterns:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

// Fuzzy lookbehind + handler for complex validation
let re = FuzzyRegexBuilder::new(r"(?<=(?:hello){e<=2}) (?call:translate)")
    .handler("translate", |text, pos| {
        let remaining = &text[pos..];
        // "привет" = 12 bytes
        if remaining.starts_with("привет") {
            HandlerResult::MatchOverride(12, "HELLO".to_string())
        } else {
            HandlerResult::NoMatch
        }
    })
    .build()
    .unwrap();

// Matches with fuzzy lookbehind
let m = re.find("hello привет").unwrap();
assert_eq!(m.as_str(), " привет");

// Also matches with fuzzy lookbehind (1 extra 'l')
let m = re.find("helllo привет").unwrap();
assert_eq!(m.as_str(), " привет");
}

Multiple Handlers

Use multiple handlers in the same pattern:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

let re = FuzzyRegexBuilder::new(r"(?call:foo).*(?call:bar)")
    .handler("foo", |text, pos| {
        if text[pos..].starts_with("START") {
            HandlerResult::MatchOverride(5, "start".to_string())
        } else {
            HandlerResult::NoMatch
        }
    })
    .handler("bar", |text, pos| {
        if text[pos..].starts_with("END") {
            HandlerResult::MatchOverride(3, "end".to_string())
        } else {
            HandlerResult::NoMatch
        }
    })
    .build()
    .unwrap();

let caps = re.captures("START middle END").unwrap();
assert_eq!(caps.get(0).unwrap().as_str(), "START middle END");
}

Validation Handlers

Use handlers to implement complex validation rules:

#![allow(unused)]
fn main() {
use fuzzy_regex::{FuzzyRegexBuilder, HandlerResult};

// Match email-like patterns but validate the domain
let re = FuzzyRegexBuilder::new(r"\w+@(?call:domain)")
    .handler("domain", |text, pos| {
        // Find the end of the domain (until whitespace or end)
        let remaining = &text[pos..];
        let mut end = 0;
        for (i, c) in remaining.char_indices() {
            if c.is_whitespace() {
                break;
            }
            end = i + c.len_utf8();
        }
        if end > 0 {
            let domain = &remaining[..end];
            // Only allow .com, .org, .net domains
            if domain.ends_with(".com") || domain.ends_with(".org") || domain.ends_with(".net") {
                return HandlerResult::MatchOverride(end, String::new());
            }
        }
        HandlerResult::NoMatch
    })
    .build()
    .unwrap();

re.find("user@example.com"); // matches
re.find("user@test.org");   // matches
re.find("user@fake.io");    // no match (invalid TLD)
}

Performance Notes

Handlers are called during NFA simulation, which may be slower than optimized paths
For best performance, keep handler logic simple
Consider using handlers only when necessary; standard regex is faster for expressible patterns
When using MatchOverride, the override text is stored separately and applied during capture construction

Limitations

Handlers cannot perform lookahead/lookbehind themselves
Handler matches are exact (not fuzzy) - they either match or don’t
Captures inside handlers are limited
Position is in bytes, not characters - account for UTF-8 encoding

Compatibility Layer

Migrate from other fuzzy matching libraries.

fuzzy-aho-corasick

This is a drop-in replacement for fuzzy-aho-corasick:

#![allow(unused)]
fn main() {
use fuzzy_regex::compat::fac::FuzzyAhoCorasickBuilder;
use fuzzy_regex::types::FuzzyLimits;

let searcher = FuzzyAhoCorasickBuilder::new()
    .fuzzy(FuzzyLimits::new().edits(1))
    .build(["hello", "world"])
    .unwrap();

for m in searcher.find_iter("helo wrld") {
    println!("Pattern {} matched at {}", m.pattern_index(), m.start());
}
}

Architecture Overview

How fuzzy-regex works internally.

High-Level Architecture

┌─────────────────────────────────────────────┐
│              FuzzyRegex                      │
│         (Public API Layer)                   │
│                                              │
│  ┌────────────────────────────────────────┐  │
│  │        Fast Path Optimizations         │  │
│  │  • Pure greedy .* → instant match     │  │
│  │  • .*SUFFIX → reverse search          │  │
│  │  • Word boundaries → direct search    │  │
│  └────────────────────────────────────────┘  │
└──────────────────┬──────────────────────────┘
                  │
         ┌────────┴──────────┐
         ▼                     ▼
┌───────────────┐   ┌─────────────────┐
│     DFA       │   │   NFA Engine   │
│  (Fast Path)  │   │  (Full Engine) │
│ + Capturing   │   │                │
│    Groups     │   │                │
└───────┬───────┘   └────────┬────────┘
        │                    │
        └────────┬───────────┘
                 ▼
        ┌─────────────────┐
        │  Fuzzy Bridge  │
        └────────┬────────┘
                 │
      ┌───────────┴───────────┐
      ▼                       ▼
┌──────────┐         ┌──────────┐
│  Bitap   │         │   D-L    │
│ Matcher  │         │   NFA    │
└──────────┘         └──────────┘

Components

1. Parser

Tokenizes regex patterns
Builds Abstract Syntax Tree (AST)
Handles fuzzy syntax extensions

2. Compiler

Converts AST to Intermediate Representation (IR)
Optimizes patterns
Handles fuzzy matching compilation

3. Fast Path Optimizations

Before going to the main engines, FuzzyRegex checks for special patterns:

Pure greedy dot-star: .*, ^.*$, .*$ return instantly
Greedy prefix with suffix: .*test uses reverse search (O(n) vs O(n²))
Word boundaries: Direct literal search with boundary checks
DFA compatibility: Patterns with capturing groups now use DFA

4. Matching Engines

DFA (Deterministic Finite Automaton)

Fast path for exact/non-fuzzy patterns
No backtracking
Supports capturing groups
Limited feature support

NFA (Non-deterministic Finite Automaton)

Full regex feature support
Backtracking engine
Supports fuzzy matching

Bitap Algorithm

Optimized for short patterns (≤64 chars)
O(n×k) time complexity
SIMD-accelerated

Damerau-Levenshtein NFA

Fuzzy matching via automata
Supports all edit types
Used for longer patterns

5. Fuzzy Bridge

Connects NFA to fuzzy matchers
Extracts literals for pre-filtering
Coordinates multiple matchers

Bitap Algorithm

Fast fuzzy matching algorithm for short patterns.

Overview

The Bitap algorithm (also known as the Shift-Or algorithm) is used for fuzzy matching of short patterns (≤64 characters). It achieves O(n×k) time complexity where n is text length and k is maximum edits.

How It Works

Bitap uses bitmasks to track which positions can match:

Initialize: Create a bitmask for each character in the pattern
Scan: For each text character, update the bitmask
Match: When bitmask indicates a match, extract the match

Bitap State

#![allow(unused)]
fn main() {
// Simplified bitap state representation
struct BitapState {
    // Bits representing current match positions
    // For pattern "hello":
    // Position:    5   4   3   2   1   0
    // Bits:        0   0   0   0   0   1  = "h"
    //              0   0   0   0   1   0  = "e"
    //              ... continue ...
}
}

Key Properties

Pattern Length Limit

Maximum 64 characters (uses 64-bit integers)
Longer patterns use Damerau-Levenshtein NFA

Edit Distance

Supports insertions, deletions, substitutions
Transpositions handled specially

SIMD Acceleration

Modern implementations use SIMD for parallelism:

#![allow(unused)]
fn main() {
// SIMD processes multiple characters at once
// 256-bit AVX2: process 32 bytes per iteration
// 512-bit AVX512: process 64 bytes per iteration
}

Performance

Pattern Length	Text Size	Throughput
14 chars	200KB	~1.5 Gbps
14 chars	1MB	~1.4 Gbps
14 chars, k=1	200KB	~2.0 Gbps
14 chars, k=3	200KB	~700 Mbps

When Bitap is Used

Pattern length ≤ 64 characters
Simple fuzzy matching (no complex regex features)
Exact matching benefits from optimizations

Implementation Details

State Machine

#![allow(unused)]
fn main() {
// Each iteration:
// 1. Shift state right by 1
// 2. OR with pattern mask for current char
// 3. Check for match (zero in certain bits)
}

Handling Errors

Bitap extends to handle errors by maintaining multiple state masks (one per allowed error):

#![allow(unused)]
fn main() {
// For k errors, maintain k+1 states
// state[0] = exact match
// state[1] = 1 error
// state[k] = k errors
}

Damerau-Levenshtein NFA

Fuzzy matching via finite automata for longer patterns.

Overview

For patterns longer than 64 characters or when complex regex features are needed, fuzzy-regex uses a Damerau-Levenshtein NFA. This is an extension of the classic Damerau-Levenshtein automaton.

Damerau-Levenshtein Automaton

The Damerau-Levenshtein automaton for a pattern P and max distance k accepts all strings within k edits of P.

State Representation

#![allow(unused)]
fn main() {
// Each state represents a position in the pattern
// plus the number of errors used so far
struct NfaState {
    pattern_position: usize,  // Position in pattern
    errors_used: usize,       // Errors consumed
}
}

Transitions

For each input character, the NFA can transition to:

Match: Next character in pattern (0 errors)
Insertion: Stay at current position (1 error)
Deletion: Skip pattern character (1 error)
Substitution: Move to next with 1 error
Transposition: Swap next two characters (special case)

Building the NFA

#![allow(unused)]
fn main() {
// Pattern: "abc" with k=1
// States: (0,0) → (1,0) → (2,0) → (3,0)  [exact path]
//         (0,0) → (1,1) → ...             [with errors]
}

Construction Algorithm

Create initial state at position 0
For each pattern character, create states for:
- Exact match (no error)
- Insertion (adds error)
- Deletion (adds error)
- Substitution (adds error)
Connect states with transitions
Mark accepting states

Performance Characteristics

Aspect	Damerau-Levenshtein NFA
Time Complexity	O(n × k × m)
Space Complexity	O(k × m)
Pattern Length	Unlimited
Features	Full regex

Where:

n = text length
k = max edits
m = pattern length

Comparison with Bitap

Feature	Bitap	Damerau-Levenshtein NFA
Max Pattern Length	64	Unlimited
Time Complexity	O(n×k)	O(n×k×m)
SIMD Support	Yes	Limited
Regex Features	Limited	Full
Memory Usage	Low	Higher

When Used

Pattern length > 64 characters
Complex regex features (lookahead, backreferences)
Alternation with fuzzy segments

Fuzzy Bridge

Connecting regex engine with fuzzy matching components.

Purpose

The Fuzzy Bridge coordinates between the NFA-based regex engine and specialized fuzzy matchers (Bitap, Damerau-Levenshtein NFA).

Responsibilities

Literal Extraction: Extract exact-match literals from patterns for pre-filtering
Search Coordination: Order and combine multiple matchers
Result Merging: Combine and rank results from different matchers

Architecture

Regex Pattern:  hello (?:world){e<=1}
                    │
                    ▼
           ┌────────────────┐
           │ Fuzzy Bridge   │
           └───────┬────────┘
                   │
       ┌───────────┴───────────┐
       ▼                       ▼
┌──────────────┐      ┌──────────────┐
│    DFA       │      │   Fuzzy      │
│  (exact)     │      │   Matcher    │
│  "hello"     │      │ (Bitap/NFA)  │
└──────────────┘      │ "world"{e<=1}│
                      └──────────────┘

Literal Extraction

#![allow(unused)]
fn main() {
// Pattern: "prefix (?:target){e<=1} suffix"
// Extracted literals: ["prefix", "suffix"]
}

Extraction Strategy

Identify fuzzy segments in pattern
Extract surrounding exact literals
Use literals for:
- Pre-filtering text positions
- Quick rejection of non-matching areas

Search Flow

Simple Fuzzy Pattern

For (?:hello){e<=1}:

Use Bitap directly (pattern ≤64 chars)
Or build Damerau-Levenshtein NFA (longer patterns)

Mixed Pattern

For exact (?:fuzzy){e<=1}:

Use DFA to find “exact” quickly
For each “exact” match, check surrounding for fuzzy match

Complex Pattern

For (?:a){e<=1}(?:b){e<=1}:

Find all fuzzy matches for “a”
Find all fuzzy matches for “b”
Check if they can be combined

Optimization Techniques

Prefiltering

Use extracted literals to quickly skip non-matching regions:

#![allow(unused)]
fn main() {
// Text: "blah blah prefix target suffix blah"
// Literal "prefix" found at position X
// Only check fuzzy "target" around position X
}

Early Termination

Stop searching when match found (for find/first):

#![allow(unused)]
fn main() {
// First match found, don't continue
}

Caching

Cache compiled matchers for reuse:

#![allow(unused)]
fn main() {
// Bitap matcher cached for pattern
}

DFA Optimization

Fast path for exact and non-fuzzy patterns.

Deterministic Finite Automaton

The DFA provides the fastest matching path for patterns that don’t require fuzzy matching.

When DFA is Used

Exact match patterns (no fuzziness)
Simple character classes
Quantifiers without fuzzy segments

DFA Properties

Property	Value
Time Complexity	O(n)
Space Complexity	O(states)
Backtracking	None
Memory	Deterministic

Implementation

State Machine

#![allow(unused)]
fn main() {
// DFA state contains:
// - Transitions for each possible input
// - Whether state is accepting
// - Associated match information
}

Simulation

#![allow(unused)]
fn main() {
// For each input character:
// 1. Look up next state from current state
// 2. If no transition, stop (no match)
// 3. If accepting state, record match position
// 4. Continue until end of input
}

Fast Path Integration

Pattern: hello{e<=1}
              │
              ▼
    ┌─────────────────┐
    │ Has Fuzziness?  │
    └────────┬────────┘
             │
      ┌──────┴──────┐
      ▼             ▼
   Yes            No
     │             │
     ▼             ▼
   NFA/          DFA
   Bitap

Limitations

The DFA cannot handle:

Fuzzy matching
Lookahead/lookbehind
Backreferences
Capture groups (for position info)

For these, the NFA engine is used.

Optimization Techniques

State Minimization

Reduce number of states:

#![allow(unused)]
fn main() {
// Before: 100 states
// After minimization: 50 states
}

Lazy Evaluation

Build DFA states on-demand:

#![allow(unused)]
fn main() {
// Only build states that are actually traversed
}

Caching

Cache DFA for reuse:

#![allow(unused)]
fn main() {
// Reuse compiled DFA across searches
}

All-Matches Algorithms

The DFA supports multiple algorithms for finding all matches.

Three-Pass (Default)

The standard approach: find a match, advance, repeat.

#![allow(unused)]
fn main() {
let matches = dfa.find_all(text);
// Time: O(n²) worst case for pathological patterns
}

Two-Pass Algorithm

Uses a reverse prefilter to find candidates efficiently:

Pass 1: Scan right-to-left, finding candidate positions
Pass 2: Verify matches at each candidate

#![allow(unused)]
fn main() {
let matches = dfa.find_all_two_pass(text);
// Prefilter reduces candidates in Pass 1
// Still O(n²) worst case in Pass 2
}

Hardened Mode (O(n))

Tracks all active DFA states simultaneously:

Maintain set of (state, start_position) pairs
Process each character once, updating all states
When no continuation possible, emit match
Continue from next position

#![allow(unused)]
fn main() {
let matches = dfa.find_all_hardened(text);
// Guaranteed O(n) - each character processed once
}

Prefilter Types

Type	Description
SingleByte	`memrchr` for one byte
TwoBytes	`memrchr` for two bytes
ThreeBytes	`memrchr` for three bytes
Teddy	SIMD-accelerated multi-byte

When to Use

Scenario	Recommended
Few matches, well-behaved pattern	Default
Prefilter effective	Two-pass
Pathological patterns	Hardened

Streaming Implementation

Processing data incrementally without loading everything into memory.

Streaming Architecture

Input Data
    │
    ▼
┌──────────────────┐
│  StreamingMatcher │
│  ┌──────────────┐ │
│  │ Input Buffer │ │
│  └──────────────┘ │
│         │         │
│         ▼         │
│  ┌──────────────┐ │
│  │   Matcher    │ │
│  └──────────────┘ │
└────────┬─────────┘
         │
         ▼
    Match Results

Key Components

Input Buffer

Stores unprocessed data between feed calls:

#![allow(unused)]
fn main() {
struct StreamingMatcher {
    buffer: Vec<u8>,      // Unprocessed bytes
    position: usize,      // Current position in stream
    matcher: Matcher,     // Underlying matcher
}
}

Position Tracking

Tracks absolute position in the stream:

#![allow(unused)]
fn main() {
let mut stream = re.stream();

stream.feed(b"hello");      // position = 5
stream.feed(b" world");     // position = 11

assert_eq!(stream.position(), 11);
}

Cross-Boundary Matching

Matches can span chunk boundaries:

#![allow(unused)]
fn main() {
let mut stream = re.stream();

// Chunk 1: no complete match
stream.feed(b"xxx ");  

// Chunk 2: completes the match
let matches = stream.feed(b"hello").collect();
// Match found spanning from chunk 1
}

Buffer Management

#![allow(unused)]
fn main() {
// Keep some overlap between chunks
// for cross-boundary matches

let mut stream = re.stream();

// Strategy: Keep last (pattern_len + max_errors) bytes
// This ensures cross-boundary matches are found
}

Reader Integration

Process any Read source:

#![allow(unused)]
fn main() {
use std::io::BufReader;
use std::fs::File;

let file = File::open("large_file.txt")?;
let reader = BufReader::new(file);

for m in re.stream().with_chunk_size(8192).search_reader(reader) {
    println!("Match at {}", m.start());
}
}

Byte-Level Streaming

Feed

#![allow(unused)]
fn main() {
stream.feed(b"data chunk");
}

Finish

#![allow(unused)]
fn main() {
// After all data fed, check for remaining matches
stream.finish();
}

Reset

#![allow(unused)]
fn main() {
// Start over with fresh state
stream.reset();
}

Performance Considerations Size

Chunk

Chunk Size	Use Case
Small (1KB)	Low latency, interactive
Medium (8KB)	Balanced
Large (64KB+)	High throughput

Memory Usage

Streaming uses O(max_pattern_length + buffer) memory regardless of input size.

Implementation Details

State Management

#![allow(unused)]
fn main() {
// Between feed calls:
// 1. Save current state
// 2. Process new chunk
// 3. Keep partial state for next call

struct State {
    nfa_state: Option<NfaState>,
    bitap_state: u64,
    // etc.
}
}

Partial Results

Report matches as they’re found:

#![allow(unused)]
fn main() {
// Yield matches immediately as they're found
// Don't wait for complete input
}

Algorithm Selection

How fuzzy-regex chooses the matching algorithm.

Decision Tree

Pattern Analysis
      │
      ▼
┌──────────────────┐
│ Is exact match?  │
└────────┬─────────┘
          │
      ┌────┴────┐
      ▼         ▼
     Yes       No
       │         │
       ▼         ▼
    DFA    ┌────────────────┐
           │ Pure greedy .*? │
           └────────┬────────┘
                    │
               ┌────┴────┐
               ▼         ▼
              Yes       No
               │         │
               ▼         ▼
         Instant      ┌────────────────┐
          Match      │ Pattern ≤64?   │
                     └────────┬────────┘
                              │
                         ┌────┴────┐
                         ▼         ▼
                        Yes       No
                         │         │
                         ▼         ▼
                       Bitap    Damerau-Levenshtein
                                  NFA

Algorithm Comparison

Algorithm	Use Case	Complexity	Features
DFA	Exact patterns, capturing groups	O(n)	Limited
Instant Match	Pure greedy `.*`	O(1)	Limited
Bitap	Short fuzzy (≤64 chars)	O(n×k)	Most
Damerau-Levenshtein NFA	Long fuzzy patterns	O(n×k×m)	Full

Automatic Selection

The library automatically selects based on:

Pattern length: Bitap for ≤64 chars
Fuzzy complexity: Cost-based vs simple
Regex features: DFA can’t do lookahead
Streaming: Different code path
Greedy patterns: Instant match for .*, ^.*$, .*$
Greedy suffix patterns: .*SUFFIX uses reverse search

Special Optimizations

Pure Greedy Dot-Star

Patterns like .*, ^.*$, .*$ always match. The engine detects these and returns instantly without scanning:

#![allow(unused)]
fn main() {
let re = FuzzyRegex::new(".*").unwrap();
// Returns match immediately - no text scanning
}

Greedy Prefix with Suffix

Patterns like .*test or .*test~2 use reverse search to avoid O(n²) behavior:

#![allow(unused)]
fn main() {
let re = FuzzyRegex::new(".*test").unwrap();
// Finds "test" from the right, then .* matches everything before it
// O(n) instead of O(n²)
}

DFA with Capturing Groups

DFA now works with capturing groups like (?m)^(.*)test$:

#![allow(unused)]
fn main() {
let re = FuzzyRegex::new("(?m)^(.*)test$").unwrap();
// Uses DFA - much faster than NFA
}

All-Matches Algorithms

When finding all matches in text (find_iter, find_all), fuzzy-regex supports multiple algorithms optimized for different patterns.

The Problem

Some patterns produce many overlapping matches, causing naive “find, advance, repeat” to be O(n²):

#![allow(unused)]
fn main() {
// Pattern: .*a|b on text: "bbbbbbbb"
// Each 'b' matches individually → O(n) matches × O(n) scan = O(n²)
}

Available Algorithms

Algorithm	Complexity	Best For
`find_all`	O(n²) worst case	Simple patterns, few matches
`find_all_two_pass`	O(n²) worst case	When prefilter is effective
`find_all_hardened`	O(n) guaranteed	Pathological patterns

Two-Pass Algorithm

Uses a reverse prefilter to find candidate positions, then verifies matches:

Pass 1: Scan right-to-left using prefilter (memchr, Teddy, etc.)
Pass 2: For each candidate, find the match

#![allow(unused)]
fn main() {
let mut dfa = Dfa::new(".*a|b").unwrap();
let matches = dfa.find_all_two_pass("bbbbbb");
}

Hardened Mode (O(n))

Tracks all active DFA states simultaneously as we scan left-to-right:

#![allow(unused)]
fn main() {
let mut dfa = Dfa::new(".*a|b").unwrap();
let matches = dfa.find_all_hardened("bbbbbb");
}

How it works:

Start with one state at current position
Process all characters, tracking active states
When no state can continue, emit the leftmost match
Move to next position, repeat

Benchmark: Pathological Pattern

Pattern .*a|b on text of all ’b’s (worst case):

Text Size	find_all	two_pass	hardened	Speedup
1,000 bytes	1.08s	1.10s	69ms	15x
5,000 bytes	5.47s	5.43s	69ms	79x
10,000 bytes	10.76s	10.85s	69ms	156x

When to Use Each

#![allow(unused)]
fn main() {
// Default: smart selection based on pattern
let matches = dfa.find_all(text);

// Explicit two-pass: good when prefilter is effective
let matches = dfa.find_all_two_pass(text);

// Explicit hardened: critical for pathological patterns
let matches = dfa.find_all_hardened(text);
}

Prefilter Types

The two-pass algorithm uses prefilters optimized for different patterns:

Prefilter	Use Case	Method
SingleByte	Single literal	`memrchr`
TwoBytes	Two literals	`memrchr` for both
ThreeBytes	Three literals	`memrchr` for all
Teddy	Many alternatives	SIMD range matching

Manual Override

Not currently exposed, but internal selection can be inspected:

#![allow(unused)]
fn main() {
// Check which engine was used
let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
if re.supports_streaming() {
    // Using Bitap
}
}

SIMD Optimizations

Using vector instructions for faster matching.

SIMD Overview

SIMD (Single Instruction, Multiple Data) allows processing multiple bytes simultaneously.

SIMD Width	Bytes/Iteration	Architecture
128-bit	16 bytes	SSE4.1, NEON
256-bit	32 bytes	AVX2
512-bit	64 bytes	AVX512

SIMD Components

Character Range Search (RevSearchRanges)

Vectorized reverse search for character ranges like [a-z]:

x86_64: Uses SSE/AVX2 for 16/32-byte iteration
ARM (NEON): Uses 16-byte vectors with vmaxvq_u8
Uses neon_movemask to extract bit positions

#![allow(unused)]
fn main() {
// Searches right-to-left for characters in range
let positions = rev_search_ranges.find_last(haystack);
}

Teddy Search

SIMD-accelerated multi-pattern searching:

Processes 16 bytes per iteration (32 on AVX2)
Checks 4 character ranges simultaneously
Uses bitwise operations to combine matches
Efficient for patterns with multiple byte alternatives

#![allow(unused)]
fn main() {
// Searches for any of multiple bytes
let position = teddy.find_first(haystack);
}

Bitap SIMD

The Bitap algorithm is highly parallelizable with SIMD:

#![allow(unused)]
fn main() {
// Without SIMD: 1 byte per iteration
// With AVX2: 32 bytes per iteration
// ~32x speedup potential
}

Implementation

#![allow(unused)]
fn main() {
// AVX2 implementation
unsafe {
    // Load 32 bytes
    let data = _mm256_loadu_si256(ptr.as_ptr());
    
    // Apply bitap operations to all 32 simultaneously
    // ...
}
}

Performance Impact

Configuration	Throughput
Scalar (no SIMD)	~50 MB/s
SIMD (AVX2)	~180 MB/s
Speedup	~3.6x

Enabling SIMD

SIMD is enabled by default. To disable:

[dependencies]
fuzzy-regex = { version = "0.1", default-features = false }

Platform Support

x86_64

SSE4.1: Minimum for SIMD path
AVX2: Default, 32-byte vectors
AVX512: Optional, 64-byte vectors

ARM (Apple Silicon)

NEON: 16-byte vectors
Auto-detected: Runtime detection for M1/M2/M3

Detection

Runtime detection automatically uses best available:

#![allow(unused)]
fn main() {
// At compile time: #[target_feature(enable = "avx2")]
// At runtime: CPU feature detection
}

Requirements

SIMD requires:

CPU support (SSE4.1+ or NEON)
Compiler with SIMD intrinsics
Pattern length suitable for Bitap

Benchmark: Character Class Search

Pattern [a-z]+ on 100KB text:

Implementation	Time
Scalar	45.2 μs
NEON	6.8 μs
AVX2	4.1 μs
Speedup	6-10x

Performance Tips

Optimize fuzzy-regex for your use case.

Pattern Design

1. Use Specific Edit Limits

fn main() {
    // Good: Specific limit
    let _ = fuzzy_regex::FuzzyRegex::new("(?:hello){e<=1}").unwrap();

    // Less efficient: Higher limit
    let _ = fuzzy_regex::FuzzyRegex::new("(?:hello){e<=5}").unwrap();
    
    println!("Done");
}

Lower edit limits = faster matching.

2. Prefer Shorter Patterns

fn main() {
    // Bitap (fast): ≤64 chars
    let _ = fuzzy_regex::FuzzyRegex::new("(?:short){e<=1}").unwrap();

    // NFA (slower): >64 chars
    let _ = fuzzy_regex::FuzzyRegex::new("(?:very_long_pattern_that_exceeds_sixty_four_characters){e<=1}").unwrap();
    
    println!("Done");
}

3. Extract Exact Parts

fn main() {
    // Good: Exact prefix and suffix help prefilter
    let _ = fuzzy_regex::FuzzyRegex::new("exact_prefix (?:fuzzy){e<=1} exact_suffix").unwrap();

    // Slower: Entirely fuzzy
    let _ = fuzzy_regex::FuzzyRegex::new("(?:entirely_fuzzy){e<=1}").unwrap();
    
    println!("Done");
}

4. Use Greedy Suffix Patterns

fn main() {
    // Good: .*SUFFIX is optimized with reverse search
    let _ = fuzzy_regex::FuzzyRegex::new(".*test").unwrap();
    let _ = fuzzy_regex::FuzzyRegex::new(".*test~1").unwrap();
    
    // Also works with anchors
    let _ = fuzzy_regex::FuzzyRegex::new("^.*test$").unwrap();
    
    println!("Done");
}

Patterns like .*test automatically use reverse search to find the suffix first, then match everything before it. This is O(n) instead of O(n²).

Builder Options

1. Set Similarity Threshold

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    // Skip low-quality matches early
    let _ = FuzzyRegexBuilder::new("(?:hello){e<=2}")
        .similarity(0.8)
        .build();
    
    println!("Done");
}

2. Use Case Insensitive at Builder

fn main() {
    use fuzzy_regex::FuzzyRegexBuilder;

    // More efficient than inline (?i)
    let _ = FuzzyRegexBuilder::new("(?:hello)")
        .case_insensitive(true)
        .build();
    
    println!("Done");
}

API Usage

1. Use find_iter for Simple Patterns

find_iter has specialized fast paths for common pattern types:

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Fast path for literal patterns
    let re = FuzzyRegex::new("hello").unwrap();
    let matches: Vec<_> = re.find_iter("hello world hello").collect();
    assert_eq!(matches.len(), 2);
}

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Fast path for alternations (uses Aho-Corasick)
    let re = FuzzyRegex::new("(quick|brown|fox)").unwrap();
    let matches: Vec<_> = re.find_iter("the quick brown fox").collect();
    assert_eq!(matches.len(), 3);
}

fn main() {
    use fuzzy_regex::FuzzyRegex;

    // Fast path for char class + literal patterns
    let re = FuzzyRegex::new(r"\d+ test").unwrap();
    let matches: Vec<_> = re.find_iter("123 test 456 test").collect();
    assert_eq!(matches.len(), 2);
}

These fast paths are selected automatically — no configuration needed.

2. Use Streaming for Large Data

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    
    // Good: Process in chunks
    let mut stream = re.stream();
    let data = b"hello world";
    for chunk in data.chunks(8) {
        // Process chunk
    }

    // Bad: Load all into memory
    let large_text = "hello world";
    let _matches: Vec<_> = re.find_iter(&large_text).collect();
}

3. Use find() for First Match

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    let text = "hello world";

    // Good: Stop after first match
    if let Some(m) = re.find(text) {
        println!("Found: {}", m.as_str());
    }

    // Unnecessary: Find all when only first needed
    let _all: Vec<_> = re.find_iter(text).collect();
}

3. Check supports_streaming()

fn main() {
    use fuzzy_regex::FuzzyRegex;

    let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    
    if re.supports_streaming() {
        // Use streaming API for best performance
        let mut stream = re.stream();
        println!("Streaming supported");
    }
}

Build Configuration

1. Release Mode

cargo build --release

2. LTO

[profile.release]
lto = true
codegen-units = 1

3. SIMD

Enabled by default. Ensure target CPU supports it.

Common Pitfalls

Issue	Solution
Slow with high edits	Lower edit limit
High memory usage	Use streaming
Slow on long text	Use exact prefix
Slow compilation	Enable LTO

Pathological Patterns

Some regex patterns can cause O(n²) behavior in naive implementations:

#![allow(unused)]
fn main() {
// Pattern: .*a|b on text of all 'b's
// Each 'b' matches individually
// Naive: O(n) matches × O(n) scan = O(n²)
}

When It Happens

Alternation with wildcards: .*a|b, (a|b)+
Overlapping matches: Many ways to match the same text
Backtracking patterns: Complex alternation

Solution: Hardened Mode

Use find_all_hardened() for O(n) guaranteed performance:

#![allow(unused)]
fn main() {
use fuzzy_regex::FuzzyRegex;

let re = FuzzyRegex::new(".*a|b").unwrap();
let text = "bbbbbbbbbbbbbbbb";

// Hardened mode: O(n) guaranteed
let matches = re.find_all_hardened(text);
}

Performance Comparison

Text Size	Standard	Hardened
1,000 bytes	1.08s	69ms
10,000 bytes	10.76s	69ms

The hardened mode maintains constant time regardless of text size.

Trade-offs

Hardened mode may be slightly slower for well-behaved patterns (where O(n²) doesn’t occur), but it’s the safest choice when:

Pattern behavior is unknown
Text comes from untrusted sources
Worst-case performance is critical

Pattern Syntax Reference

Complete reference for all pattern syntax.

Special Characters

Character	Description
`.`	Any character except newline
`^`	Start of string
`$`	End of string
`	`
`\`	Escape
`(`, `)`	Groups
`[`, `]`	Character class
`{`, `}`	Quantifier or fuzziness

Character Classes

Syntax	Description
`[abc]`	a, b, or c
`[^abc]`	Not a, b, or c
`[a-z]`	a to z
`[A-Za-z]`	ASCII letters
`[0-9]`	Digits
`\d`	Digit
`\D`	Non-digit
`\w`	Word character
`\W`	Non-word
`\s`	Whitespace
`\S`	Non-whitespace

Quantifiers

Syntax	Description
`*`	0 or more
`+`	1 or more
`?`	0 or 1
`{n}`	Exactly n
`{n,}`	n or more
`{n,m}`	n to m
`*?`	Lazy *
`+?`	Lazy +
`??`	Lazy ?

Fuzziness Markers

Syntax	Description
`{e<=N}`	Max N edits
`{i<=N}`	Max N insertions
`{d<=N}`	Max N deletions
`{s<=N}`	Max N substitutions
`{t<=N}`	Max N transpositions
`{c<=N}`	Max total cost N
`{Ni+Md...<=N}`	Weighted costs
`{e<=N:[class]}`	Restricted edits
`~N`	Shorthand for {e<=N}

Groups

Syntax	Description
`(...)`	Capture group
`(?:...)`	Non-capture
`(?<name>...)`	Named capture
`(?=...)`	Lookahead
`(?!...)`	Negative lookahead
`(?<=...)`	Lookbehind
`(?<!...)`	Negative lookbehind
`(?>...)`	Atomic group

Flags

Syntax	Description
`(?i)`	Case insensitive
`(?m)`	Multi-line
`(?s)`	Dot-all
`(?x)`	Verbose
`(?U)`	Ungreedy
`(?b)`	Best match
`(?e)`	Enhance match
`(?p)`	POSIX mode

Error Messages

Understanding and handling errors.

Pattern Errors

Invalid Syntax

Error: regex parse error

Cause: Malformed regex pattern.

Fix: Check for unclosed groups, invalid quantifiers, etc.

Unsupported Feature

Error: Feature not supported: backreferences

Cause: Pattern uses feature not implemented.

Fix: Simplify pattern or restructure.

Empty Pattern

Error: pattern cannot be empty

Cause: Empty regex not allowed.

Fix: Use non-empty pattern.

Compilation Errors

Fuzzy Limit Out of Range

Error: edit limit too high

Cause: Edit limit exceeds maximum.

Fix: Reduce limit.

Invalid Character Class

Error: invalid character class

Cause: Malformed character class.

Fix: Check class syntax.

Runtime Errors

Timeout

Error: timeout after ...

Cause: Matching took too long.

Fix: Simplify pattern, reduce edit limit.

No Match

Not an error - use find() returning Option.

#![allow(unused)]
fn main() {
if let Some(m) = re.find(text) {
    // Match found
} else {
    // No match
}
}

Migration Guide

Migrating from other libraries to fuzzy-regex.

From regex (standard)

fuzzy-regex extends the standard regex crate with fuzzy matching:

fn main() {
    // Standard regex
    let re = regex::new("hello").unwrap();

    // fuzzy-regex
    let re = fuzzy_regex::FuzzyRegex::new("hello").unwrap();

    // Add fuzziness
    let re = fuzzy_regex::FuzzyRegex::new("(?:hello){e<=1}").unwrap();
    
    println!("Created");
}

From fuzzy-aho-corasick

See Compatibility Layer for a drop-in replacement.

From mrab-regex

The fuzzy syntax is compatible:

fn main() {
    // mrab-regex
    let _re = regex::new(r"(?i)(?:hello){e<=1}");

    // fuzzy-regex (same syntax)
    let re = fuzzy_regex::FuzzyRegex::new(r"(?i)(?:hello){e<=1}").unwrap();
    
    println!("Created");
}

Keyboard shortcuts

fuzzy-regex