Fuzzy Bridge
Connecting regex engine with fuzzy matching components.
Purpose
The Fuzzy Bridge coordinates between the NFA-based regex engine and specialized fuzzy matchers (Bitap, Damerau-Levenshtein NFA).
Responsibilities
- Literal Extraction: Extract exact-match literals from patterns for pre-filtering
- Search Coordination: Order and combine multiple matchers
- Result Merging: Combine and rank results from different matchers
Architecture
Regex Pattern: hello (?:world){e<=1}
│
▼
┌────────────────┐
│ Fuzzy Bridge │
└───────┬────────┘
│
┌───────────┴───────────┐
▼ ▼
┌──────────────┐ ┌──────────────┐
│ DFA │ │ Fuzzy │
│ (exact) │ │ Matcher │
│ "hello" │ │ (Bitap/NFA) │
└──────────────┘ │ "world"{e<=1}│
└──────────────┘
Literal Extraction
#![allow(unused)]
fn main() {
// Pattern: "prefix (?:target){e<=1} suffix"
// Extracted literals: ["prefix", "suffix"]
}
Extraction Strategy
- Identify fuzzy segments in pattern
- Extract surrounding exact literals
- Use literals for:
- Pre-filtering text positions
- Quick rejection of non-matching areas
Search Flow
Simple Fuzzy Pattern
For (?:hello){e<=1}:
- Use Bitap directly (pattern ≤64 chars)
- Or build Damerau-Levenshtein NFA (longer patterns)
Mixed Pattern
For exact (?:fuzzy){e<=1}:
- Use DFA to find “exact” quickly
- For each “exact” match, check surrounding for fuzzy match
Complex Pattern
For (?:a){e<=1}(?:b){e<=1}:
- Find all fuzzy matches for “a”
- Find all fuzzy matches for “b”
- Check if they can be combined
Optimization Techniques
Prefiltering
Use extracted literals to quickly skip non-matching regions:
#![allow(unused)]
fn main() {
// Text: "blah blah prefix target suffix blah"
// Literal "prefix" found at position X
// Only check fuzzy "target" around position X
}
Early Termination
Stop searching when match found (for find/first):
#![allow(unused)]
fn main() {
// First match found, don't continue
}
Caching
Cache compiled matchers for reuse:
#![allow(unused)]
fn main() {
// Bitap matcher cached for pattern
}