Performance Tips
Optimize fuzzy-regex for your use case.
Pattern Design
1. Use Specific Edit Limits
fn main() {
// Good: Specific limit
let _ = fuzzy_regex::FuzzyRegex::new("(?:hello){e<=1}").unwrap();
// Less efficient: Higher limit
let _ = fuzzy_regex::FuzzyRegex::new("(?:hello){e<=5}").unwrap();
println!("Done");
}
Lower edit limits mean faster matching, so choose the smallest limit that still tolerates the errors you expect.
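To see why the limit matters, here is a minimal sketch of a banded edit-distance check. It is independent of fuzzy-regex's internals, which this page does not describe; the helper name `bounded_edit_distance` and its cell counter are illustrative assumptions. Only cells within `k` of the diagonal are evaluated, so the work grows with the limit.

```rust
/// Returns (Some(distance), cells) if the Levenshtein distance between `a`
/// and `b` is at most `k`, otherwise (None, cells). `cells` counts how many
/// table cells were evaluated, as a rough measure of work.
/// Hypothetical helper for illustration; not part of fuzzy-regex.
fn bounded_edit_distance(a: &str, b: &str, k: usize) -> (Option<usize>, usize) {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.len().abs_diff(b.len()) > k {
        return (None, 0); // length gap alone already exceeds the limit
    }
    let inf = k + 1; // any value above k means "too many edits"
    let mut cells = 0;
    let mut prev: Vec<usize> = (0..=b.len()).map(|j| j.min(inf)).collect();
    for i in 1..=a.len() {
        let mut cur = vec![inf; b.len() + 1];
        let lo = i.saturating_sub(k);
        let hi = (i + k).min(b.len());
        if lo == 0 {
            cur[0] = i.min(inf);
        }
        // Only columns inside the band [i - k, i + k] are evaluated.
        for j in lo.max(1)..=hi {
            cells += 1;
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            let best = (prev[j - 1] + cost) // match or substitution
                .min(prev[j] + 1)           // deletion
                .min(cur[j - 1] + 1);       // insertion
            cur[j] = best.min(inf);
        }
        prev = cur;
    }
    let d = prev[b.len()];
    ((d <= k).then_some(d), cells)
}
```

Calling it on "hello" and "hallo" with k = 1 and then k = 5 returns the same distance, but the second call evaluates more cells.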
2. Prefer Shorter Patterns
fn main() {
// Bitap (fast): ≤64 chars
let _ = fuzzy_regex::FuzzyRegex::new("(?:short){e<=1}").unwrap();
// NFA (slower): >64 chars (this pattern is 76 characters long)
let _ = fuzzy_regex::FuzzyRegex::new("(?:this_pattern_is_deliberately_written_to_be_longer_than_sixty_four_characters){e<=1}").unwrap();
println!("Done");
}
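The 64-character boundary is what you would expect if the Bitap state is packed into a single 64-bit word. The following is a minimal sketch of the textbook Wu-Manber variant of Bitap under that assumption; `bitap_fuzzy_contains` is a hypothetical name, the code works on bytes only, and it is not taken from fuzzy-regex.

```rust
/// Bitap (shift-and) search allowing up to `k` edits, Wu-Manber style.
/// Bit i of r[d] is set when pattern[0..=i] matches a suffix of the text
/// read so far with at most d edits.
/// Returns None when the pattern does not fit in one u64 (empty, longer
/// than 64 bytes) or when k is not smaller than the pattern length.
/// Hypothetical helper for illustration; not part of fuzzy-regex.
fn bitap_fuzzy_contains(pattern: &[u8], text: &[u8], k: usize) -> Option<bool> {
    let m = pattern.len();
    if m == 0 || m > 64 || k >= m {
        return None;
    }
    // mask[c] has bit i set when pattern[i] == c.
    let mut mask = [0u64; 256];
    for (i, &b) in pattern.iter().enumerate() {
        mask[b as usize] |= 1u64 << i;
    }
    let accept = 1u64 << (m - 1);
    // With d edits available, the first d pattern bytes may be skipped.
    let mut r: Vec<u64> = (0..=k).map(|d| (1u64 << d) - 1).collect();
    for &c in text {
        let cm = mask[c as usize];
        let mut old = r[0];
        r[0] = ((r[0] << 1) | 1) & cm;
        for d in 1..=k {
            let tmp = r[d];
            r[d] = (((tmp << 1) | 1) & cm) // exact match of this byte
                | old                      // extra byte in the text
                | (old << 1)               // substituted byte
                | (r[d - 1] << 1)          // missing byte in the text
                | 1;
            old = tmp;
        }
        if r[k] & accept != 0 {
            return Some(true);
        }
    }
    Some(false)
}
```

Each text byte costs a handful of word operations per allowed edit, which is why a pattern that fits in one word is cheap. A longer pattern would need several words or a different engine.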
3. Extract Exact Parts
fn main() {
// Good: Exact prefix and suffix help prefilter
let _ = fuzzy_regex::FuzzyRegex::new("exact_prefix (?:fuzzy){e<=1} exact_suffix").unwrap();
// Slower: Entirely fuzzy
let _ = fuzzy_regex::FuzzyRegex::new("(?:entirely_fuzzy){e<=1}").unwrap();
println!("Done");
}
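The idea behind a literal prefilter can be sketched with the standard library alone: a fast exact search finds the candidate positions, and the more expensive fuzzy check runs only there. The names `find_prefixed` and `within_one_substitution` are hypothetical, and the fuzzy check is simplified to "at most one substituted byte"; how fuzzy-regex actually prefilters is not specified here.

```rust
/// Simplified fuzzy check: same length and at most one differing byte.
fn within_one_substitution(candidate: &str, word: &str) -> bool {
    candidate.len() == word.len()
        && candidate
            .bytes()
            .zip(word.bytes())
            .filter(|(a, b)| a != b)
            .count()
            <= 1
}

/// Looks for `exact_prefix` followed by something close to `fuzzy_word`.
/// Returns the start offset of the first hit and the number of fuzzy
/// checks that were needed.
/// Hypothetical helper for illustration; not part of fuzzy-regex.
fn find_prefixed(text: &str, exact_prefix: &str, fuzzy_word: &str) -> (Option<usize>, usize) {
    let mut checks = 0;
    // The exact search skips every position where the prefix is absent.
    for (start, _) in text.match_indices(exact_prefix) {
        let rest = &text[start + exact_prefix.len()..];
        if let Some(candidate) = rest.get(..fuzzy_word.len()) {
            checks += 1;
            if within_one_substitution(candidate, fuzzy_word) {
                return (Some(start), checks);
            }
        }
    }
    (None, checks)
}
```

On a text where the prefix occurs once, the fuzzy check runs once no matter how long the text is. An entirely fuzzy pattern has no such anchor, so every position is a candidate.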
4. Use Greedy Suffix Patterns
fn main() {
// Good: .*SUFFIX is optimized with reverse search
let _ = fuzzy_regex::FuzzyRegex::new(".*test").unwrap();
let _ = fuzzy_regex::FuzzyRegex::new(".*test~1").unwrap();
// Also works with anchors
let _ = fuzzy_regex::FuzzyRegex::new("^.*test$").unwrap();
println!("Done");
}
Patterns like .*test automatically use reverse search to find the suffix first, then match everything before it. This is O(n) instead of O(n²).
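For the exact case, the reverse-search idea can be sketched with `str::rfind`: a greedy `.*` that starts at the beginning of the line extends to the last occurrence of the suffix, so a single scan from the end finds the match. This assumes a single line with no newlines, covers only the non-fuzzy form, and uses the hypothetical name `greedy_suffix_match`; it is not the crate's implementation.

```rust
/// Returns the text a greedy `.*SUFFIX` would match on a single line:
/// everything from the start through the last occurrence of `suffix`.
/// Hypothetical helper for illustration; not part of fuzzy-regex.
fn greedy_suffix_match<'a>(line: &'a str, suffix: &str) -> Option<&'a str> {
    // One scan from the end locates the last occurrence.
    let start = line.rfind(suffix)?;
    Some(&line[..start + suffix.len()])
}
```

Trying every start position and re-scanning forward for the suffix would visit the same bytes repeatedly. The reverse scan touches each byte at most once.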
Builder Options
1. Set Similarity Threshold
fn main() {
use fuzzy_regex::FuzzyRegexBuilder;
// Skip low-quality matches early
let _ = FuzzyRegexBuilder::new("(?:hello){e<=2}")
.similarity(0.8)
.build();
println!("Done");
}
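How a threshold interacts with an edit limit can be illustrated with a simple score. The formula below (one minus edits divided by pattern length) is an assumption made for this sketch; this page does not state how fuzzy-regex computes similarity, and the function names are hypothetical.

```rust
/// Assumed similarity score: 1.0 for an exact match, decreasing by
/// 1 / pattern_len for every edit. This formula is an assumption for
/// illustration and may differ from what fuzzy-regex computes.
fn similarity(pattern_len: usize, edits: usize) -> f64 {
    if pattern_len == 0 {
        return 1.0;
    }
    1.0 - (edits.min(pattern_len) as f64 / pattern_len as f64)
}

/// True when a candidate with `edits` edits meets the threshold.
fn passes_threshold(pattern_len: usize, edits: usize, threshold: f64) -> bool {
    similarity(pattern_len, edits) >= threshold
}
```

Under this assumed formula, a five-character pattern with a threshold of 0.75 keeps one-edit candidates and drops two-edit candidates, even if the pattern itself allows e<=2.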
2. Use Case Insensitive at Builder
fn main() {
use fuzzy_regex::FuzzyRegexBuilder;
// More efficient than inline (?i)
let _ = FuzzyRegexBuilder::new("(?:hello)")
.case_insensitive(true)
.build();
println!("Done");
}
API Usage
1. Use Streaming for Large Data
fn main() {
use fuzzy_regex::FuzzyRegex;
let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
// Good: Process in chunks
let mut stream = re.stream();
let data = b"hello world";
for chunk in data.chunks(8) {
// Process chunk
}
// Bad: Load all into memory
let large_text = "hello world";
let _matches: Vec<_> = re.find_iter(&large_text).collect();
}
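The memory benefit of chunked processing comes from keeping only a small tail of earlier data so that matches spanning a chunk boundary are still found. Here is a minimal standard-library sketch for exact byte matching; `count_in_chunks` is a hypothetical helper and does not reflect the crate's streaming API.

```rust
/// Counts occurrences of `needle` in data delivered chunk by chunk.
/// Only the last `needle.len() - 1` bytes are carried over between
/// chunks, so memory use is bounded by chunk size plus needle length.
/// Hypothetical helper for illustration; not part of fuzzy-regex.
fn count_in_chunks<'a>(chunks: impl Iterator<Item = &'a [u8]>, needle: &[u8]) -> usize {
    assert!(!needle.is_empty(), "needle must not be empty");
    let keep = needle.len() - 1;
    let mut window: Vec<u8> = Vec::new();
    let mut count = 0;
    for chunk in chunks {
        window.extend_from_slice(chunk);
        count += window
            .windows(needle.len())
            .filter(|w| *w == needle)
            .count();
        // The retained tail is too short to hold a full match, so
        // nothing is counted twice on the next round.
        let cut = window.len().saturating_sub(keep);
        window.drain(..cut);
    }
    count
}
```

Feeding `b"hello world, hello again".chunks(8)` splits the second "hello" across two chunks, and the carried tail lets it be counted anyway.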
2. Use find() for First Match
fn main() {
use fuzzy_regex::FuzzyRegex;
let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
let text = "hello world";
// Good: Stop after first match
if let Some(m) = re.find(text) {
println!("Found: {}", m.as_str());
}
// Unnecessary: Find all when only first needed
let _all: Vec<_> = re.find_iter(text).collect();
}
3. Check supports_streaming()
fn main() {
use fuzzy_regex::FuzzyRegex;
let re = FuzzyRegex::new("(?:hello){e<=1}").unwrap();
if re.supports_streaming() {
// Use streaming API for best performance
let mut stream = re.stream();
println!("Streaming supported");
}
}
Build Configuration
1. Release Mode
cargo build --release
2. LTO
[profile.release]
lto = true
codegen-units = 1
3. SIMD
SIMD acceleration is enabled by default. Make sure the target CPU supports the instructions it relies on.
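One way to check what the running CPU offers is the standard library's runtime feature detection. Which instruction sets fuzzy-regex actually uses is not stated here, so the features queried below (SSE2, AVX2, NEON) are assumptions picked as common examples, and `detected_simd_features` is a hypothetical helper.

```rust
/// Lists a few common SIMD features detected on the current CPU.
/// The selection of features is an assumption for illustration.
fn detected_simd_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("sse2") {
            found.push("sse2");
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            found.push("avx2");
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            found.push("neon");
        }
    }
    found
}
```

Printing the returned list at startup on the deployment machine shows whether it matches the machine the binary was benchmarked on.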
Common Pitfalls
| Issue | Solution |
|---|---|
| Slow with high edits | Lower edit limit |
| High memory usage | Use streaming |
| Slow on long text | Use exact prefix |
| Slow pattern compilation | Enable LTO (expect longer build times) |
Pathological Patterns
Some regex patterns can cause O(n²) behavior in naive implementations:
#![allow(unused)]
fn main() {
// Pattern: .*a|b on text of all 'b's
// Each 'b' matches individually
// Naive: O(n) matches × O(n) scan = O(n²)
}
When It Happens
- Alternation with wildcards: .*a|b, (a|b)+
- Overlapping matches: Many ways to match the same text
- Backtracking patterns: Complex alternation
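The quadratic cost can be made concrete by counting byte comparisons in a naive scan for .*a|b. The sketch below is a hand-written simulation, not a regex engine and not the crate's code; `naive_steps` is a hypothetical name, and the text is assumed to contain no newlines.

```rust
/// Counts the byte comparisons a naive matcher makes for `.*a|b`:
/// at every start position it first scans to the end of the text
/// looking for an 'a', and only then falls back to matching 'b'.
/// Hypothetical helper for illustration; not part of fuzzy-regex.
fn naive_steps(text: &str) -> usize {
    let bytes = text.as_bytes();
    let mut steps = 0;
    let mut pos = 0;
    while pos < bytes.len() {
        // Branch 1, `.*a`: scan the whole remainder for the last 'a'.
        let mut last_a = None;
        for (i, &b) in bytes[pos..].iter().enumerate() {
            steps += 1;
            if b == b'a' {
                last_a = Some(pos + i);
            }
        }
        match last_a {
            Some(end) => pos = end + 1, // `.*a` matched up to `end`
            None => pos += 1,           // Branch 2, `b`, or skip one byte
        }
    }
    steps
}
```

On a text of n 'b' bytes the scan at position p costs n - p comparisons, so the total is n(n + 1)/2: 10 for n = 4 and 5050 for n = 100.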
Solution: Hardened Mode
Use find_all_hardened() for guaranteed O(n) performance:
#![allow(unused)]
fn main() {
use fuzzy_regex::FuzzyRegex;
let re = FuzzyRegex::new(".*a|b").unwrap();
let text = "bbbbbbbbbbbbbbbb";
// Hardened mode: O(n) guaranteed
let matches = re.find_all_hardened(text);
}
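One way to avoid the repeated rescans for this particular pattern is to locate the last 'a' once and reuse that position at every start. The sketch below does this for .*a|b on newline-free text. It is an illustration of the general idea only: `linear_match_spans` is a hypothetical name, and whether find_all_hardened() works this way is not stated here.

```rust
/// Returns the non-overlapping match spans (start, end) of `.*a|b`
/// in linear time by precomputing the position of the last 'a'.
/// Hypothetical helper for illustration; not part of fuzzy-regex.
fn linear_match_spans(text: &str) -> Vec<(usize, usize)> {
    let bytes = text.as_bytes();
    // A single reverse pass replaces the per-position forward scans.
    let last_a = bytes.iter().rposition(|&b| b == b'a');
    let mut spans = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        match last_a {
            // `.*a` matches greedily from `pos` through the last 'a'.
            Some(end) if end >= pos => {
                spans.push((pos, end + 1));
                pos = end + 1;
            }
            // No 'a' remains: only the `b` branch can match here.
            _ => {
                if bytes[pos] == b'b' {
                    spans.push((pos, pos + 1));
                }
                pos += 1;
            }
        }
    }
    spans
}
```

Every position is now handled in constant time after one pass over the text, so the total work stays proportional to the text length.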
Performance Comparison
| Text Size | Standard | Hardened |
|---|---|---|
| 1,000 bytes | 1.08s | 69ms |
| 10,000 bytes | 10.76s | 69ms |
In these measurements, hardened mode took 69ms at both text sizes, while standard mode grew roughly tenfold (1.08s to 10.76s) for ten times the input.
Trade-offs
Hardened mode may be slightly slower for well-behaved patterns (where O(n²) doesn’t occur), but it’s the safest choice when:
- Pattern behavior is unknown
- Text comes from untrusted sources
- Worst-case performance is critical