Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Bitap Algorithm

Fast fuzzy matching algorithm for short patterns.

Overview

The Bitap algorithm (also known as the Shift-Or algorithm) is used for fuzzy matching of short patterns (≤64 characters). It achieves O(n×k) time complexity where n is text length and k is maximum edits.

How It Works

Bitap uses bitmasks to track which positions can match:

  1. Initialize: Create a bitmask for each character in the pattern
  2. Scan: For each text character, update the bitmask
  3. Match: When bitmask indicates a match, extract the match

Bitap State

#![allow(unused)]
fn main() {
// Simplified bitap state representation
struct BitapState {
    // Bits representing current match positions
    // For pattern "hello":
    // Position:    5   4   3   2   1   0
    // Bits:        0   0   0   0   0   1  = "h"
    //              0   0   0   0   1   0  = "e"
    //              ... continue ...
}
}

Key Properties

Pattern Length Limit

  • Maximum 64 characters (uses 64-bit integers)
  • Longer patterns use Damerau-Levenshtein NFA

Edit Distance

  • Supports insertions, deletions, substitutions
  • Transpositions handled specially

SIMD Acceleration

Modern implementations use SIMD for parallelism:

#![allow(unused)]
fn main() {
// SIMD processes multiple characters at once
// 256-bit AVX2: process 32 bytes per iteration
// 512-bit AVX512: process 64 bytes per iteration
}

Performance

Pattern LengthText SizeThroughput
14 chars200KB~1.5 Gbps
14 chars1MB~1.4 Gbps
14 chars, k=1200KB~2.0 Gbps
14 chars, k=3200KB~700 Mbps

When Bitap is Used

  • Pattern length ≤ 64 characters
  • Simple fuzzy matching (no complex regex features)
  • Exact matching benefits from optimizations

Implementation Details

State Machine

#![allow(unused)]
fn main() {
// Each iteration:
// 1. Shift state right by 1
// 2. OR with pattern mask for current char
// 3. Check for match (zero in certain bits)
}

Handling Errors

Bitap extends to handle errors by maintaining multiple state masks (one per allowed error):

#![allow(unused)]
fn main() {
// For k errors, maintain k+1 states
// state[0] = exact match
// state[1] = 1 error
// state[k] = k errors
}