Skip to content

Files

Latest commit

Sep 2, 2024
46d4a93 · Sep 2, 2024

History

History

string-matching

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
Aug 30, 2024
Aug 30, 2024
Aug 30, 2024
Aug 30, 2024
Sep 2, 2024
Aug 24, 2024

String-matching


Overview

CLRS Topic
32.1 Naive String-matching
32.2 Rabin-Karp Algorithm
32.3 String-matching Automata
32.4 Knuth-Morris-Pratt Algorithm
32.5 [ed4] Suffix Arrays

String-matching Problem

String matching is the problem of finding all occurrences of a string pattern P of length m in a text T of lenght n , where m n . The characters in the pattern and the text come from a finite set Σ called the alphabet .

We say that pattern P occurs with shift s in text T if 0 s n m and T [ s : s + m 1 ] = P [ 0 : m 1 ] . If P occurs with shift s in T , then s is called a valid shift for P in T , otherwise s is called an invalid shift for P in T .

In other words, the string-matching problem is to find all valid shifts for P in T , i.e. the shift set of P in T . Using the terminology below, the string-matching problem is to find all shifts  s such that  P T s + m , where 0 s n m and T s + m is the prefix of T of length s + m .


Terminology and properties

  • Σ : the set of all finite-length strings over Σ
  • ϵ : the empty string, i.e. the string of length 0
  • | x | : the length of string x
  • x y : the concatenation of strings x and y , with | x y | = | x | + | y |
  • T q : the q-character prefix of string T , i.e. T q = T [ 0 : q 1 ]
  • x y : string x is a prefix of string y , i.e. y = x z for some string z Σ
  • x y : string x is a suffix of string y , i.e. y = z x for some string z Σ
  • x y | x | | y |
  • x y | x | | y |
  • x y | x | < | y | : x is a proper prefix of y
  • x y | x | < | y | : x is a proper suffix of y
  • x Σ ϵ x
  • x Σ ϵ x
  • a Σ : x y x a y a
  • z Σ : x y y z x z
  • z Σ : x y y z x z