KMP Algorithm

KMP String Matching Algorithm

What is the meaning of KMP?

KMP - Knuth - Morris - Pratt

History of KMP string matching algorithm

The algorithm was conceived inn 1970 by Donald Knuth and Vaughan Pratt and independently by James H. Morris. The three published it jointly in 1977.

Before learn about KMP algorithm there are two things to learn,

1. Prefix

A string ω is a prefix of a string x, denoted w ⊏ x,

if x = ωy for some string y € Σ* and

| ω | <= |x|

e.g. ab ⊏ abcca

2. Suffix

A string ω is a suffix of a string x, denoted ω⊐ x,

if x = yω for some string y € Σ* and

|ω| <= |x|

e.g. cca ⊐ abcca

Note: The empty string ε is both a suffix and a prefix of every string.

Let's move to algorithm,

KMP requires to calculate a prefix function called π.
π is an array of size length of pattern(m)

Always π[0] = 0;

let's talk with a example,

pattern = abacab

text = abacaabaccabacabaa

calculate prefix function (π)

As say before always π[0] = 0

Let's see how to calculate other π value.

To calculate π[1],

get first two letters in the pattern, then check there any prefix or suffix

a b

／＼

a b

no suffix and prefix

Then π[1] = 0

prefix (π)

Calculate π[2],

a b a

／＼

a a ⇽ suffix and prefix

a b b a

π[2] = 1

prefix (π)

Calculate π[3],

a b a c

／＼

a a

a b a c

a b a b a c

No suffix and prefix. Then,

π[3] = 0

prefix (π)

Calculate π[4],

a b a c a
／＼

a a ⇽ suffix and prefix

a b c a

a b a a c a

a b a c b a c a

π[4] = 1

prefix (π)

Calculate π[5],

a b a c a b

／＼

a b

a b a b ⇽ prefix

a b a c a b

a b a c a c a b

a b a c a b a c a b

π[5] = 2

prefix (π)

Note : a a b a a

／＼

a a ⇽ prefix

a a a a ⇽ prefix

a a b b a a

a a b a a b a a

There are two prefix then select as the π value to longest prefix length

π[5] = 2

After create prefix function,

Pattern compare with text left to right if there is a mismatch check prefix function,

Mismatch ⇾ π[k] ⇾ if value = 0 ⇾ shift pattern by 1

＼

if value ≠ 0 ⇾ shift T[i] = P[π[k]] T = text , P = pattern

pattern = abacab

text = abacaabaccabacabaa

a b a c a a b a c c a b a c a b a a

a b a c a b

// C program for implementation of KMP pattern searching

// algorithm

#include<stdio.h>

#include<string.h>

void computeLPSArray(char *pat, int M, int *lps);

// Prints occurrences of txt[] in pat[]

void KMPSearch(char *pat, char *txt)

{

int M = strlen(pat);

int N = strlen(txt);

// create lps[] that will hold the longest prefix suffix

// values for pattern

int lps[M];

// Preprocess the pattern (calculate lps[] array)

computeLPSArray(pat, M, lps);

int i = 0; // index for txt[]

int j = 0; // index for pat[]

while (i < N)

{

if (pat[j] == txt[i])

{

j++;

i++;

}

if (j == M)

{

printf("Found pattern at index %d n", i-j);

j = lps[j-1];

}

// mismatch after j matches

else if (i < N && pat[j] != txt[i])

{

// Do not match lps[0..lps[j-1]] characters,

// they will match anyway

if (j != 0)

j = lps[j-1];

else

i = i+1;

}

// Fills lps[] for given patttern pat[0..M-1]

void computeLPSArray(char *pat, int M, int *lps)

{

// length of the previous longest prefix suffix

int len = 0;

lps[0] = 0; // lps[0] is always 0

// the loop calculates lps[i] for i = 1 to M-1

int i = 1;

while (i < M)

{

if (pat[i] == pat[len])

{

len++;

lps[i] = len;

i++;

}

else // (pat[i] != pat[len])

{

// This is tricky. Consider the example.

// AAACAAAA and i = 7. The idea is similar

// to search step.

if (len != 0)

{

len = lps[len-1];

// Also, note that we do not increment

// i here

}

else // if (len == 0)

{

lps[i] = 0;

i++;

}

// Driver program to test above function

int main()

{

char *txt = "ABABDABACDABABCABAB";

char *pat = "ABABCABAB";

KMPSearch(pat, txt);

return 0;

}

Search This Blog

STREET TECH BOYS

KMP Algorithm

Comments

Post a Comment

visitors

Popular posts from this blog

String Matching with Finite Automata

From Melting Glaciers to Sustainable Solutions: How We Can Tackle Climate Change Together