KMP Algorithm
KMP String Matching Algorithm
- What is the meaning of KMP?
- KMP - Knuth - Morris - Pratt
- History of KMP string matching algorithm
- The algorithm was conceived inn 1970 by Donald Knuth and Vaughan Pratt and independently by James H. Morris. The three published it jointly in 1977.
1. Prefix
A string ω is a prefix of a string x, denoted w ⊏ x,
if x = ωy for some string y € Σ* and
| ω | <= |x|
e.g. ab ⊏ abcca
2. Suffix
A string ω is a suffix of a string x, denoted ω⊐ x,
if x = yω for some string y € Σ* and
|ω| <= |x|
e.g. cca ⊐ abcca
Note: The empty string ε is both a suffix and a prefix of every string.
Let's move to algorithm,
- KMP requires to calculate a prefix function called π.
- π is an array of size length of pattern(m)
let's talk with a example,
pattern = abacab
text = abacaabaccabacabaa
calculate prefix function (π)
As say before always π[0] = 0
Let's see how to calculate other π value.
To calculate π[1],
get first two letters in the pattern, then check there any prefix or suffix
a b
/ \
a b
no suffix and prefix
Then π[1] = 0
![]() |
prefix (π) |
Calculate π[2],
a b a
/ \
a a ⇽ suffix and prefix
a b b a
π[2] = 1
![]() |
prefix (π) |
Calculate π[3],
a b a c
/ \
a a
a b a c
a b a b a c
No suffix and prefix. Then,
π[3] = 0
![]() |
prefix (π) |
Calculate π[4],
a b a c a
/ \
/ \
a a ⇽ suffix and prefix
a b c a
a b a a c a
a b a c b a c a
π[4] = 1
![]() |
prefix (π) |
a b a c a b
/ \
a b
a b a b ⇽ prefix
a b a c a b
a b a c a c a b
a b a c a b a c a b
π[5] = 2
![]() |
prefix (π) |
Note : a a b a a
/ \
a a ⇽ prefix
a a a a ⇽ prefix
a a b b a a
a a b a a b a a
There are two prefix then select as the π value to longest prefix length
π[5] = 2
After create prefix function,
Pattern compare with text left to right if there is a mismatch check prefix function,
Mismatch ⇾ π[k] ⇾ if value = 0 ⇾ shift pattern by 1
\
if value ≠ 0 ⇾ shift T[i] = P[π[k]] T = text , P = pattern
pattern = abacab
text = abacaabaccabacabaa
a b a c a a b a c c a b a c a b a a
a b a c a b
a b a c a b
a b a c a b
a b a c a b
a b a c a b
// C program for implementation of KMP pattern searching
// algorithm
#include<stdio.h>
#include<string.h>
void computeLPSArray(char *pat, int M, int *lps);
// Prints occurrences of txt[] in pat[]
void KMPSearch(char *pat, char *txt)
{
int M = strlen(pat);
int N = strlen(txt);
// create lps[] that will hold the longest prefix suffix
// values for pattern
int lps[M];
// Preprocess the pattern (calculate lps[] array)
computeLPSArray(pat, M, lps);
int i = 0; // index for txt[]
int j = 0; // index for pat[]
while (i < N)
{
if (pat[j] == txt[i])
{
j++;
i++;
}
if (j == M)
{
printf("Found pattern at index %d n", i-j);
j = lps[j-1];
}
// mismatch after j matches
else if (i < N && pat[j] != txt[i])
{
// Do not match lps[0..lps[j-1]] characters,
// they will match anyway
if (j != 0)
j = lps[j-1];
else
i = i+1;
}
}
}
// Fills lps[] for given patttern pat[0..M-1]
void computeLPSArray(char *pat, int M, int *lps)
{
// length of the previous longest prefix suffix
int len = 0;
lps[0] = 0; // lps[0] is always 0
// the loop calculates lps[i] for i = 1 to M-1
int i = 1;
while (i < M)
{
if (pat[i] == pat[len])
{
len++;
lps[i] = len;
i++;
}
else // (pat[i] != pat[len])
{
// This is tricky. Consider the example.
// AAACAAAA and i = 7. The idea is similar
// to search step.
if (len != 0)
{
len = lps[len-1];
// Also, note that we do not increment
// i here
}
else // if (len == 0)
{
lps[i] = 0;
i++;
}
}
}
}
// Driver program to test above function
int main()
{
char *txt = "ABABDABACDABABCABAB";
char *pat = "ABABCABAB";
KMPSearch(pat, txt);
return 0;
}
Comments
Post a Comment