mad*_*n54 11 string algorithm substring dfa knuth-morris-pratt
我指的是Sedgewick的书"Algorithms"(第4版)中用于子串搜索的Knuth-Morris-Pratt(KMP)算法的概述.
KMP算法在子串搜索中使用基于确定性有限自动机(DFA)的备份.我了解DFA如何输入算法,但我不明白如何构建 DFA,这由以下代码片段完成:
dfa[pat.charAt(0)][0] = 1;
for (int X = 0; j = 1; j< M; j++) {
for (int c = 0; c < R; c++) {
dfa[c][j] = dfa[c][X];
}
dfa[pat.charAt(j)][j] = j+1;
X = dfa[pat.charAt(j)][X];
}
Run Code Online (Sandbox Code Playgroud)
M
模式的长度pat
和R
字母的大小在哪里.该charAt()
函数返回相应位置的字符的整数值.
有人能解释这段代码构造dfa的方式吗?我迷失在内部for循环的实际直觉意义上.
Kai*_*dul 11
让我们看看以下FA模式ACACAGA.
上图表示ACACAGA模式的图形和表格表示.
这里,DFA中的状态数是M + 1,其中M是模式的长度.构造DFA的主要任务是从每个可能的角色的当前状态获取下一个状态.给定字符x和状态k,我们可以通过考虑字符串"pat [0..k-1] x"来获得下一个状态,这基本上是模式字符pat [0],pat 1 ... pat [k- ]的串联1]和角色x.想法是获得给定模式的最长前缀的长度,使得前缀也是"pat [0..k-1] x"的后缀(LPS).长度值给我们下一个状态.
例如,让我们看看如何从上图中的当前状态5和字符"C"获取下一个状态.我们需要考虑字符串"pat [0..5] C",即"ACACAC".模式的最长前缀的长度使得前缀是"ACACAC"的后缀是4("ACAC").所以下一个状态(来自状态5)对于字符'C'是4.
// dfa[i][j] = k denotes the transition function will go k'th state
// with character i from state j
// DFA will go always (i + 1)'th state from i'th state
//if the character c is equal to current character of pattern
dfa[pat.charAt(0)][0] = 1;
// here X is initialized with LPS (longest prefix suffix)
// of pattern[0....j - 1]. LPS[0] is always 0
for (int X = 0; j = 1; j< M; j++) {
for (int c = 0; c < R; c++) { // for all possible characters
// transition function from j'th state taking character c
// will go to the same state where it went from X(LPS)'th state
// taking character c (justify it with an example)
dfa[c][j] = dfa[c][X];
}
// DFA will go always (i + 1)th state from i'th state if
// the character c is equal to current character of pattern
dfa[pat.charAt(j)][j] = j + 1;
X = dfa[pat.charAt(j)][X]; // update LPS
}
Run Code Online (Sandbox Code Playgroud)