Ale*_*iam 17 java algorithm substring
我试图找到以下模式:
不知道可能发生的任何模式.
例如:
'ell', 'the b', 'y '.'the boy fell by the bell'.使用双for循环,它可以非常低效地强制使用:
ArrayList<String> patternsList = new ArrayList<>();
int length = string.length();
for (int i = 0; i < length; i++) {
int limit = (length - i) / 2;
for (int j = limit; j >= 1; j--) {
int candidateEndIndex = i + j;
String candidate = string.substring(i, candidateEndIndex);
if(candidate.length() <= 1) {
continue;
}
if (string.substring(candidateEndIndex).contains(candidate)) {
boolean notASubpattern = true;
for (String pattern : patternsList) {
if (pattern.contains(candidate)) {
notASubpattern = false;
break;
}
}
if (notASubpattern) {
patternsList.add(candidate);
}
}
}
}
Run Code Online (Sandbox Code Playgroud)
然而,当搜索大量模式的大字符串时,这是非常慢的.
Mat*_*ans 24
您可以在线性时间内为字符串构建后缀树:https: //en.wikipedia.org/wiki/Suffix_tree
您要查找的模式是与仅具有叶子元素的内部节点对应的字符串.
您可以使用n-gram来查找字符串中的模式.扫描字符串n-gram需要O(n)时间.当您使用n-gram找到子字符串时,将其放入哈希表中,并计算字符串中找到子字符串的次数.当您在字符串中搜索n-gram时,在哈希表中搜索大于1的计数,以查找字符串中的重复模式.
例如,在字符串"男孩摔倒钟,男孩摔倒钟"使用6克将发现子串"男孩摔倒了钟".具有该子字符串的哈希表条目的计数为2,因为它在字符串中出现了两次.改变n-gram中的单词数将帮助您发现字符串中的不同模式.
Dictionary<string, int>dict = new Dictionary<string, int>();
int count = 0;
int ngramcount = 6;
string substring = "";
// Add entries to the hash table
while (count < str.length) {
// copy the words into the substring
int i = 0;
substring = "";
while (ngramcount > 0 && count < str.length) {
substring[i] = str[count];
if (str[i] == ' ')
ngramcount--;
i++;
count++;
}
ngramcount = 6;
substring.Trim(); // get rid of the last blank in the substring
// Update the dictionary (hash table) with the substring
if (dict.Contains(substring)) { // substring is already in hash table so increment the count
int hashCount = dict[substring];
hashCount++;
dict[substring] = hashCount;
}
else
dict[substring] = 1;
}
// Find the most commonly occurrring pattern in the string
// by searching the hash table for the greatest count.
int maxCount = 0;
string mostCommonPattern = "";
foreach (KeyValuePair<string, int> pair in dict) {
if (pair.Value > maxCount) {
maxCount = pair.Value;
mostCommonPattern = pair.Key;
}
}
Run Code Online (Sandbox Code Playgroud)