找到长度为N的重复子字符串

Question

找到长度为N的重复子字符串

Pro*_*ude 6 java algorithm substring

我必须创建一个Java程序,它在给定的String中查找长度为n的所有重复子字符串.输入字符串非常长,蛮力方法需要花费太多时间.

我已经尝试过:
现在我分别找到每个子字符串,并使用KMP算法检查该子字符串的重复.这也花费了太多时间.

对于这个问题,什么是更有效的方法？

Answer 1

ami*_*mit 2

我将采纳 @peter.petrov 的建议，并通过解释如何实际使用后缀树来解决问题来增强它：

 1. Create a suffix tree from the string, let it be `T`.
 2. Find all nodes of depth `n` in the tree, let that set of nodes be `S`. This can be done using DFS, for example.
 3. For each node `n` in `S`, do the following:
     3.1. Do a DFS, and count the number of terminals `n` leads to. Let this number be `count`
     3.2. If `count>1`, yield the substring that is related to `n` (the path from root to `n`), and `count`

Run Code Online (Sandbox Code Playgroud)

请注意，该算法处理任何长度的子字符串n并将其添加到集合中S，然后通过计算该子字符串通向的终结点数量来搜索该子字符串实际上是子字符串的次数。

这意味着问题的复杂性是O(Creation + Traversal)- 意思是，您首先创建树，然后遍历它（很容易看出，您不会在步骤 2-3 中多次遍历树中的每个节点）。由于遍历显然比创建树“更快” - 它给您留下O(Creation)，正如@perer.petrov 所指出的那样，O(|S|)或者O(|S|log|S|)取决于您选择的算法。

归档时间：	10 年，11 月前
查看次数：	978 次
最近记录：	10 年，11 月前