这是一个imperative解决方案:
def longestCommonSubstring(a: String, b: String) : String = {
def loop(m: Map[(Int, Int), Int], bestIndices: List[Int], i: Int, j: Int) : String = {
if (i > a.length) {
b.substring(bestIndices(1) - m((bestIndices(0),bestIndices(1))), bestIndices(1))
} else if (i == 0 || j == 0) {
loop(m + ((i,j) -> 0), bestIndices, if(j == b.length) i + 1 else i, if(j == b.length) 0 else j + 1)
} else if (a(i-1) == b(j-1) && math.max(m((bestIndices(0),bestIndices(1))), m((i-1,j-1)) + 1) == (m((i-1,j-1)) + 1)) {
loop(
m + ((i,j) -> (m((i-1,j-1)) + 1)),
List(i, j),
if(j == b.length) i + 1 else i,
if(j == b.length) 0 else j + 1
)
} else {
loop(m + ((i,j) -> 0), bestIndices, if(j == b.length) i + 1 else i, if(j == b.length) 0 else j + 1)
}
}
loop(Map[(Int, Int), Int](), List(0, 0), 0, 0)
}
Run Code Online (Sandbox Code Playgroud)
我正在寻找一个更紧凑的方法,并functional way找到 最长的公共子字符串。
def getAllSubstrings(str: String): Set[String] = {
str.inits.flatMap(_.tails).toSet
}
def longestCommonSubstring(str1: String, str2: String): String = {
val str1Substrings = getAllSubstrings(str1)
val str2Substrings = getAllSubstrings(str2)
str1Substrings.intersect(str2Substrings).maxBy(_.length)
}
Run Code Online (Sandbox Code Playgroud)
首先获取两个字符串的集合中所有可能的子字符串(从此处获取)(以删除重复项),然后与这些集合相交并找到最长的公共子字符串。
您拥有的代码已经可以正常运行,而且没有那么复杂。与其他当前发布的解决方案相比,它还具有渐近更好的时间效率。
我只是简化它,清理一下并修复错误:
def longestCommonSubstring(a: String, b: String) = {
def loop(bestLengths: Map[(Int, Int), Int], bestIndices: (Int, Int), i: Int, j: Int): String = {
if (i > a.length) {
val bestJ = bestIndices._2
b.substring(bestJ - bestLengths(bestIndices), bestJ)
} else {
val currentLength = if (a(i-1) == b(j-1)) bestLengths(i-1, j-1) + 1 else 0
loop(
if (currentLength != 0) bestLengths + ((i, j) -> currentLength) else bestLengths,
if (currentLength > bestLengths(bestIndices)) (i, j) else bestIndices,
if (j == b.length) i + 1 else i,
if (j == b.length) 1 else j + 1)
}
}
if (b.isEmpty) ""
else loop(Map.empty[(Int, Int), Int].withDefaultValue(0), (0, 0), 1, 1)
}
Run Code Online (Sandbox Code Playgroud)
旁注:这是我对这个问题的第三个答案,因为 StackOverflow 政策不允许从根本上替换先前答案的内容。感谢@Kolmar 的反馈,这个新答案比我之前的答案性能要高得多。
LCS(最长公共子串)问题空间投入了大量时间来寻找最佳解决策略。要观察更一般的计算机科学问题和最优策略,请查看这篇维基百科文章。这篇维基百科文章的下面是一些描述实现策略的伪代码。
基于维基百科文章伪代码,我将提出几种不同的解决方案。目的是允许人们复制/粘贴所需的特定解决方案,而无需进行大量重构:
LCSubstr:将维基百科文章伪代码翻译成 Scala ,使用命令式可变风格LCSubstrFp:重构LCSubstr为惯用的Scala 函数式不可变风格longestCommonSubstrings:重构LCSubstrFp以使用描述性名称(例如left&right而不是sand t)并跳过在中存储零长度MaplongestCommonSubstringsFast:重构longestCommonSubstrings以深度优化CPU和内存longestCommonSubstringsWithIndexes:重构longestCommonSubstringsFast以通过将每个条目扩展为一个元组来增强返回值,该元组(String, (Int, Int))包括找到的子字符串和String找到子字符串的每个输入中的索引(注意:String如果出现相同的情况,这将创建索引对的组合扩展不止一次)firstLongestCommonSubstring:注重效率的版本longestCommonSubstringsFast提供了在只关心第一个濒海战斗舰并且想忽略其他相同规模的濒海战斗舰时提前终止的机会。longestCommonSubstringsUltimate:重构以longestCommonSubstringsFast添加内部实现可变性,同时在外部保留函数的引用透明度。对OP请求的更直接的回答将落在LCSubstrFp和之间longestCommonSubstringsFast。LCSubstrFp是最直接的,但效率相当低。使用longestCommonSubstringsFast效率显着提高,因为最终使用的 CPU 和 GC 少得多。如果函数实现中包含和约束的内部可变性是可以接受的,那么这longestCommonSubstringsUltimate是迄今为止 CPU 负担和内存占用最小的版本。
将Wikipedia 文章伪代码翻译成 Scala ,使用命令式可变风格。
目的是尽可能接近 Scala 重现一对一的实现。例如,Scala 假设 String 是从零开始的索引,而伪代码显然使用从一开始的索引,这需要进行多次调整。
def LCSubstr(s: String, t: String): scala.collection.mutable.Set[String] =
if (s.nonEmpty && t.nonEmpty) {
val l: scala.collection.mutable.Map[(Int, Int), Int] = scala.collection.mutable.Map.empty
var z: Int = 0
var ret: scala.collection.mutable.Set[String] = scala.collection.mutable.Set.empty
(0 until s.length).foreach {
i =>
(0 until t.length).foreach {
j =>
if (s(i) == t(j)) {
if ((i == 0) || (j == 0))
l += ((i, j) -> 1)
else
l += ((i, j) -> (l((i - 1, j - 1)) + 1))
if (l((i, j)) > z) {
z = l((i, j))
ret = scala.collection.mutable.Set(s.substring(i - z + 1, i + 1))
}
else
if (l((i, j)) == z)
ret += s.substring(i - z + 1, i + 1)
} else
l += ((i, j) -> 0)
}
}
ret
}
else scala.collection.mutable.Set.empty
Run Code Online (Sandbox Code Playgroud)
重构LCSubstr为惯用的Scala 函数式不可变风格。
所有命令式和突变代码都已替换为函数和不可变代码。这两个for循环已被递归替换。
def LCSubstrFp(s: String, t: String): Set[String] =
if (s.nonEmpty && t.nonEmpty) {
@scala.annotation.tailrec
def recursive(
i: Int = 0,
j: Int = 0,
z: Int = 0,
l: Map[(Int, Int), Int] = Map.empty,
ret: Set[String] = Set.empty
): Set[String] =
if (i < s.length) {
val (newI, newJ) =
if (j < t.length - 1) (i, j + 1)
else (i + 1, 0)
val lij =
if (s(i) != t(j)) 0
else
if ((i == 0) || (j == 0)) 1
else l((i - 1, j - 1)) + 1
recursive(
newI,
newJ,
if (lij > z) lij
else z,
l + ((i, j) -> lij),
if (lij > z) Set(s.substring(i - lij + 1, i + 1))
else
if ((lij == z) && (z > 0)) ret + s.substring(i - lij + 1, i + 1)
else ret
)
}
else ret
recursive()
}
else Set.empty
Run Code Online (Sandbox Code Playgroud)
重构LCSubstrFp以使用描述性名称(例如left&right而不是sand t)并跳过在Map.
除了增强可读性之外,这种重构还无需存储零长度值,从而lengthByIndexLongerAndIndexShorter大大减少了“内存搅动”量。再次对函数式风格进行调整,返回值也得到了增强,Set通过将 包装Set在Option. 如果返回的值是 a Some,则包含的内容Set将始终包含至少一项。
def longestCommonSubstrings(left: String, right: String): Option[Set[String]] =
if (left.nonEmpty && right.nonEmpty) {
val (shorter, longer) =
if (left.length < right.length) (left, right)
else (right, left)
@scala.annotation.tailrec
def recursive(
indexLonger: Int = 0,
indexShorter: Int = 0,
currentLongestLength: Int = 0,
lengthByIndexLongerAndIndexShorter: Map[(Int, Int), Int] = Map.empty,
accumulator: List[Int] = Nil
): (Int, List[Int]) =
if (indexLonger < longer.length) {
val length =
if (longer(indexLonger) != shorter(indexShorter)) 0
else
if ((indexShorter == 0) || (indexLonger == 0)) 1
else lengthByIndexLongerAndIndexShorter.getOrElse((indexLonger - 1, indexShorter - 1), 0) + 1
val newCurrentLongestLength =
if (length > currentLongestLength) length
else currentLongestLength
val newLengthByIndexLongerAndIndexShorter =
if (length > 0) lengthByIndexLongerAndIndexShorter + ((indexLonger, indexShorter) -> length)
else lengthByIndexLongerAndIndexShorter
val newAccumulator =
if ((length < currentLongestLength) || (length == 0)) accumulator
else {
val entry = indexShorter - length + 1
if (length > currentLongestLength) List(entry)
else entry :: accumulator
}
if (indexShorter < shorter.length - 1)
recursive(
indexLonger,
indexShorter + 1,
newCurrentLongestLength,
newLengthByIndexLongerAndIndexShorter,
newAccumulator
)
else
recursive(
indexLonger + 1,
0,
newCurrentLongestLength,
newLengthByIndexLongerAndIndexShorter,
newAccumulator
)
}
else (currentLongestLength, accumulator)
val (length, indexShorters) = recursive()
if (indexShorters.nonEmpty)
Some(
indexShorters
.map {
indexShorter =>
shorter.substring(indexShorter, indexShorter + length)
}
.toSet
)
else None
}
else None
Run Code Online (Sandbox Code Playgroud)
重构longestCommonSubstrings以深度优化 CPU 和内存。
在保持功能性和不变性的同时消除每一点低效率,执行速度提高了数倍longestCommonSubstrings。大部分成本降低是通过用一对仅跟踪当前行和先前行的 s替换Map整个矩阵的s 来实现的。List
要轻松查看差异longestCommonSubstrings,请查看此视觉差异。
def longestCommonSubstringsFast(left: String, right: String): Option[Set[String]] =
if (left.nonEmpty && right.nonEmpty) {
val (shorter, longer) =
if (left.length < right.length) (left, right)
else (right, left)
@scala.annotation.tailrec
def recursive(
indexLonger: Int = 0,
indexShorter: Int = 0,
currentLongestLength: Int = 0,
lengthsPrior: List[Int] = List.fill(shorter.length)(0),
lengths: List[Int] = Nil,
accumulator: List[Int] = Nil
): (Int, List[Int]) =
if (indexLonger < longer.length) {
val length =
if (longer(indexLonger) != shorter(indexShorter)) 0
else lengthsPrior.head + 1
val newCurrentLongestLength =
if (length > currentLongestLength) length
else currentLongestLength
val newAccumulator =
if ((length < currentLongestLength) || (length == 0)) accumulator
else {
val entry = indexShorter - length + 1
if (length > currentLongestLength) List(entry)
else entry :: accumulator
}
if (indexShorter < shorter.length - 1)
recursive(
indexLonger,
indexShorter + 1,
newCurrentLongestLength,
lengthsPrior.tail,
length :: lengths,
newAccumulator
)
else
recursive(
indexLonger + 1,
0,
newCurrentLongestLength,
0 :: lengths.reverse,
Nil,
newAccumulator
)
}
else (currentLongestLength, accumulator)
val (length, indexShorters) = recursive()
if (indexShorters.nonEmpty)
Some(
indexShorters
.map {
indexShorter =>
shorter.substring(indexShorter, indexShorter + length)
}
.toSet
)
else None
}
else None
Run Code Online (Sandbox Code Playgroud)
重构以longestCommonSubstringsFast通过将每个条目扩展为一个元组来增强返回值,(String, (Int, Int))该元组包括找到的子字符串以及String找到子字符串的每个输入中的索引。
注意:如果索引对String出现多次,这将创建索引对的组合扩展。
再次对函数式风格进行调整,返回值也得到了增强,List通过将 包装List在Option. 如果返回的值是 a Some,则包含的内容List将始终包含至少一项。
def longestCommonSubstringsWithIndexes(left: String, right: String): Option[List[(String, (Int, Int))]] =
if (left.nonEmpty && right.nonEmpty) {
val isLeftShorter = left.length < right.length
val (shorter, longer) =
if (isLeftShorter) (left, right)
else (right, left)
@scala.annotation.tailrec
def recursive(
indexLonger: Int = 0,
indexShorter: Int = 0,
currentLongestLength: Int = 0,
lengthsPrior: List[Int] = List.fill(shorter.length)(0),
lengths: List[Int] = Nil,
accumulator: List[(Int, Int)] = Nil
): (Int, List[(Int, Int)]) =
if (indexLonger < longer.length) {
val length =
if (longer(indexLonger) != shorter(indexShorter)) 0
else lengthsPrior.head + 1
val newCurrentLongestLength =
if (length > currentLongestLength) length
else currentLongestLength
val newAccumulator =
if ((length < currentLongestLength) || (length == 0)) accumulator
else {
val entry = (indexLonger - length + 1, indexShorter - length + 1)
if (length > currentLongestLength) List(entry)
else entry :: accumulator
}
if (indexShorter < shorter.length - 1)
recursive(
indexLonger,
indexShorter + 1,
newCurrentLongestLength,
lengthsPrior.tail,
length :: lengths,
newAccumulator
)
else
recursive(
indexLonger + 1,
0,
newCurrentLongestLength,
0 :: lengths.reverse,
Nil,
newAccumulator
)
}
else (currentLongestLength, accumulator)
val (length, indexPairs) = recursive()
if (indexPairs.nonEmpty)
Some(
indexPairs
.reverse
.map {
indexPair =>
(
longer.substring(indexPair._1, indexPair._1 + length),
if (isLeftShorter) indexPair.swap else indexPair
)
}
)
else None
}
else None
Run Code Online (Sandbox Code Playgroud)
longestCommonSubstringsFast当只关心第一个濒海战斗舰并且想忽略其他相同规模的濒海战斗舰时,其注重效率的版本提供了提前终止的机会。
def firstLongestCommonSubstring(left: String, right: String): Option[(String, (Int, Int))] =
if (left.nonEmpty && right.nonEmpty) {
val isLeftShorter = left.length < right.length
val (shorter, longer) =
if (isLeftShorter) (left, right)
else (right, left)
@scala.annotation.tailrec
def recursive(
indexLonger: Int = 0,
indexShorter: Int = 0,
currentLongestLength: Int = 0,
lengthsPrior: List[Int] = List.fill(shorter.length)(0),
lengths: List[Int] = Nil,
accumulator: Option[(Int, Int)] = None
): Option[(Int, (Int, Int))] =
if (indexLonger < longer.length) {
val length =
if (longer(indexLonger) != shorter(indexShorter)) 0
else lengthsPrior.head + 1
val newAccumulator =
if (length > currentLongestLength) Some((indexLonger - length + 1, indexShorter - length + 1))
else accumulator
if (length < shorter.length) {
val newCurrentLongestLength =
if (length > currentLongestLength) length
else currentLongestLength
if (indexShorter < shorter.length - 1)
recursive(
indexLonger,
indexShorter + 1,
newCurrentLongestLength,
lengthsPrior.tail,
length :: lengths,
newAccumulator
)
else
recursive(
indexLonger + 1,
0,
newCurrentLongestLength,
0 :: lengths.reverse,
Nil,
newAccumulator
)
}
else
recursive(longer.length, 0, length, lengthsPrior, lengths, newAccumulator) //early terminate
}
else accumulator.map((currentLongestLength, _))
recursive().map {
case (length, indexPair) =>
(
longer.substring(indexPair._1, indexPair._1 + length),
if (isLeftShorter) indexPair.swap else indexPair
)
}
}
else None
Run Code Online (Sandbox Code Playgroud)
奖金:
重构longestCommonSubstringsFast以添加内部实现可变性,同时在外部保留函数的引用透明度。
进一步消除每一点低效率,同时保持功能性和引用透明性(在实现本身中利用可变性,有些人认为这不是有效的功能),执行速度几乎提高了三倍以上longestCommonSubstringsFast。大部分成本降低是通过将一对Lists 替换为单个 s 来实现Array的。
要轻松查看差异longestCommonSubstringsFast,请查看此视觉差异。
def longestCommonSubstringsUltimate(left: String, right: String): Option[Set[String]] =
if (left.nonEmpty && right.nonEmpty) {
val (shorter, longer) =
if (left.length < right.length) (left, right)
else (right, left)
val lengths: Array[Int] = new Array(shorter.length) //mutable
@scala.annotation.tailrec
def recursive(
indexLonger: Int = 0,
indexShorter: Int = 0,
currentLongestLength: Int = 0,
lastIterationLength: Int = 0,
accumulator: List[Int] = Nil
): (Int, List[Int]) =
if (indexLonger < longer.length) {
val length =
if (longer(indexLonger) != shorter(indexShorter)) 0
else
if (indexShorter == 0) 1
else lastIterationLength + 1
val newLastIterationLength = lengths(indexShorter)
lengths(indexShorter) = length //mutation
val newCurrentLongestLength =
if (length > currentLongestLength) length
else currentLongestLength
val newAccumulator =
if ((length < currentLongestLength) || (length == 0)) accumulator
else {
val entry = indexShorter - length + 1
if (length > currentLongestLength) List(entry)
else entry :: accumulator
}
if (indexShorter < shorter.length - 1)
recursive(
indexLonger,
indexShorter + 1,
newCurrentLongestLength,
newLastIterationLength,
newAccumulator
)
else
recursive(
indexLonger + 1,
0,
newCurrentLongestLength,
newLastIterationLength,
newAccumulator
)
}
else (currentLongestLength, accumulator)
val (length, indexShorters) = recursive()
if (indexShorters.nonEmpty)
Some(
indexShorters
.map {
indexShorter =>
shorter.substring(indexShorter, indexShorter + length)
}
.toSet
)
else None
}
else None
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
992 次 |
| 最近记录: |