在Scala中找到两个字符串之间的最长公共子字符串的功能方法

obl*_*ion 2 scala

这是一个imperative解决方案:

def longestCommonSubstring(a: String, b: String) : String = {
    def loop(m: Map[(Int, Int), Int], bestIndices: List[Int], i: Int, j: Int) : String = {
      if (i > a.length) {
        b.substring(bestIndices(1) - m((bestIndices(0),bestIndices(1))), bestIndices(1))
      } else if (i == 0 || j == 0) {
        loop(m + ((i,j) -> 0), bestIndices, if(j == b.length) i + 1 else i, if(j == b.length) 0 else j + 1)
      } else if (a(i-1) == b(j-1) && math.max(m((bestIndices(0),bestIndices(1))), m((i-1,j-1)) + 1) == (m((i-1,j-1)) + 1)) {
        loop(
          m + ((i,j) -> (m((i-1,j-1)) + 1)),
          List(i, j),
          if(j == b.length) i + 1 else i,
          if(j == b.length) 0 else j + 1
        )
      } else {
        loop(m + ((i,j) -> 0), bestIndices, if(j == b.length) i + 1 else i, if(j == b.length) 0 else j + 1)
      }
    }
    loop(Map[(Int, Int), Int](), List(0, 0), 0, 0)
  }
Run Code Online (Sandbox Code Playgroud)

我正在寻找一个更紧凑的方法,并functional way找到 最长的公共子字符串

thw*_*gan 5

def getAllSubstrings(str: String): Set[String] = {
  str.inits.flatMap(_.tails).toSet
}
def longestCommonSubstring(str1: String, str2: String): String = {
  val str1Substrings = getAllSubstrings(str1)
  val str2Substrings = getAllSubstrings(str2)

  str1Substrings.intersect(str2Substrings).maxBy(_.length)
}
Run Code Online (Sandbox Code Playgroud)

首先获取两个字符串的集合中所有可能的子字符串(从此处获取)(以删除重复项),然后与这些集合相交并找到最长的公共子字符串。


Kol*_*mar 5

您拥有的代码已经可以正常运行,而且没有那么复杂。与其他当前发布的解决方案相比,它还具有渐近更好的时间效率。

我只是简化它,清理一下并修复错误:

def longestCommonSubstring(a: String, b: String) = {  
  def loop(bestLengths: Map[(Int, Int), Int], bestIndices: (Int, Int), i: Int, j: Int): String = {
    if (i > a.length) {
      val bestJ = bestIndices._2
      b.substring(bestJ - bestLengths(bestIndices), bestJ)
    } else {
      val currentLength = if (a(i-1) == b(j-1)) bestLengths(i-1, j-1) + 1 else 0
      loop(
        if (currentLength != 0) bestLengths + ((i, j) -> currentLength) else bestLengths, 
        if (currentLength > bestLengths(bestIndices)) (i, j) else bestIndices, 
        if (j == b.length) i + 1 else i,
        if (j == b.length) 1 else j + 1)
    }
  }

  if (b.isEmpty) ""
  else loop(Map.empty[(Int, Int), Int].withDefaultValue(0), (0, 0), 1, 1)
}
Run Code Online (Sandbox Code Playgroud)

  • @chaotic3quilibrium 我已经修复了 `longestCommonSubstring("A", "")` 错误,谢谢您的注意。这个解决方案是二次的“O(n*m)”,而你的解决方案是三次的“O(n^2*m)”,因此对于 2 个大字符串(大小为 10000+),这个解决方案要快得多。在我上次编辑之前它比较慢,但我刚刚添加了一个优化以避免在 Map 中存储 0 值,所以现在它非常快。尝试自己测量一下:在我的机器上,对于 2 个大小为 11000 的字符串和大小为 1000 的普通字符串,该解决方案在 11 秒内运行,而您的则在 320 秒内运行。 (2认同)

cha*_*ium 5

旁注:这是我对这个问题的第三个答案,因为 StackOverflow 政策不允许从根本上替换先前答案的内容。感谢@Kolmar 的反馈,这个新答案比我之前的答案性能要高得多。


LCS(最长公共子串)问题空间投入了大量时间来寻找最佳解决策略。要观察更一般的计算机科学问题和最优策略,请查看这篇维基百科文章。这篇维基百科文章的下面是一些描述实现策略的伪代码

基于维基百科文章伪代码,我将提出几种不同的解决方案。目的是允许人们复制/粘贴所需的特定解决方案,而无需进行大量重构:

  1. LCSubstr:将维基百科文章伪代码翻译成 Scala ,使用命令式可变风格
  2. LCSubstrFp:重构LCSubstr为惯用的Scala 函数式不可变风格
  3. longestCommonSubstrings:重构LCSubstrFp以使用描述性名称(例如left&right而不是sand t)并跳过在中存储零长度Map
  4. longestCommonSubstringsFast:重构longestCommonSubstrings以深度优化CPU和内存
  5. longestCommonSubstringsWithIndexes:重构longestCommonSubstringsFast以通过将每个条目扩展为一个元组来增强返回值,该元组(String, (Int, Int))包括找到的子字符串和String找到子字符串的每个输入中的索引(注意:String如果出现相同的情况,这将创建索引对的组合扩展不止一次)
  6. firstLongestCommonSubstring:注重效率的版本longestCommonSubstringsFast提供了在只关心第一个濒海战斗舰并且想忽略其他相同规模的濒海战斗舰时提前终止的机会。
  7. BONUS: longestCommonSubstringsUltimate:重构以longestCommonSubstringsFast添加内部实现可变性,同时在外部保留函数的引用透明度。

对OP请求的更直接的回答将落在LCSubstrFp和之间longestCommonSubstringsFastLCSubstrFp是最直接的,但效率相当低。使用longestCommonSubstringsFast效率显着提高,因为最终使用的 CPU 和 GC 少得多。如果函数实现中包含和约束的内部可变性是可以接受的,那么这longestCommonSubstringsUltimate是迄今为止 CPU 负担和内存占用最小的版本。


LC子字符串

将Wikipedia 文章伪代码翻译成 Scala ,使用命令式可变风格

目的是尽可能接近 Scala 重现一对一的实现。例如,Scala 假设 String 是从零开始的索引,而伪代码显然使用从一开始的索引,这需要进行多次调整。

def LCSubstr(s: String, t: String): scala.collection.mutable.Set[String] =
  if (s.nonEmpty && t.nonEmpty) {
    val l: scala.collection.mutable.Map[(Int, Int), Int] = scala.collection.mutable.Map.empty
    var z: Int = 0
    var ret: scala.collection.mutable.Set[String] = scala.collection.mutable.Set.empty

    (0 until s.length).foreach {
      i =>
        (0 until t.length).foreach {
          j =>
            if (s(i) == t(j)) {
              if ((i == 0) || (j == 0))
                l += ((i, j) -> 1)
              else
                l += ((i, j) -> (l((i - 1, j - 1)) + 1))
              if (l((i, j)) > z) {
                z = l((i, j))
                ret = scala.collection.mutable.Set(s.substring(i - z + 1, i + 1))
              }
              else
                if (l((i, j)) == z)
                  ret += s.substring(i - z + 1, i + 1)
            } else
              l += ((i, j) -> 0)
        }
    }

    ret
  }
  else scala.collection.mutable.Set.empty
Run Code Online (Sandbox Code Playgroud)

LCSubstrFp

重构LCSubstr为惯用的Scala 函数式不可变风格

所有命令式和突变代码都已替换为函数和不可变代码。这两个for循环已被递归替换。

def LCSubstrFp(s: String, t: String): Set[String] =
  if (s.nonEmpty && t.nonEmpty) {
    @scala.annotation.tailrec
    def recursive(
      i: Int = 0,
      j: Int = 0,
      z: Int = 0,
      l: Map[(Int, Int), Int] = Map.empty,
      ret: Set[String] = Set.empty
    ): Set[String] =
      if (i < s.length) {
        val (newI, newJ) =
          if (j < t.length - 1) (i, j + 1)
          else (i + 1, 0)
        val lij =
          if (s(i) != t(j)) 0
          else
            if ((i == 0) || (j == 0)) 1
            else l((i - 1, j - 1)) + 1
        recursive(
          newI,
          newJ,
          if (lij > z) lij
          else z,
          l + ((i, j) -> lij),
          if (lij > z) Set(s.substring(i - lij + 1, i + 1))
          else
            if ((lij == z) && (z > 0)) ret + s.substring(i - lij + 1, i + 1)
            else ret
        )
      }
      else ret

    recursive()
  }
  else Set.empty
Run Code Online (Sandbox Code Playgroud)

最长公共子串

重构LCSubstrFp以使用描述性名称(例如left&right而不是sand t)并跳过在Map.

除了增强可读性之外,这种重构还无需存储零长度值,从而lengthByIndexLongerAndIndexShorter大大减少了“内存搅动”量。再次对函数式风格进行调整,返回值也得到了增强,Set通过将 包装SetOption. 如果返回的值是 a Some,则包含的内容Set将始终包含至少一项。

def longestCommonSubstrings(left: String, right: String): Option[Set[String]] =
  if (left.nonEmpty && right.nonEmpty) {
    val (shorter, longer) =
      if (left.length < right.length) (left, right)
      else (right, left)

    @scala.annotation.tailrec
    def recursive(
      indexLonger: Int = 0,
      indexShorter: Int = 0,
      currentLongestLength: Int = 0,
      lengthByIndexLongerAndIndexShorter: Map[(Int, Int), Int] = Map.empty,
      accumulator: List[Int] = Nil
    ): (Int, List[Int]) =
      if (indexLonger < longer.length) {
        val length =
          if (longer(indexLonger) != shorter(indexShorter)) 0
          else
            if ((indexShorter == 0) || (indexLonger == 0)) 1
            else lengthByIndexLongerAndIndexShorter.getOrElse((indexLonger - 1, indexShorter - 1), 0) + 1
        val newCurrentLongestLength =
          if (length > currentLongestLength) length
          else currentLongestLength
        val newLengthByIndexLongerAndIndexShorter =
          if (length > 0) lengthByIndexLongerAndIndexShorter + ((indexLonger, indexShorter) -> length)
          else lengthByIndexLongerAndIndexShorter
        val newAccumulator =
          if ((length < currentLongestLength) || (length == 0)) accumulator
          else {
            val entry = indexShorter - length + 1
            if (length > currentLongestLength) List(entry)
            else entry :: accumulator
          }
        if (indexShorter < shorter.length - 1)
          recursive(
            indexLonger,
            indexShorter + 1,
            newCurrentLongestLength,
            newLengthByIndexLongerAndIndexShorter,
            newAccumulator
          )
        else
          recursive(
            indexLonger + 1,
            0,
            newCurrentLongestLength,
            newLengthByIndexLongerAndIndexShorter,
            newAccumulator
          )
      }
      else (currentLongestLength, accumulator)

    val (length, indexShorters) = recursive()
    if (indexShorters.nonEmpty)
      Some(
        indexShorters
          .map {
            indexShorter =>
              shorter.substring(indexShorter, indexShorter + length)
          }
          .toSet
      )
    else None
  }
  else None
Run Code Online (Sandbox Code Playgroud)

最长公共子串快

重构longestCommonSubstrings以深度优化 CPU 和内存。

在保持功能性和不变性的同时消除每一点低效率,执行速度提高了数倍longestCommonSubstrings。大部分成本降低是通过用一对仅跟踪当前行和先前行的 s替换Map整个矩阵的s 来实现的。List

要轻松查看差异longestCommonSubstrings请查看此视觉差异

def longestCommonSubstringsFast(left: String, right: String): Option[Set[String]] =
  if (left.nonEmpty && right.nonEmpty) {
    val (shorter, longer) =
      if (left.length < right.length) (left, right)
      else (right, left)

    @scala.annotation.tailrec
    def recursive(
      indexLonger: Int = 0,
      indexShorter: Int = 0,
      currentLongestLength: Int = 0,
      lengthsPrior: List[Int] = List.fill(shorter.length)(0),
      lengths: List[Int] = Nil,
      accumulator: List[Int] = Nil
    ): (Int, List[Int]) =
      if (indexLonger < longer.length) {
        val length =
          if (longer(indexLonger) != shorter(indexShorter)) 0
          else lengthsPrior.head + 1
        val newCurrentLongestLength =
          if (length > currentLongestLength) length
          else currentLongestLength
        val newAccumulator =
          if ((length < currentLongestLength) || (length == 0)) accumulator
          else {
            val entry = indexShorter - length + 1
            if (length > currentLongestLength) List(entry)
            else entry :: accumulator
          }
        if (indexShorter < shorter.length - 1)
          recursive(
            indexLonger,
            indexShorter + 1,
            newCurrentLongestLength,
            lengthsPrior.tail,
            length :: lengths,
            newAccumulator
          )
        else
          recursive(
            indexLonger + 1,
            0,
            newCurrentLongestLength,
            0 :: lengths.reverse,
            Nil,
            newAccumulator
          )
      }
      else (currentLongestLength, accumulator)

    val (length, indexShorters) = recursive()
    if (indexShorters.nonEmpty)
      Some(
        indexShorters
          .map {
            indexShorter =>
              shorter.substring(indexShorter, indexShorter + length)
          }
          .toSet
      )
    else None
  }
  else None
Run Code Online (Sandbox Code Playgroud)

带索引的最长公共子串

重构以longestCommonSubstringsFast通过将每个条目扩展为一个元组来增强返回值,(String, (Int, Int))该元组包括找到的子字符串以及String找到子字符串的每个输入中的索引。

注意:如果索引对String出现多次,这将创建索引对的组合扩展。

再次对函数式风格进行调整,返回值也得到了增强,List通过将 包装ListOption. 如果返回的值是 a Some,则包含的内容List将始终包含至少一项。

def longestCommonSubstringsWithIndexes(left: String, right: String): Option[List[(String, (Int, Int))]] =
  if (left.nonEmpty && right.nonEmpty) {
    val isLeftShorter = left.length < right.length
    val (shorter, longer) =
      if (isLeftShorter) (left, right)
      else (right, left)

    @scala.annotation.tailrec
    def recursive(
      indexLonger: Int = 0,
      indexShorter: Int = 0,
      currentLongestLength: Int = 0,
      lengthsPrior: List[Int] = List.fill(shorter.length)(0),
      lengths: List[Int] = Nil,
      accumulator: List[(Int, Int)] = Nil
    ): (Int, List[(Int, Int)]) =
      if (indexLonger < longer.length) {
        val length =
          if (longer(indexLonger) != shorter(indexShorter)) 0
          else lengthsPrior.head + 1
        val newCurrentLongestLength =
          if (length > currentLongestLength) length
          else currentLongestLength
        val newAccumulator =
          if ((length < currentLongestLength) || (length == 0)) accumulator
          else {
            val entry = (indexLonger - length + 1, indexShorter - length + 1)
            if (length > currentLongestLength) List(entry)
            else entry :: accumulator
          }
        if (indexShorter < shorter.length - 1)
          recursive(
            indexLonger,
            indexShorter + 1,
            newCurrentLongestLength,
            lengthsPrior.tail,
            length :: lengths,
            newAccumulator
          )
        else
          recursive(
            indexLonger + 1,
            0,
            newCurrentLongestLength,
            0 :: lengths.reverse,
            Nil,
            newAccumulator
          )
      }
      else (currentLongestLength, accumulator)

    val (length, indexPairs) = recursive()
    if (indexPairs.nonEmpty)
      Some(
        indexPairs
          .reverse
          .map {
            indexPair =>
              (
                longer.substring(indexPair._1, indexPair._1 + length),
                if (isLeftShorter) indexPair.swap else indexPair
              )
          }
      )
    else None
  }
  else None
Run Code Online (Sandbox Code Playgroud)

第一个最长公共子串

longestCommonSubstringsFast当只关心第一个濒海战斗舰并且想忽略其他相同规模的濒海战斗舰时,其注重效率的版本提供了提前终止的机会。

def firstLongestCommonSubstring(left: String, right: String): Option[(String, (Int, Int))] =
  if (left.nonEmpty && right.nonEmpty) {
    val isLeftShorter = left.length < right.length
    val (shorter, longer) =
      if (isLeftShorter) (left, right)
      else (right, left)

    @scala.annotation.tailrec
    def recursive(
      indexLonger: Int = 0,
      indexShorter: Int = 0,
      currentLongestLength: Int = 0,
      lengthsPrior: List[Int] = List.fill(shorter.length)(0),
      lengths: List[Int] = Nil,
      accumulator: Option[(Int, Int)] = None
    ): Option[(Int, (Int, Int))] =
      if (indexLonger < longer.length) {
        val length =
          if (longer(indexLonger) != shorter(indexShorter)) 0
          else lengthsPrior.head + 1
        val newAccumulator =
          if (length > currentLongestLength) Some((indexLonger - length + 1, indexShorter - length + 1))
          else accumulator
        if (length < shorter.length) {
          val newCurrentLongestLength =
            if (length > currentLongestLength) length
            else currentLongestLength
          if (indexShorter < shorter.length - 1)
            recursive(
              indexLonger,
              indexShorter + 1,
              newCurrentLongestLength,
              lengthsPrior.tail,
              length :: lengths,
              newAccumulator
            )
          else
            recursive(
              indexLonger + 1,
              0,
              newCurrentLongestLength,
              0 :: lengths.reverse,
              Nil,
              newAccumulator
            )
        }
        else
          recursive(longer.length, 0, length, lengthsPrior, lengths, newAccumulator) //early terminate
      }
      else accumulator.map((currentLongestLength, _))

    recursive().map {
      case (length, indexPair) =>
        (
          longer.substring(indexPair._1, indexPair._1 + length),
          if (isLeftShorter) indexPair.swap else indexPair
        )
    }
  }
  else None
Run Code Online (Sandbox Code Playgroud)

奖金:

最长公共子串终极

重构longestCommonSubstringsFast以添加内部实现可变性,同时在外部保留函数的引用透明度。

进一步消除每一点低效率,同时保持功能性和引用透明性(在实现本身中利用可变性,有些人认为这不是有效的功能),执行速度几乎提高了三倍以上longestCommonSubstringsFast。大部分成本降低是通过将一对Lists 替换为单个 s 来实现Array的。

要轻松查看差异longestCommonSubstringsFast请查看此视觉差异

def longestCommonSubstringsUltimate(left: String, right: String): Option[Set[String]] =
  if (left.nonEmpty && right.nonEmpty) {
    val (shorter, longer) =
      if (left.length < right.length) (left, right)
      else (right, left)
    val lengths: Array[Int] = new Array(shorter.length) //mutable

    @scala.annotation.tailrec
    def recursive(
      indexLonger: Int = 0,
      indexShorter: Int = 0,
      currentLongestLength: Int = 0,
      lastIterationLength: Int = 0,
      accumulator: List[Int] = Nil
    ): (Int, List[Int]) =
      if (indexLonger < longer.length) {
        val length =
          if (longer(indexLonger) != shorter(indexShorter)) 0
          else
            if (indexShorter == 0) 1
            else lastIterationLength + 1
        val newLastIterationLength = lengths(indexShorter)
        lengths(indexShorter) = length //mutation
        val newCurrentLongestLength =
          if (length > currentLongestLength) length
          else currentLongestLength
        val newAccumulator =
          if ((length < currentLongestLength) || (length == 0)) accumulator
          else {
            val entry = indexShorter - length + 1
            if (length > currentLongestLength) List(entry)
            else entry :: accumulator
          }
        if (indexShorter < shorter.length - 1)
          recursive(
            indexLonger,
            indexShorter + 1,
            newCurrentLongestLength,
            newLastIterationLength,
            newAccumulator
          )
        else
          recursive(
            indexLonger + 1,
            0,
            newCurrentLongestLength,
            newLastIterationLength,
            newAccumulator
          )
      }
      else (currentLongestLength, accumulator)

    val (length, indexShorters) = recursive()
    if (indexShorters.nonEmpty)
      Some(
        indexShorters
          .map {
            indexShorter =>
              shorter.substring(indexShorter, indexShorter + length)
          }
          .toSet
      )
    else None
  }
  else None
Run Code Online (Sandbox Code Playgroud)