Simple string template replacement in Scala and Clojure

Ral*_*lph 9 string scala clojure

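Below are functions written in Scala and Clojure for simple replacement of templates in a String. The input to each function is a String containing templates of the form {key} and a map from Symbol/Keyword to replacement value.

For example:

Scala: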

replaceTemplates("This is a {test}", Map('test -> "game"))

Clojure:

(replace-templates "This is a {test}" {:test "game"})

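will return "This is a game".

The input map uses symbols/keywords so that I don't have to deal with corner cases where the templates in the String contain braces.

Unfortunately, the algorithm is not very efficient.

Here is the Scala code: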

import scala.annotation.tailrec

def replaceTemplates(text: String,
                     templates: Map[Symbol, String]): String = {
  val builder = new StringBuilder(text)

  @tailrec
  def loop(key: String,
           keyLength: Int,
           value: String): StringBuilder = {
    val index = builder.lastIndexOf(key)
    if (index < 0) builder
    else {
      builder.replace(index, index + keyLength, value)
      loop(key, keyLength, value)
    }
  }

  templates.foreach {
    case (key, value) =>
      val template = "{" + key.name + "}"
      loop(template, template.length, value)
  }

  builder.toString
}

Here is the Clojure code:

(defn replace-templates
  "Return a String with each occurrence of a substring of the form {key}
   replaced with the corresponding value from a map parameter.
   @param text the String in which to do the replacements
   @param m a map of keyword->value"
  [text m]
  (let [sb (StringBuilder. text)]
    (letfn [(replace-all [key key-length value]
              (let [index (.lastIndexOf sb key)]
                (if (< index 0)
                  sb
                  (do
                    (.replace sb index (+ index key-length) value)
                    (recur key key-length value)))))]
      (doseq [[key value] m]
        (let [template (str "{" (name key) "}")]
          (replace-all template (count template) value))))
    (.toString sb)))

Here is a test case (Scala code):

replaceTemplates("""
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque
elit nisi, egestas et tincidunt eget, {foo} mattis non erat. Aenean ut
elit in odio vehicula facilisis. Vestibulum quis elit vel nulla
interdum facilisis ut eu sapien. Nullam cursus fermentum
sollicitudin. Donec non congue augue. {bar} Vestibulum et magna quis
arcu ultricies consectetur auctor vitae urna. Fusce hendrerit
facilisis volutpat. Ut lectus augue, mattis {baz} venenatis {foo}
lobortis sed, varius eu massa. Ut sit amet nunc quis velit hendrerit
bibendum in eget nibh. Cras blandit nibh in odio suscipit eget aliquet
tortor placerat. In tempor ullamcorper mi. Quisque egestas, metus eu
venenatis pulvinar, sem urna blandit mi, in lobortis augue sem ut
dolor. Sed in {bar} neque sapien, vitae lacinia arcu. Phasellus mollis
blandit commodo.
""", Map('foo -> "HELLO", 'bar -> "GOODBYE", 'baz -> "FORTY-TWO"))

And the output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque
elit nisi, egestas et tincidunt eget, HELLO mattis non erat. Aenean ut
elit in odio vehicula facilisis. Vestibulum quis elit vel nulla
interdum facilisis ut eu sapien. Nullam cursus fermentum
sollicitudin. Donec non congue augue. GOODBYE Vestibulum et magna quis
arcu ultricies consectetur auctor vitae urna. Fusce hendrerit
facilisis volutpat. Ut lectus augue, mattis FORTY-TWO venenatis HELLO
lobortis sed, varius eu massa. Ut sit amet nunc quis velit hendrerit
bibendum in eget nibh. Cras blandit nibh in odio suscipit eget aliquet
tortor placerat. In tempor ullamcorper mi. Quisque egestas, metus eu
venenatis pulvinar, sem urna blandit mi, in lobortis augue sem ut
dolor. Sed in GOODBYE neque sapien, vitae lacinia arcu. Phasellus mollis
blandit commodo.

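The algorithm traverses the input map and, for each pair, does the replacement in the input String, which is held temporarily in a StringBuilder. For each key/value pair, we search for the last occurrence of the key (enclosed in braces) and replace it with the value, until there are no more occurrences.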
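For comparison, the same one-pass-per-key strategy can be sketched without a mutable StringBuilder by folding over the map. This is only an illustration: the name replaceTemplatesFold is hypothetical, and because String.replace does not rescan text it has just inserted, the result can differ from the loop above when a replacement value itself contains a template. The cost is still one scan of the text per key.

```scala
// Hypothetical helper, not part of the original post:
// one String.replace pass over the text per map entry.
def replaceTemplatesFold(text: String, templates: Map[Symbol, String]): String =
  templates.foldLeft(text) { case (acc, (key, value)) =>
    // Each pass scans the whole accumulated string for "{key}".
    acc.replace("{" + key.name + "}", value)
  }

val folded = replaceTemplatesFold("This is a {test}", Map(Symbol("test") -> "game"))
// folded == "This is a game"
```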

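Does it make any performance difference if we use .lastIndexOf vs .indexOf on the StringBuilder?

How can the algorithm be improved? Is there a more idiomatic way to write the Scala and/or Clojure code?

UPDATE: See my follow-up.

UPDATE 2: Here is a better Scala implementation; it is O(n) in the length of the string. Note that I changed the Map to [String, String] instead of [Symbol, String], as several people suggested. (Thanks mikera, kotarak):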

import scala.annotation.tailrec

/**
 * Replace templates of the form {key} in the input String with values from the Map.
 *
 * @param text the String in which to do the replacements
 * @param templates a Map from key (String) to replacement value
 * @return the String with all occurrences of the templates replaced by their values
 */
def replaceTemplates(text: String,
                     templates: Map[String, String]): String = {
  val builder = new StringBuilder

  @tailrec
  def loop(text: String): String = {
    if (text.length == 0) builder.toString
    else if (text.startsWith("{")) {
      val brace = text.indexOf("}")
      if (brace < 0) builder.append(text).toString
      else {
        val replacement = templates.get(text.substring(1, brace)).orNull
        if (replacement != null) {
          builder.append(replacement)
          loop(text.substring(brace + 1))
        } else {
          builder.append("{")
          loop(text.substring(1))
        }
      }
    } else {
      val brace = text.indexOf("{")
      if (brace < 0) builder.append(text).toString
      else {
        builder.append(text.substring(0, brace))
        loop(text.substring(brace))
      }
    }
  }

  loop(text)
}

UPDATE 3: Here is a set of Clojure test cases (the Scala versions are left as an exercise :-)):

(use 'clojure.test)

(deftest test-replace-templates
  (is (=        ; No templates
        (replace-templates "this is a test" {:foo "FOO"})
        "this is a test"))

  (is (=        ; One simple template
        (replace-templates "this is a {foo} test" {:foo "FOO"})
        "this is a FOO test"))

  (is (=        ; Two templates, second at end of input string
        (replace-templates "this is a {foo} test {bar}" {:foo "FOO" :bar "BAR"})
        "this is a FOO test BAR"))

  (is (=        ; Two templates
        (replace-templates "this is a {foo} test {bar} 42" {:foo "FOO" :bar "BAR"})
        "this is a FOO test BAR 42"))

  (is (=        ; Second brace-enclosed item is NOT a template
        (replace-templates "this is a {foo} test {baz} 42" {:foo "FOO" :bar "BAR"})
        "this is a FOO test {baz} 42"))

  (is (=        ; Second item is not a template (no closing brace)
        (replace-templates "this is a {foo} test {bar" {:foo "FOO" :bar "BAR"})
        "this is a FOO test {bar"))

  (is (=        ; First item is enclosed in a non-template brace-pair
        (replace-templates "this is {a {foo} test} {bar" {:foo "FOO" :bar "BAR"})
        "this is {a FOO test} {bar")))

(run-tests)

mik*_*era 8

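I think the best algorithm you could build would be O(n) in the length of the input string, and it would go something like this:

  1. Initialise an empty StringBuilder.
  2. Scan the string to find the first "{", adding any substring before it to the StringBuilder. If no "{" is found, you're finished!
  3. Scan to the next "}". Use whatever is between the curly braces to do a map lookup in a String->String hashmap, and add the result to the StringBuilder.
  4. Go back to 2. and continue scanning from after the "}".

Converting this to Scala/Clojure is left as an exercise :-)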
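One possible way to write the four steps in Scala is sketched below. The name replaceTemplatesScan is hypothetical, and the handling of unknown keys and of a missing closing brace is an assumption taken from the test cases in the question, not from the steps themselves: an unknown key is left in place by emitting the "{" and resuming the scan at the next character. With that fallback, each character is visited once only when every brace pair is a known template.

```scala
// Hypothetical name; a sketch of steps 1-4, not the answer author's code.
def replaceTemplatesScan(text: String, templates: Map[String, String]): String = {
  // Step 1: java.lang.StringBuilder is used for its append(CharSequence, start, end).
  val builder = new java.lang.StringBuilder
  var pos = 0
  while (pos < text.length) {
    val open = text.indexOf('{', pos)              // step 2: find the next "{"
    if (open < 0) {
      builder.append(text, pos, text.length)       // no more templates: copy the tail
      pos = text.length
    } else {
      builder.append(text, pos, open)              // copy the text before "{"
      val close = text.indexOf('}', open + 1)      // step 3: find the matching "}"
      if (close < 0) {
        builder.append(text, open, text.length)    // assumption: no "}", keep the rest as-is
        pos = text.length
      } else {
        templates.get(text.substring(open + 1, close)) match {
          case Some(value) =>
            builder.append(value)
            pos = close + 1                        // step 4: continue after "}"
          case None =>
            builder.append('{')                    // assumption: unknown key, keep the brace
            pos = open + 1
        }
      }
    }
  }
  builder.toString
}
```

For example, replaceTemplatesScan("this is a {foo} test {baz} 42", Map("foo" -> "FOO", "bar" -> "BAR")) leaves {baz} untouched, matching the corresponding Clojure test case in the question.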


Tor*_*ørn 7

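Here is a version of the Clojure implementation that uses a regular expression to do the replacement. It is faster than your version (running your Lorem ipsum test case 100 times; see further down), and there is less code to maintain: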

(defn replace-templates2 [text m]
  (clojure.string/replace text 
                          #"\{\w+\}" 
                          (fn [groups] 
                              ((keyword (subs groups 
                                              1 
                                              (dec (.length groups)))) m))))

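The implementation is quick and dirty, but it works. The point is that I think you should solve this using regular expressions.


Update:

I experimented a bit with a funky way to do the substring-ing, and got surprising performance results. Here is the code: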

(defn replace-templates3 [text m]
  (clojure.string/replace text 
                          #"\{\w+\}" 
                          (fn [groups] 
                              ((->> groups
                                    reverse
                                    (drop 1)
                                    reverse
                                    (drop 1)
                                    (apply str)
                                    keyword) m))))

And here are the results on my machine for your version, my first version, and finally this version (100 iterations each):

"Elapsed time: 77.475072 msecs"
"Elapsed time: 50.238911 msecs"
"Elapsed time: 38.109875 msecs"


cem*_*ick 7

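I wrote a string interpolation library for Clojure that was brought into clojure-contrib as clojure.contrib.strint. I blogged about it; you'll find a description of the approach there. The most recent source can be viewed on github. The big difference between clojure.contrib.strint and the approaches here is that the latter all perform the interpolation at runtime. In my experience, runtime interpolation is largely unnecessary, and using something like clojure.contrib.strint, which performs the interpolation at compile time, often yields tangible performance benefits for your application.

Note that clojure.contrib.strint will hopefully be migrating to clojure.core.strint under Clojure's "new-contrib" organization.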
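As a rough analogy on the Scala side (this is Scala's built-in s interpolator, not clojure.contrib.strint, and the value names are only illustrative), compile-time interpolation looks like the sketch below. The literal is split into its constant parts and the embedded expression when the code is compiled, so no template parsing or map lookup is needed at run time; the trade-off is that the template must be a literal in the source.

```scala
// Analogy only: Scala's s-interpolator, not the Clojure library discussed above.
val test = "game"

// The compiler splits the literal into "This is a " and the expression `test`.
val message = s"This is a $test"

// Arbitrary expressions go inside ${...}.
val shout = s"This is a ${test.toUpperCase}"
```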


Dan*_*ral 6

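Some people, when faced with a problem, think "I'll use regex!". Now they have two problems. Others, however, decide not to use regex, and now they have three problems: implementing and maintaining an ad hoc implementation of half of regex, plus the other two.

At any rate, consider this: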

import scala.util.matching.Regex

def replaceTemplates(text: String,
                     templates: Map[String, String]): String = 
    """\{([^{}]*)\}""".r replaceSomeIn ( text,  { case Regex.Groups(name) => templates get name } )

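It uses a string builder to do the search and replace. The map uses String instead of Symbol because that is faster, and the code does not replace matches that have no valid mapping. Using replaceAllIn would avoid that, but would require some type annotation, because that method is overloaded.

You might want to browse Scala's source code for Regex from the scaladoc API, and see what is going on.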
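A minimal sketch of the replaceAllIn variant mentioned above might look like this. The name replaceAllTemplates is hypothetical, and replacing unknown keys with the empty string is an assumption, since the answer does not say what they should become. The explicit Regex.Match parameter type is the annotation that selects the function-taking overload, and Regex.quoteReplacement keeps any "$" or "\" in a value from being read as a group reference.

```scala
import scala.util.matching.Regex

// Hypothetical variant: every {key} match is replaced.
// Assumption: keys missing from the map are replaced with "".
def replaceAllTemplates(text: String, templates: Map[String, String]): String =
  """\{([^{}]*)\}""".r.replaceAllIn(
    text,
    // The parameter type picks the (Match => String) overload.
    (m: Regex.Match) => Regex.quoteReplacement(templates.getOrElse(m.group(1), ""))
  )
```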


clo*_*man 6

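Torbjørn's answer is very nice and readable. It might be nice to use butlast to get rid of the double reverse, and string/join instead of apply'ing str. In addition, the map can be used as a function. So the Clojure code can be further shortened to: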

(defn replace-template [text m] 
      (clojure.string/replace text #"\{\w+\}" 
                              (comp m keyword clojure.string/join butlast rest)))