Recursively transforming a JSON tree into another format (XML, CSV, etc.) with circe

Urs*_*ter 3 xml json scala scala-cats circe

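To convert a JSON node into a format other than JSON (XML, CSV, etc.) with circe, I came up with a solution that requires access to circe's internal data structures.

Here is my working example, which converts JSON to an XML string (not perfect, but you get the idea):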

package io.circe

import io.circe.Json.{JArray, JBoolean, JNull, JNumber, JObject, JString}
import io.circe.parser.parse

object Sample extends App {

  def transformToXMLString(js: Json): String = js match {
    case JNull => ""
    case JBoolean(b) => b.toString
    case JNumber(n) => n.toString
    case JString(s) => s.toString
    case JArray(a) => a.map(transformToXMLString(_)).mkString("")
    case JObject(o) => o.toMap.map {
      case (k, v) => s"<${k}>${transformToXMLString(v)}</${k}>"
    }.mkString("")
  }

  val json =
    """{
      | "root": {
      |  "sampleboolean": true,
      |  "sampleobj": {
      |    "anInt": 1,
      |    "aString": "string"
      |  },
      |  "objarray": [
      |     {"v1": 1},
      |     {"v2": 2}
      |  ]
      | }
      |}""".stripMargin

  val res = transformToXMLString(parse(json).right.get)
  println(res)
}

The result is:

<root><sampleboolean>true</sampleboolean><sampleobj><anInt>1</anInt><aString>string</aString></sampleobj><objarray><v1>1</v1><v2>2</v2></objarray></root>

This would all be fine if the low-level JSON constructors (JBoolean, JString, JObject, etc.) were not package-private in circe. As it is, the code above only works if it is placed in package io.circe.

How can I get the same result as above using the public circe API?

Tra*_*own 5

The fold method on Json allows you to perform this kind of operation quite concisely (and in a way that enforces exhaustivity, just like pattern matching on a sealed trait):

import io.circe.Json

def transformToXMLString(js: Json): String = js.fold(
  "",
  _.toString,
  _.toString,
  identity,
  _.map(transformToXMLString(_)).mkString(""),
  _.toMap.map {
    case (k, v) => s"<${k}>${transformToXMLString(v)}</${k}>"
  }.mkString("")
)

And then:

scala> import io.circe.parser.parse
import io.circe.parser.parse

scala> transformToXMLString(parse(json).right.get)
res1: String = <root><sampleboolean>true</sampleboolean><sampleobj><anInt>1</anInt><aString>string</aString></sampleobj><objarray><v1>1</v1><v2>2</v2></objarray></root>

This gives exactly the same result as your implementation, in slightly fewer characters and without relying on private implementation details.

So the answer is "use fold" (or the asX methods as suggested in the other answer—that approach is more flexible but in general is likely to be less idiomatic and more verbose). If you care about why we've made the design decision in circe not to expose the constructors, you can skip to the end of this answer, but this kind of question comes up a lot, so I also want to address a few related points first.
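For comparison, the asX approach mentioned above might look roughly like the following. This is only a sketch, assuming the asBoolean, asNumber, asString, asArray, and asObject methods on Json (each returning an Option); the name transformWithAsX is made up for illustration and is not taken from the other answer.

```scala
import io.circe.Json

// Sketch only: try each asX accessor in turn and fall back to "" for null.
// Unlike fold, nothing forces us to cover every case here.
def transformWithAsX(js: Json): String =
  js.asBoolean.map(_.toString)
    .orElse(js.asNumber.map(_.toString))
    .orElse(js.asString)
    .orElse(js.asArray.map(_.map(transformWithAsX).mkString("")))
    .orElse(js.asObject.map(_.toMap.map {
      case (k, v) => s"<${k}>${transformWithAsX(v)}</${k}>"
    }.mkString("")))
    .getOrElse("") // none of the accessors matched, so the value is null
```

The compiler will not complain if one of the orElse branches is forgotten, which is the main reason fold is usually the better fit for this kind of traversal.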

A side note about naming

Note that the use of the name "fold" for this method is inherited from Argonaut, and is arguably inaccurate. When we talk about catamorphisms (or folds) for recursive algebraic data types, we mean a function where we don't see the ADT type in the arguments of the functions we're passing in. For example, the signature of the fold for lists looks like this:

def foldLeft[B](z: B)(op: (B, A) => B): B

Not this:

def foldLeft[B](z: B)(op: (List[A], A) => B): B

Since io.circe.Json is a recursive ADT, its fold method really should look like this:

def properFold[X](
  jsonNull: => X,
  jsonBoolean: Boolean => X,
  jsonNumber: JsonNumber => X,
  jsonString: String => X,
  jsonArray: Vector[X] => X,
  jsonObject: Map[String, X] => X
): X

Instead of:

def fold[X](
  jsonNull: => X,
  jsonBoolean: Boolean => X,
  jsonNumber: JsonNumber => X,
  jsonString: String => X,
  jsonArray: Vector[Json] => X,
  jsonObject: JsonObject => X
): X

But in practice the former seems less useful, so circe only provides the latter (if you want to recurse, you have to do it manually), and follows Argonaut in calling it fold. This has always made me a little uncomfortable, and the name may change in the future.
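To make the distinction concrete, here is a minimal sketch of a "proper" fold on a toy JSON type. Everything in it (the CataSketch object, the J type and its constructors, cata, toXml) is hypothetical and written only for illustration; it is not circe's Json, and it uses a Vector of pairs for object fields instead of a Map so that field order is kept.

```scala
object CataSketch {
  // Hypothetical toy JSON ADT, for illustration only (not circe's Json).
  sealed trait J
  case object JNull extends J
  final case class JBool(value: Boolean) extends J
  final case class JNum(value: BigDecimal) extends J
  final case class JStr(value: String) extends J
  final case class JArr(values: Vector[J]) extends J
  final case class JObj(fields: Vector[(String, J)]) extends J

  // A catamorphism: the recursion happens here, so the functions passed in
  // only ever see already-folded results of type X, never a J.
  def cata[X](j: J)(
    onNull: => X,
    onBool: Boolean => X,
    onNum: BigDecimal => X,
    onStr: String => X,
    onArr: Vector[X] => X,
    onObj: Vector[(String, X)] => X
  ): X = {
    def go(current: J): X = current match {
      case JNull     => onNull
      case JBool(b)  => onBool(b)
      case JNum(n)   => onNum(n)
      case JStr(s)   => onStr(s)
      case JArr(vs)  => onArr(vs.map(go))
      case JObj(fs)  => onObj(fs.map { case (k, v) => (k, go(v)) })
    }
    go(j)
  }

  // The XML transformation no longer calls itself: children arrive as Strings.
  def toXml(j: J): String = cata[String](j)(
    "",
    _.toString,
    _.toString,
    identity,
    _.mkString(""),
    _.map { case (k, x) => s"<${k}>${x}</${k}>" }.mkString("")
  )
}
```

With circe's actual fold, the array and object cases receive unfolded Json values instead, which is why the earlier transformToXMLString has to recurse explicitly.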

A side note about performance

In some cases instantiating the six functions fold expects may be prohibitively expensive, so circe also allows you to bundle the operations together:

import io.circe.{ Json, JsonNumber, JsonObject }

val xmlTransformer: Json.Folder[String] = new Json.Folder[String] {
  def onNull: String = ""
  def onBoolean(value: Boolean): String = value.toString
  def onNumber(value: JsonNumber): String = value.toString
  def onString(value: String): String = value
  def onArray(value: Vector[Json]): String =
    value.map(_.foldWith(this)).mkString("")
  def onObject(value: JsonObject): String = value.toMap.map {
    case (k, v) => s"<${k}>${v.foldWith(this)}</${k}>"
  }.mkString("")
}

And then:

scala> parse(json).right.get.foldWith(xmlTransformer)
res2: String = <root><sampleboolean>true</sampleboolean><sampleobj><anInt>1</anInt><aString>string</aString></sampleobj><objarray><v1>1</v1><v2>2</v2></objarray></root>

The performance benefit from using Folder will vary depending on whether you're on Scala 2.11 or 2.12, but if the actual operations you're performing on the JSON values are cheap, you can expect the Folder version to get about twice the throughput of fold. Incidentally it's also significantly faster than pattern matching on the internal constructors, at least in the benchmarks we've done:

Benchmark                           Mode  Cnt      Score    Error  Units
FoldingBenchmark.withFold          thrpt   10   6769.843 ± 79.005  ops/s
FoldingBenchmark.withFoldWith      thrpt   10  13316.918 ± 60.285  ops/s
FoldingBenchmark.withPatternMatch  thrpt   10   8022.192 ± 63.294  ops/s

That's on 2.12. I believe you should see even more of a difference on 2.11.

A side note about optics

If you really want pattern matching, circe-optics gives you a high-powered alternative to case class extractors:

import io.circe.Json, io.circe.optics.all._

def transformToXMLString(js: Json): String = js match {
  case jsonNull(_) => ""
  case jsonBoolean(b) => b.toString
  case jsonNumber(n) => n.toString
  case jsonString(s) => s.toString
  case jsonArray(a) => a.map(transformToXMLString(_)).mkString("")
  case jsonObject(o) => o.toMap.map {
    case (k, v) => s"<${k}>${transformToXMLString(v)}</${k}>"
  }.mkString("")
}

This is almost exactly the same code as your original version, but each of these extractors is a Monocle prism that can be composed with other optics from the Monocle library.

(The downside of this approach is that you lose exhaustivity checking, but unfortunately that can't be helped.)

Why not just case classes

When I first started working on circe I wrote the following in a document about some of my design decisions:

In some cases, including most significantly here the io.circe.Json type, we don't want to encourage users to think of the ADT leaves as having meaningful types. A JSON value "is" a boolean or a string or a unit or a Seq[Json] or a JsonNumber or a JsonObject. Introducing types like JString, JNumber, etc. into the public API just confuses things.

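I wanted a really minimal API (and especially an API that avoided exposing types that aren't meaningful), and I wanted room to optimize the JSON representation. (I also didn't really want people working with JSON ASTs at all, but that has been a losing battle.) I still think hiding the constructors was the right decision, even though I haven't really taken advantage of their absence for optimizations (yet), and even though this question comes up a lot.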