Reading CSV in Scala into case class instances with error handling

Question

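I would like to read a CSV string/file in Scala so that, given a case class C and an error type Error, the parser fills an Iterable[Either[Error, C]]. Is there any library that does this or something similar?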

For example, given a class and an error type

case class Person(name: String, age: Int)

type Error = String

and the CSV string

Foo,19
Ro
Bar,24

the parser would output

Stream(Right(Person("Foo", 19)), Left("Cannot read 'Ro'"), Right(Person("Bar", 24)))

Update:

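I think my question was unclear, so let me clarify: is there a way to read CSV in Scala without defining boilerplate? Given any case class, is there a way to load it automatically? I would like to use it like this: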

val iter = csvParserFor[Person].parseLines(lines)

Answer 1

Tra*_*own 19

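Here is a Shapeless implementation that takes a slightly different approach from the one in your proposed example. It is based on some code I have written in the past, and the main difference from your implementation is that this one is a little more general: for example, the actual CSV parsing is factored out, so it is easy to plug in a dedicated library.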

First, a general-purpose Read type class (no Shapeless yet):

import scala.util.{ Failure, Success, Try }

trait Read[A] { def reads(s: String): Try[A] }

object Read {
  def apply[A](implicit readA: Read[A]): Read[A] = readA

  implicit object stringRead extends Read[String] {
    def reads(s: String): Try[String] = Success(s)
  }

  implicit object intRead extends Read[Int] {
    def reads(s: String) = Try(s.toInt)
  }

  // And so on...
}
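
As a quick illustration of how this type class is meant to be used and extended, the sketch below repeats the definitions from above inside a wrapper object so that it compiles on its own. The `ReadSketch` wrapper and the `doubleRead` instance are assumptions added for this example; they are not part of the original answer.

```scala
import scala.util.{Success, Try}

object ReadSketch {
  trait Read[A] { def reads(s: String): Try[A] }

  object Read {
    // Summons the implicit instance for A.
    def apply[A](implicit readA: Read[A]): Read[A] = readA

    implicit val stringRead: Read[String] = new Read[String] {
      def reads(s: String): Try[String] = Success(s)
    }

    implicit val intRead: Read[Int] = new Read[Int] {
      def reads(s: String): Try[Int] = Try(s.toInt)
    }

    // Hypothetical extra instance, one way to fill in "And so on...".
    implicit val doubleRead: Read[Double] = new Read[Double] {
      def reads(s: String): Try[Double] = Try(s.toDouble)
    }
  }

  val ok: Try[Int]        = Read[Int].reads("19")      // a Success
  val bad: Try[Int]       = Read[Int].reads("twelve")  // a Failure wrapping NumberFormatException
  val ratio: Try[Double]  = Read[Double].reads("2.5")  // a Success
}
```

Each cell type needs exactly one instance; everything else in the answer is derived from these.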

And then the fun part: a type class that provides a conversion (which may fail) from a list of strings to an HList:

import shapeless._

trait FromRow[L <: HList] { def apply(row: List[String]): Try[L] }

object FromRow {
  import HList.ListCompat._

  def apply[L <: HList](implicit fromRow: FromRow[L]): FromRow[L] = fromRow

  def fromFunc[L <: HList](f: List[String] => Try[L]) = new FromRow[L] {
    def apply(row: List[String]) = f(row)
  }

  implicit val hnilFromRow: FromRow[HNil] = fromFunc {
    case Nil => Success(HNil)
    case _ => Failure(new RuntimeException("No more cells expected"))
  }

  implicit def hconsFromRow[H: Read, T <: HList: FromRow]: FromRow[H :: T] =
    fromFunc {
      case h :: t => for {
        hv <- Read[H].reads(h)
        tv <- FromRow[T].apply(t)
      } yield hv :: tv
      case Nil => Failure(new RuntimeException("Expected more cells"))
    }
}

And finally, to make this work for case classes:

trait RowParser[A] {
  def apply[L <: HList](row: List[String])(implicit
    gen: Generic.Aux[A, L],
    fromRow: FromRow[L]
  ): Try[A] = fromRow(row).map(gen.from)
}

def rowParserFor[A] = new RowParser[A] {}

Now we can write the following, for example, using OpenCSV:

case class Foo(s: String, i: Int)

import au.com.bytecode.opencsv._
import scala.collection.JavaConverters._

val reader = new CSVReader(new java.io.FileReader("foos.csv"))

val foos = reader.readAll.asScala.map(row => rowParserFor[Foo](row.toList))

If we have an input file like this:

first,10
second,11
third,twelve

We will get the following:

scala> foos.foreach(println)
Success(Foo(first,10))
Success(Foo(second,11))
Failure(java.lang.NumberFormatException: For input string: "twelve")

(Note that this summons the Generic and FromRow instances for every row, but that would be easy to change if performance is a concern.)
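
The question asks for Either[Error, C] with Error = String, while this approach produces Try values. A minimal sketch of one way to bridge the two, using only the standard library's `Try.toEither` (available since Scala 2.12), is shown below. The `TryToEither` object and the `toEither` helper are hypothetical names introduced for this example, and the error-message format is an arbitrary choice.

```scala
import scala.util.Try

object TryToEither {
  type Error = String

  // Hypothetical helper: keeps successes as Right and turns the exception
  // of a failure into a short message on the Left.
  def toEither[A](result: Try[A]): Either[Error, A] =
    result.toEither.left.map(e => s"${e.getClass.getSimpleName}: ${e.getMessage}")

  // Stand-ins for the per-row results produced by a row parser.
  val parsed: List[Try[Int]] = List(Try("10".toInt), Try("twelve".toInt))

  val asEither: List[Either[Error, Int]] = parsed.map(toEither)
}
```

Applied to the example, the same mapping would be `foos.map(TryToEither.toEither)`, giving one Either per input row.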

Answer 2

Nic*_*udo 13

kantan.csv looks like what you want. If you want zero boilerplate, you can use its shapeless module and write:

import java.io.File
import kantan.csv.ops._
import kantan.csv.generic.codecs._

new File("path/to/csv").asCsvRows[Person](',', false).toList

On your input, this yields:

res2: List[kantan.csv.DecodeResult[Person]] = List(Success(Person(Foo,19)), DecodeFailure, Success(Person(Bar,24)))

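Note that the actual return type (before the call to toList) is an iterator, so you never have to hold the whole CSV file in memory, unlike the Stream in your example.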

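If the shapeless dependency is too much, you can drop it and provide your own type class instances for the case class with minimal boilerplate: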

implicit val personCodec = RowCodec.caseCodec2(Person.apply, Person.unapply)(0, 1)

Full disclosure: I am the author of kantan.csv.

Answer 3

Xav*_*hot 5

Starting with Scala 2.13, it is possible to pattern match a String by unapplying the s string interpolator:

// case class Person(name: String, age: Int)
val csv = "Foo,19\nRo\nBar,24".split("\n")
csv.map {
  case s"$name,$age" => Right(Person(name, age.toInt))
  case line          => Left(s"Cannot read '$line'")
}
// Array(Right(Person("Foo", 19)), Left("Cannot read 'Ro'"), Right(Person("Bar", 24)))

Note that you can also use regexes within the extractor.

In our case, this can help to treat a row as invalid when the age is not an integer:

// val csv = "Foo,19\nRo\nBar,2R".split("\n")

val Age = "(\\d+)".r

csv.map {
  case s"$name,${Age(age)}" => Right(Person(name, age.toInt))
  case line @ s"$name,$age" => Left(s"Age is not an integer in '$line'")
  case line                 => Left(s"Cannot read '$line'")
}
//Array(Right(Person("Foo", 19)), Left("Cannot read 'Ro'"), Left("Age is not an integer in 'Bar,2R'"))
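
As a variation on the regex check, the sketch below uses `toIntOption` (also new in Scala 2.13) to validate the age. It is only one possible alternative, not part of the original answer; the `ParseSketch` object and `parseLine` function are hypothetical names. Unlike `(\d+)` followed by `toInt`, this version also rejects digit strings that do not fit in an Int instead of throwing.

```scala
object ParseSketch {
  final case class Person(name: String, age: Int)

  // Hypothetical helper: parses one "name,age" line into Either.
  def parseLine(line: String): Either[String, Person] = line match {
    case s"$name,$age" =>
      age.toIntOption                       // None if age is not a valid Int
        .map(Person(name, _))
        .toRight(s"Age is not an integer in '$line'")
    case _ =>
      Left(s"Cannot read '$line'")
  }

  // linesIterator is lazy; toList is only used here to show the results.
  val results: List[Either[String, Person]] =
    "Foo,19\nRo\nBar,2R".linesIterator.map(parseLine).toList
  // List(Right(Person(Foo,19)), Left(Cannot read 'Ro'), Left(Age is not an integer in 'Bar,2R'))
}
```

Dropping the final `toList` keeps the result as an Iterator, so lines are parsed only as they are consumed.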

Archived: 10 years, 9 months ago
Views: 7363
Last activity: 6 years, 6 months ago