Akka Streams: reading multiple files

ran*_*tic 5 scala akka akka-stream

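I have a list of files. I want to:

  1. Read from all of them as a single Source.
  2. Read the files sequentially, in order (no round-robin).
  3. Never require any file to be entirely in memory at any point.
  4. Have an error reading from a file collapse the stream.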

It felt like this should work (Scala, akka-streams v2.4.7):

val sources = Seq("file1", "file2").map(new File(_)).map(f => FileIO.fromPath(f.toPath)
    .via(Framing.delimiter(ByteString(System.lineSeparator), 10000, allowTruncation = true))
    .map(bs => bs.utf8String)
  )
val source = sources.reduce( (a, b) => Source.combine(a, b)(MergePreferred(_)) )
source.map(_ => 1).runWith(Sink.reduce[Int](_ + _)) // counting lines

But this results in a compile error, because FileIO has a materialized value associated with it, and Source.combine doesn't support that.

Mapping the materialized value away makes me wonder how file-read errors get handled, but it does compile:

val sources = Seq("file1", "file2").map(new File(_)).map(f => FileIO.fromPath(f.toPath)
    .via(Framing.delimiter(ByteString(System.lineSeparator), 10000, allowTruncation = true))
    .map(bs => bs.utf8String)
    .mapMaterializedValue(f => NotUsed.getInstance())
  )
val source = sources.reduce( (a, b) => Source.combine(a, b)(MergePreferred(_)) )
source.map(_ => 1).runWith(Sink.reduce[Int](_ + _))  // counting lines

But it throws an IllegalArgumentException at runtime:

java.lang.IllegalArgumentException: requirement failed: The inlets [] and outlets [MergePreferred.out] must correspond to the inlets [MergePreferred.preferred] and outlets [MergePreferred.out]
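
The message is consistent with the shape of MergePreferred: besides its regular inlets it has an extra "preferred" inlet, and Source.combine only wires up the regular ones, so that inlet is left unconnected. A minimal sketch of the same reduce-style approach, assuming strictly sequential reading is the goal, swaps in the concat operator instead. The names ConcatFiles, linesOf and allLines are hypothetical. One caveat, as far as I understand it: every concatenated source is materialized up front, so all files may be opened when the stream starts even though they are consumed one after another.

```scala
import java.nio.file.Path
import akka.NotUsed
import akka.stream.scaladsl.{FileIO, Framing, Source}
import akka.util.ByteString

// Hypothetical helper object, not part of the original question.
object ConcatFiles {
  // One line-oriented source per file; the IOResult materialized value is discarded.
  def linesOf(path: Path): Source[String, NotUsed] =
    FileIO.fromPath(path)
      .via(Framing.delimiter(ByteString(System.lineSeparator), 10000, allowTruncation = true))
      .map(_.utf8String)
      .mapMaterializedValue(_ => NotUsed)

  // Chains the sources so that each file is emitted only after the previous one completes.
  def allLines(paths: Seq[Path]): Source[String, NotUsed] =
    paths.map(linesOf).foldLeft(Source.empty[String])(_ concat _)
}
```

For a long list of files, a flatMapConcat-based pipeline is probably the safer choice, since it creates each inner source only when the previous one has finished.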

Vik*_*ang 10

The code below is not as terse as it could be, in order to clearly modularize the different concerns.

import java.io.File
import java.nio.file.Path
import akka.stream.scaladsl.{FileIO, Flow, Framing, Keep, Sink, Source}
import akka.util.ByteString

// Given a stream of ByteStrings delimited by the system line separator, we can get lines represented as Strings
val lines = Framing.delimiter(ByteString(System.lineSeparator), 10000, allowTruncation = true).map(bs => bs.utf8String)

// Given a stream of Paths, we read those files one after another and count the number of lines
val lineCounter = Flow[Path]
  .flatMapConcat(path => FileIO.fromPath(path).via(lines))
  .fold(0L)((count, line) => count + 1)
  .toMat(Sink.head)(Keep.right)

// Here's our test data source (replace paths with real paths)
val testFiles = Source(List("somePathToFile1", "somePathToFile2").map(new File(_).toPath))

// Runs the line counter over the test files; this returns a Future containing the number of lines, which we print to the console when it completes
// (an implicit Materializer and an implicit ExecutionContext must be in scope)
testFiles runWith lineCounter foreach println
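
To tie the pieces above to the fourth requirement (a read error should collapse the stream), here is a minimal end-to-end sketch. It assumes the same flatMapConcat pipeline; the names LineCount, countLines and Main are hypothetical, and ActorMaterializer is used because the question targets Akka 2.4.x (newer Akka versions can derive a materializer from the implicit ActorSystem). As far as I understand, a failure in any inner file source fails the whole stream, so it should surface as a failed Future.

```scala
import java.nio.file.{Path, Paths}
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, Materializer}
import akka.stream.scaladsl.{FileIO, Framing, Sink, Source}
import akka.util.ByteString
import scala.concurrent.Future
import scala.util.{Failure, Success}

// Hypothetical wrapper around the pipeline, for illustration only.
object LineCount {
  // Reads the given files strictly one after another and counts their lines.
  def countLines(paths: List[Path])(implicit mat: Materializer): Future[Long] =
    Source(paths)
      .flatMapConcat { path =>
        FileIO.fromPath(path)
          .via(Framing.delimiter(ByteString(System.lineSeparator), 10000, allowTruncation = true))
      }
      .fold(0L)((count, _) => count + 1)
      .runWith(Sink.head)
}

// Hypothetical entry point: file paths are taken from the command line.
object Main extends App {
  implicit val system: ActorSystem = ActorSystem("line-count")
  implicit val mat: Materializer = ActorMaterializer()
  import system.dispatcher

  LineCount.countLines(args.toList.map(Paths.get(_))).onComplete { result =>
    result match {
      case Success(n)  => println(s"$n lines")
      case Failure(ex) => println(s"stream failed: $ex")
    }
    system.terminate()
  }
}
```

Because the failure arrives through the Future, the caller decides in one place whether to log, retry or abort, without having to inspect each file's IOResult.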