如何从Flow中创建一个递归生成值的akka​​-stream源?

Gui*_*ssé 23 scala akka-stream

我需要遍历一个形状像树的API.例如,目录结构或讨论的线程.它可以通过以下流程建模:

type ItemId = Int
type Data = String
case class Item(data: Data, kids: List[ItemId])

def randomData(): Data = scala.util.Random.alphanumeric.take(2).mkString 

// 0 => [1, 9]
// 1 => [10, 19]
// 2 => [20, 29]
// ...
// 9 => [90, 99]
// _ => []
// NB. I don't have access to this function, only the itemFlow.
def nested(id: ItemId): List[ItemId] =
  if (id == 0) (1 to 9).toList
  else if (1 <= id && id <= 9) ((id * 10) to ((id + 1) * 10 - 1)).toList
  else Nil

val itemFlow: Flow[ItemId, Item, NotUsed] = 
  Flow.fromFunction(id => Item(randomData, nested(id)))
Run Code Online (Sandbox Code Playgroud)

我该如何遍历这些数据?我得到了以下工作:

import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream._
import akka.stream.scaladsl._

import scala.concurrent.Await
import scala.concurrent.duration.Duration

implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()

val loop = 
  GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._

    val source = b.add(Flow[Int])
    val merge  = b.add(Merge[Int](2))
    val fetch  = b.add(itemFlow) 
    val bcast  = b.add(Broadcast[Item](2))

    val kids   = b.add(Flow[Item].mapConcat(_.kids))
    val data   = b.add(Flow[Item].map(_.data))

    val buffer = Flow[Int].buffer(100, OverflowStrategy.dropHead)

    source ~> merge ~> fetch           ~> bcast ~> data
              merge <~ buffer <~ kids  <~ bcast

    FlowShape(source.in, data.out)
  }

val flow = Flow.fromGraph(loop)


Await.result(
  Source.single(0).via(flow).runWith(Sink.foreach(println)),
  Duration.Inf
)

system.terminate()
Run Code Online (Sandbox Code Playgroud)

但是,由于我正在使用带缓冲区的流,因此Stream永远不会完成.

上游完成并且缓冲元素已耗尽时完成

Flow.buffer

我多次阅读图形周期,活跃度和死锁部分,我仍然在努力寻找答案.

这将创建一个实时锁定:

import java.util.concurrent.atomic.AtomicInteger

def unfold[S, E](seed: S, flow: Flow[S, E, NotUsed])(loop: E => List[S]): Source[E, NotUsed] = {
  // keep track of how many element flows, 
  val remaining = new AtomicInteger(1) // 1 = seed

  // should be > max loop(x)
  val bufferSize = 10000

  val (ref, publisher) =
    Source.actorRef[S](bufferSize, OverflowStrategy.fail)
      .toMat(Sink.asPublisher(true))(Keep.both)
      .run()

  ref ! seed

  Source.fromPublisher(publisher)
    .via(flow)
    .map{x =>
      loop(x).foreach{ c =>
        remaining.incrementAndGet()
        ref ! c
      }
      x
    }
    .takeWhile(_ => remaining.decrementAndGet > 0)
}
Run Code Online (Sandbox Code Playgroud)

编辑:我添加了一个git仓库来测试你的解决方案https://github.com/MasseGuillaume/source-unfold

Ram*_*gil 5

未完成的原因

我不认为流的原因永远不会完成是由于"使用带缓冲区的流".与此问题类似,实际原因是与默认参数的合并在它(合并)完成之前eagerClose=False等待sourcebuffer完成.但缓冲区正在等待合并完成.因此,merge正在等待缓冲区,缓冲区正在等待合并.

eagerClose合并

您可以eagerClose=True在创建合并时进行设置.但是,使用急切关闭可能会导致一些孩子ItemId永远不会被质疑.

间接解决方案

如果为树的每个级别实现新流,则可以将递归提取到流的外部.

您可以使用以下命令构建查询函数itemFlow:

val itemQuery : Iterable[ItemId] => Future[Seq[Data]] = 
  (itemIds) => Source.apply(itemIds)
                     .via(itemFlow)
                     .runWith(Sink.seq[Data])
Run Code Online (Sandbox Code Playgroud)

此查询函数现在可以包含在递归帮助函数中:

val recQuery : (Iterable[ItemId], Iterable[Data]) => Future[Seq[Data]] = 
  (itemIds, currentData) => itemQuery(itemIds) flatMap { allNewData =>
      val allNewKids = allNewData.flatMap(_.kids).toSet

      if(allNewKids.isEmpty)
        Future.successful(currentData ++ allNewData)
      else
        recQuery(allNewKids, currentData ++ data)
  }
Run Code Online (Sandbox Code Playgroud)

创建的流数量将等于树的最大深度.

不幸的是,由于涉及Futures,这个递归函数不是尾递归的,如果树太深,可能会导致"堆栈溢出".


Gui*_*ssé 3

我通过编写自己的 GraphStage 解决了这个问题。

import akka.NotUsed
import akka.stream._
import akka.stream.scaladsl._
import akka.stream.stage.{GraphStage, GraphStageLogic, OutHandler}

import scala.concurrent.ExecutionContext

import scala.collection.mutable
import scala.util.{Success, Failure, Try}

import scala.collection.mutable

def unfoldTree[S, E](seeds: List[S], 
                     flow: Flow[S, E, NotUsed],
                     loop: E => List[S],
                     bufferSize: Int)(implicit ec: ExecutionContext): Source[E, NotUsed] = {
  Source.fromGraph(new UnfoldSource(seeds, flow, loop, bufferSize))
}

object UnfoldSource {
  implicit class MutableQueueExtensions[A](private val self: mutable.Queue[A]) extends AnyVal {
    def dequeueN(n: Int): List[A] = {
      val b = List.newBuilder[A]
      var i = 0
      while (i < n) {
        val e = self.dequeue
        b += e
        i += 1
      }
      b.result()
    }
  }
}

class UnfoldSource[S, E](seeds: List[S],
                         flow: Flow[S, E, NotUsed],
                         loop: E => List[S],
                         bufferSize: Int)(implicit ec: ExecutionContext) extends GraphStage[SourceShape[E]] {

  val out: Outlet[E] = Outlet("UnfoldSource.out")
  override val shape: SourceShape[E] = SourceShape(out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new GraphStageLogic(shape) with OutHandler {  
    // Nodes to expand
    val frontier = mutable.Queue[S]()
    frontier ++= seeds

    // Nodes expanded
    val buffer = mutable.Queue[E]()

    // Using the flow to fetch more data
    var inFlight = false

    // Sink pulled but the buffer was empty
    var downstreamWaiting = false

    def isBufferFull() = buffer.size >= bufferSize

    def fillBuffer(): Unit = {
      val batchSize = Math.min(bufferSize - buffer.size, frontier.size)
      val batch = frontier.dequeueN(batchSize)
      inFlight = true

      val toProcess =
        Source(batch)
          .via(flow)
          .runWith(Sink.seq)(materializer)

      val callback = getAsyncCallback[Try[Seq[E]]]{
        case Failure(ex) => {
          fail(out, ex)
        }
        case Success(es) => {
          val got = es.size
          inFlight = false
          es.foreach{ e =>
            buffer += e
            frontier ++= loop(e)
          }
          if (downstreamWaiting && buffer.nonEmpty) {
            val e = buffer.dequeue
            downstreamWaiting = false
            sendOne(e)
          } else {
            checkCompletion()
          }
          ()
        }
      }

      toProcess.onComplete(callback.invoke)
    }
    override def preStart(): Unit = {
      checkCompletion()
    }

    def checkCompletion(): Unit = {
      if (!inFlight && buffer.isEmpty && frontier.isEmpty) {
        completeStage()
      }
    } 

    def sendOne(e: E): Unit = {
      push(out, e)
      checkCompletion()
    }

    def onPull(): Unit = {
      if (buffer.nonEmpty) {
        sendOne(buffer.dequeue)
      } else {
        downstreamWaiting = true
      }

      if (!isBufferFull && frontier.nonEmpty) {
        fillBuffer()
      }
    }

    setHandler(out, this)
  }
}
Run Code Online (Sandbox Code Playgroud)