Apache Spark GraphX connected components

Ole*_*kov 7 apache-spark spark-graphx

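How can I use the subgraph function to get a graph that contains only the vertices and edges of a specific connected component? Assume I already know the connected component ID. The end goal is to create a new graph from that connected component, and I want to keep the vertex attributes from the original graph.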

Dan*_*bos 6

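You have to join the graph that carries the component IDs to the original graph, filter by component ID (taking the subgraph), and then discard the component IDs.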

import scala.reflect._
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib.ConnectedComponents

def getComponent[VD: ClassTag, ED: ClassTag](
    g: Graph[VD, ED], component: VertexId): Graph[VD, ED] = {
  val cc: Graph[VertexId, ED] = ConnectedComponents.run(g)
  // Join component ID to the original graph.
  val joined = g.outerJoinVertices(cc.vertices) {
    (vid, vd, cc) => (vd, cc)
  }
  // Filter by component ID.
  val filtered = joined.subgraph(vpred = {
    (vid, vdcc) => vdcc._2 == Some(component)
  })
  // Discard component IDs.
  filtered.mapVertices {
    (vid, vdcc) => vdcc._1
  }
}
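
To try this on concrete data, here is a minimal sketch on a small hand-built graph. Everything in it is an illustrative assumption rather than part of the answer above: the object name `ComponentExample`, the helper `componentViaMask`, the local `SparkContext` settings, and the sample vertices and edges. It also swaps the join/filter/discard steps for a shorter variant built on `Graph.mask`, which restricts a graph to the vertices and edges that also appear in another graph while keeping its own attributes. GraphX labels each component with the smallest vertex ID it contains, so that is the ID to pass in.

```scala
import scala.reflect.ClassTag
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

object ComponentExample {
  // Hypothetical helper: same goal as getComponent, but using mask
  // instead of join + filter + mapVertices.
  def componentViaMask[VD: ClassTag, ED: ClassTag](
      g: Graph[VD, ED], component: VertexId): Graph[VD, ED] = {
    // Vertex attribute of cc is the component ID (smallest vertex ID in the component).
    val cc: Graph[VertexId, ED] = g.connectedComponents()
    // Keep only the vertices labelled with the requested component ID.
    val onlyComponent = cc.subgraph(vpred = (_, c) => c == component)
    // Restrict the original graph to that structure, keeping the original attributes.
    g.mask(onlyComponent)
  }

  def main(args: Array[String]): Unit = {
    // Assumed local setup for experimentation only.
    val conf = new SparkConf().setMaster("local[*]").setAppName("component-example")
    val sc = new SparkContext(conf)

    // Sample data: two components, {1, 2, 3} and {4, 5}.
    val vertices = sc.parallelize(
      Seq((1L, "a"), (2L, "b"), (3L, "c"), (4L, "d"), (5L, "e")))
    val edges = sc.parallelize(
      Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(4L, 5L, 1)))
    val g: Graph[String, Int] = Graph(vertices, edges)

    // Component {4, 5} is identified by its smallest vertex ID, 4.
    val sub = componentViaMask(g, 4L)

    // Prints (4,d) and (5,e): the original vertex attributes are preserved.
    sub.vertices.collect().sortBy(_._1).foreach(println)

    sc.stop()
  }
}
```

Calling `getComponent(g, 4L)` from the answer on the same graph should give the same two vertices. The mask variant only avoids carrying the `(attribute, Option[componentId])` pair through the intermediate graph.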