小编Jiv*_*nut的帖子

将GraphFrames ShortestPath Map转换为PySpark中的DataFrame行

我试图找到最有效的方法从GraphFrames函数shortestPaths获取Map输出,并将每个顶点的距离映射平铺为新DataFrame中的各个行.通过将距离列拉入字典然后从那里转换为pandas数据帧然后转换回Spark数据帧,我已经能够非常笨拙地做到这一点,但我知道必须有更好的方法.

from graphframes import *

v = sqlContext.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
], ["id", "name", "age"])

# Create an Edge DataFrame with "src" and "dst" columns
e = sqlContext.createDataFrame([
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
], ["src", "dst", "relationship"])

# Create a GraphFrame
g = GraphFrame(v, e)

results = g.shortestPaths(landmarks=["a", "b","c"])
results.select("id","distances").show()

+---+--------------------+
| id|           distances|
+---+--------------------+
|  a|Map(a -> 0, b -> ...|
|  b| Map(b -> 0, c -> 1)| …
Run Code Online (Sandbox Code Playgroud)

python apache-spark pyspark spark-dataframe graphframes

4
推荐指数
1
解决办法
917
查看次数