Posts by kri*_*hna

Adding two RDD[mllib.linalg.Vector]s

I need to add two matrices that are stored in two files.

Both latest1.txt and latest2.txt contain the following text:

1 2 3
4 5 6
7 8 9

I am reading these files as follows:

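scala> import org.apache.spark.mllib.linalg.Vectors
scala> val rows = sc.textFile("latest1.txt").map { line => val values = line.split(' ').map(_.toDouble)
    // keep only the non-zero entries as (index, value) pairs
    Vectors.sparse(values.length,values.zipWithIndex.map(e => (e._2, e._1)).filter(_._2 != 0.0))
}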

scala> val r1 = rows
r1: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] = MappedRDD[2] at map at :14

scala> val rows = sc.textFile("latest2.txt").map { line => val values = line.split(' ').map(_.toDouble)
    Vectors.sparse(values.length,values.zipWithIndex.map(e => (e._2, e._1)).filter(_._2 != 0.0))
}

scala> val r2 = rows
r2: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] = MappedRDD[2] at map …
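To make the goal concrete, here is a minimal plain-Scala sketch of the row-wise addition being asked about. It uses local collections instead of RDDs so that it runs without Spark. The names `AddRows`, `parseRow` and `addRows` are hypothetical helpers made up for this illustration, and the in-memory lines stand in for the contents of latest1.txt and latest2.txt.

```scala
// Plain-Scala sketch of row-wise matrix addition; no Spark is needed to run it.
// AddRows, parseRow and addRows are hypothetical names used only for illustration.
object AddRows {
  // Parse one space-separated line such as "1 2 3" into its numeric values.
  def parseRow(line: String): Array[Double] =
    line.trim.split("\\s+").map(_.toDouble)

  // Element-wise sum of two rows; rows of different length are rejected.
  def addRows(a: Array[Double], b: Array[Double]): Array[Double] = {
    require(a.length == b.length, s"row lengths differ: ${a.length} vs ${b.length}")
    a.zip(b).map { case (x, y) => x + y }
  }

  def main(args: Array[String]): Unit = {
    // Stand-ins for the lines of latest1.txt and latest2.txt.
    val m1 = Seq("1 2 3", "4 5 6", "7 8 9").map(parseRow)
    val m2 = Seq("1 2 3", "4 5 6", "7 8 9").map(parseRow)

    // Pair rows by position, then add each pair.
    val sum = m1.zip(m2).map { case (a, b) => addRows(a, b) }
    sum.foreach(row => println(row.mkString(" ")))
    // prints:
    // 2.0 4.0 6.0
    // 8.0 10.0 12.0
    // 14.0 16.0 18.0
  }
}
```

With the RDDs from the question, the same pairing could probably be attempted with `r1.zip(r2)`. That call requires both RDDs to have the same number of partitions and the same number of elements in each partition, which is an assumption about how the two files are read. Each `Vector` can be converted with `toArray` and the summed row rebuilt with `Vectors.dense`. This is one possible approach, not necessarily the accepted answer to the question.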

scala apache-spark apache-spark-mllib
