How to use the output of RowMatrix.columnSimilarities

She*_*lly 6 java scala matrix sparse-matrix apache-spark

I need to compute the similarities between the columns of a row matrix, and I tried using the columnSimilarities() method to get the result.

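import java.util.LinkedList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
import org.apache.spark.mllib.linalg.distributed.MatrixEntry;
import org.apache.spark.mllib.linalg.distributed.RowMatrix;
import org.apache.spark.sql.SparkSession;

// The class name is added here only so the snippet compiles on its own.
public class ColumnSimilaritiesExample {

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("CollarberativeFilter").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        SparkSession spark = SparkSession.builder().appName("CollarberativeFilter").getOrCreate();

        // Each inner array is one row of the input matrix.
        double[][] array = {{5, 0, 5}, {0, 10, 0}, {5, 0, 5}};
        LinkedList<Vector> rowsList = new LinkedList<Vector>();
        for (int i = 0; i < array.length; i++) {
            Vector currentRow = Vectors.dense(array[i]);
            rowsList.add(currentRow);
        }
        JavaRDD<Vector> rows = sc.parallelize(rowsList);

        // Create a RowMatrix from JavaRDD<Vector>.
        RowMatrix mat = new RowMatrix(rows.rdd());
        // Column-to-column similarities, returned as sparse (i, j, value) entries.
        CoordinateMatrix simsPerfect = mat.columnSimilarities();
        // Note: toRowMatrix() drops the row indices of the similarity matrix.
        RowMatrix mat2 = simsPerfect.toRowMatrix();
        List<Vector> vs2 = mat2.rows().toJavaRDD().collect();
        List<Vector> vs = mat.rows().toJavaRDD().collect();

        System.out.println("mat");
        for (Vector v : vs) {
            System.out.println(v);
        }
        System.out.println("mat2");
        for (Vector v : vs2) {
            System.out.println(v);
        }

        // Write each entry as "i,j,value".
        JavaRDD<MatrixEntry> entries = simsPerfect.entries().toJavaRDD();
        JavaRDD<String> output = entries.map(new Function<MatrixEntry, String>() {
            public String call(MatrixEntry e) {
                return String.format("%d,%d,%s", e.i(), e.j(), e.value());
            }
        });
        output.saveAsTextFile("resources123/data.txt");
    }
}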

However, the output in the text file is 0,2,0.9999999999999998.

Next, I tried the same example with double[][] array = {{1,3}, {2,7}}; this time the output in the text file is 0,1,0.9982743731749959.
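As a sanity check on the two results above, the numbers can be reproduced by hand. The Spark documentation describes columnSimilarities() as returning cosine similarities between columns, so the sketch below computes plain cosine similarity for every column pair of both input arrays, without Spark. The class name ColumnCosine and its cosine helper are hypothetical names chosen for illustration, and treating an all-zero column as similarity 0 is an assumption of this sketch.

```java
// Plain-Java check of column cosine similarities; no Spark required.
// ColumnCosine is a hypothetical helper, not part of any library.
class ColumnCosine {

    // Cosine similarity between columns a and b of a dense row-major matrix.
    static double cosine(double[][] m, int a, int b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (double[] row : m) {
            dot += row[a] * row[b];
            normA += row[a] * row[a];
            normB += row[b] * row[b];
        }
        // Assumption of this sketch: an all-zero column has similarity 0.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[][] first = {{5, 0, 5}, {0, 10, 0}, {5, 0, 5}};
        double[][] second = {{1, 3}, {2, 7}};
        for (double[][] m : new double[][][] {first, second}) {
            int cols = m[0].length;
            // Print one "i,j,value" line per column pair with i < j.
            for (int i = 0; i < cols; i++) {
                for (int j = i + 1; j < cols; j++) {
                    System.out.println(i + "," + j + "," + cosine(m, i, j));
                }
            }
            System.out.println();
        }
    }
}
```

For the 3-by-3 array, columns 0 and 2 are identical (similarity close to 1) and column 1 shares no non-zero row with either of them (similarity 0), which is consistent with the file containing a single entry for the pair (0,2). For the 2-by-2 array the only pair is (0,1), with value 17 / sqrt(5 * 58), roughly 0.99827.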

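Can someone explain the format of this output? I cannot get a score for every pair of columns in the matrix. For a 3-by-3 matrix, for example, I need 3 scores: the similarity between columns 1 and 2, columns 2 and 3, and columns 3 and 1. Any help is appreciated.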
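One possible reading of the saved file, offered as a sketch rather than a definitive answer: each line is "i,j,value" with 0-based column indices, only pairs with i < j appear (the Spark documentation describes the result as a sparse upper-triangular matrix), and pairs whose similarity is 0 seem to be left out. Under that assumption, the full list of pair scores can be rebuilt by filling the missing pairs with 0.0. SimilarityPairs and allPairs are hypothetical names, and the number of columns n has to be supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands sparse "i,j,value" lines into one line per column pair.
class SimilarityPairs {

    // n is the number of columns; lines are "i,j,value" entries with i < j.
    // Pairs that do not appear in the input are assumed to have similarity 0.0.
    static List<String> allPairs(int n, List<String> lines) {
        double[][] sims = new double[n][n]; // defaults to 0.0 everywhere
        for (String line : lines) {
            String[] parts = line.trim().split(",");
            int i = Integer.parseInt(parts[0]);
            int j = Integer.parseInt(parts[1]);
            sims[i][j] = Double.parseDouble(parts[2]);
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                result.add(i + "," + j + "," + sims[i][j]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> saved = new ArrayList<>();
        saved.add("0,2,0.9999999999999998"); // the single line from the 3-by-3 example
        for (String pair : allPairs(3, saved)) {
            System.out.println(pair);
        }
    }
}
```

With the 3-by-3 example this yields three lines, one each for the pairs (0,1), (0,2) and (1,2), which correspond to columns 1 and 2, 1 and 3, and 2 and 3 in 1-based numbering. Because cosine similarity is symmetric, the score for columns 3 and 1 is the same as the one stored under (0,2).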