
Serializing an RDD

I have an RDD that I am trying to serialize and then reconstruct by deserializing it. I am trying to find out whether this is possible in Apache Spark.

    static JavaSparkContext sc = new JavaSparkContext(conf);
    static SerializerInstance si = SparkEnv.get().closureSerializer().newInstance();
    static ClassTag<JavaRDD<String>> tag = scala.reflect.ClassTag$.MODULE$.apply(JavaRDD.class);
    ..
    ..
    JavaRDD<String> rdd = sc.textFile(logFile, 4);
    System.out.println("Element 1 " + rdd.first());
    ByteBuffer bb = si.serialize(rdd, tag);
    JavaRDD<String> rdd2 = si.deserialize(bb, Thread.currentThread().getContextClassLoader(), tag);
    System.out.println(rdd2.partitions().size());
    System.out.println("Element 0 " + rdd2.first());

When I perform an action on the newly created RDD, I get an exception on the last line. The way I serialize it is similar to how Spark does it internally.

Exception in thread "main" org.apache.spark.SparkException: RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation …
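The exception occurs because a deserialized RDD no longer holds a live link to the driver's SparkContext, so actions on it are rejected; an RDD is a lazy description of a computation, not a container of data. A common workaround is to serialize the RDD's *data* instead of the RDD itself, e.g. by collecting the elements into a plain Java list and round-tripping that (or by using `saveAsObjectFile`/`objectFile` for large data). A minimal sketch of the data round-trip, using a hard-coded `ArrayList` as a stand-in for the result of `rdd.collect()`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RoundTrip {

    // Serialize an object to bytes and read it back, mirroring what
    // SerializerInstance.serialize/deserialize do in the question.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (T) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for rdd.collect(): the RDD's materialized data.
        // Unlike the RDD, a List<String> has no driver dependency,
        // so it survives deserialization intact.
        List<String> data = new ArrayList<>(Arrays.asList("line1", "line2"));
        List<String> restored = roundTrip(data);
        System.out.println("Element 0 " + restored.get(0));
    }
}
```

After deserialization, `restored` can be turned back into an RDD on a live context with `sc.parallelize(restored)`.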
