I have an RDD that I am trying to serialize and then reconstruct by deserializing it. I am trying to see whether this is possible in Apache Spark.
import java.nio.ByteBuffer;
import org.apache.spark.SparkEnv;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.serializer.SerializerInstance;
import scala.reflect.ClassTag;

static JavaSparkContext sc = new JavaSparkContext(conf);
static SerializerInstance si = SparkEnv.get().closureSerializer().newInstance();
static ClassTag<JavaRDD<String>> tag = scala.reflect.ClassTag$.MODULE$.apply(JavaRDD.class);
..
..
JavaRDD<String> rdd = sc.textFile(logFile, 4);
System.out.println("Element 1 " + rdd.first());
// Serialize the RDD handle to a ByteBuffer, then deserialize it back
ByteBuffer bb = si.serialize(rdd, tag);
JavaRDD<String> rdd2 = si.deserialize(bb, Thread.currentThread().getContextClassLoader(), tag);
System.out.println(rdd2.partitions().size());
System.out.println("Element 0 " + rdd2.first());
When I perform an action on the newly created RDD, I get an exception at the last line. The way I am serializing it is similar to how it is done internally in Spark.
Exception in thread "main" org.apache.spark.SparkException: RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation …
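
For contrast with the exception above: the data inside an RDD can be written out and read back even though the RDD handle itself is only meaningful inside the driver that created it. Below is a minimal sketch, assuming the same sc and logFile as above and a hypothetical output path /tmp/rdd-roundtrip; it is an illustrative alternative, not what the code above attempts.

// Illustrative sketch (assumed path /tmp/rdd-roundtrip): round-trip the
// RDD's elements through disk instead of serializing the RDD handle.
JavaRDD<String> original = sc.textFile(logFile, 4);
original.saveAsObjectFile("/tmp/rdd-roundtrip");                 // write the elements out
JavaRDD<String> restored = sc.objectFile("/tmp/rdd-roundtrip");  // rebuild an RDD from them
System.out.println("Element 0 " + restored.first());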