Asked by fan*_*ndi (6) · tags: java, scala, dataframe, apache-spark, rdd
I'm new to Apache Spark. I have created several RDDs and DataFrames and cached them, and now I want to unpersist some of them with the command below:
rddName.unpersist()
But I can't remember their names. I used sc.getPersistentRDDs, but the output does not include the names. I also looked at the cached RDDs in the browser (the Spark web UI), but again there is no name information. Am I missing something?
Answered by 小智 (7):
PySparkers: getPersistentRDDs isn't implemented in the Python API yet, so get at your cached RDDs by dipping into the Java side:
for (id, rdd) in spark.sparkContext._jsc.getPersistentRDDs().items():
    rdd.unpersist()
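If you only want to evict specific RDDs rather than everything, you can filter on the name before unpersisting. A minimal sketch, with a hypothetical helper `unpersist_by_name`; `jsc` stands in for `spark.sparkContext._jsc` (the py4j JavaSparkContext), and the `getPersistentRDDs()` / `name()` / `unpersist()` calls mirror the Java API used above:

```python
def unpersist_by_name(jsc, target_name):
    """Unpersist every cached RDD whose name matches target_name.

    jsc: the JavaSparkContext, e.g. spark.sparkContext._jsc.
    Returns the ids of the RDDs that were unpersisted.
    """
    removed = []
    # getPersistentRDDs() returns an id -> RDD map on the Java side
    for rdd_id, rdd in jsc.getPersistentRDDs().items():
        if rdd.name() == target_name:
            rdd.unpersist()
            removed.append(rdd_id)
    return removed
```

Usage would then be `unpersist_by_name(spark.sparkContext._jsc, "rdd_2")`. Note that `name()` is null for RDDs you never named, so unnamed RDDs are simply skipped.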
@Dikei's answer is actually correct, but I believe what you are looking for is sc.getPersistentRDDs:
scala> val rdd1 = sc.makeRDD(1 to 100)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at <console>:27

scala> val rdd2 = sc.makeRDD(10 to 1000)
rdd2: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at makeRDD at <console>:27

scala> rdd2.cache.setName("rdd_2")
res0: rdd2.type = rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27

scala> sc.getPersistentRDDs
res1: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27)

scala> rdd1.cache.setName("foo")
res2: rdd1.type = foo ParallelCollectionRDD[0] at makeRDD at <console>:27

scala> sc.getPersistentRDDs
res3: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27, 0 -> foo ParallelCollectionRDD[0] at makeRDD at <console>:27)
Now let's create another RDD and name it as well, but without caching it:
scala> val rdd3 = sc.makeRDD(1 to 100)
rdd3: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at makeRDD at <console>:27

scala> rdd3.setName("bar")
res4: rdd3.type = bar ParallelCollectionRDD[2] at makeRDD at <console>:27

scala> sc.getPersistentRDDs
res5: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27, 0 -> foo ParallelCollectionRDD[0] at makeRDD at <console>:27)
Notice that rdd3 does not show up in the map: it was named but never cached, so it isn't actually persisted.
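Building on the transcript above, here is a minimal sketch of unpersisting cached RDDs by name rather than by variable reference. It assumes an active SparkContext `sc` and the names set earlier; `"rdd_2"` is just the example name from the transcript:

```scala
// Sketch: drop every cached RDD whose name matches, using the
// id -> RDD map returned by sc.getPersistentRDDs.
// RDD names default to null, so the equality check skips unnamed RDDs.
sc.getPersistentRDDs
  .values
  .filter(rdd => rdd.name == "rdd_2")
  .foreach(_.unpersist())
```

This is handy when the original variable has gone out of scope: the name you attached with setName survives in the persistent-RDD map even when the reference does not.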
Viewed 4061 times