I'm new to PySpark and am trying to understand exactly what toDebugString() does. Could you explain it using the code snippet below?
>>> a = sc.parallelize([1,2,3]).distinct()
>>> print a.toDebugString()
(8) PythonRDD[27] at RDD at PythonRDD.scala:44 [Serialized 1x Replicated]
 |  MappedRDD[26] at values at NativeMethodAccessorImpl.java:-2 [Serialized 1x Replicated]
 |  ShuffledRDD[25] at partitionBy at NativeMethodAccessorImpl.java:-2 [Serialized 1x Replicated]
 +-(8) PairwiseRDD[24] at distinct at <stdin>:1 [Serialized 1x Replicated]
    |  PythonRDD[23] at distinct at <stdin>:1 [Serialized 1x Replicated]
    |  ParallelCollectionRDD[21] at parallelize at PythonRDD.scala:358 [Serialized 1x Replicated]