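When I run Spark's RandomForest algorithm, the splits in the trees seem to differ from run to run, even though I use the same seed. Can anyone explain whether I am doing something wrong (likely) or the implementation is buggy (unlikely, I think)? Here is how I run it: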
```scala
import org.apache.spark.mllib.tree.RandomForest

// read data into an RDD
// convert the RDD of strings into an RDD of LabeledPoint
// train_LP_RDD is an RDD[LabeledPoint]

// call random forest
val seed = 123417
val numTrees = 10
val numClasses = 2
val categoricalFeaturesInfo: Map[Int, Int] = Map()  // empty: all features treated as continuous
val featureSubsetStrategy = "auto"
val impurity = "gini"
val maxDepth = 8
val maxBins = 10
val rfmodel = RandomForest.trainClassifier(train_LP_RDD, numClasses, categoricalFeaturesInfo,
  numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins, seed)
println(rfmodel.toDebugString)
```
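The read/convert steps are only sketched as comments above. The following is a minimal sketch of the parsing step, assuming a hypothetical input format in which each line is `label,f0,f1,...` (comma-separated numbers, label first). The object name `LineParser` and the `path` variable are illustrative, not part of the original setup.

```scala
// Hypothetical sketch: assumes each input line looks like "label,f0,f1,...,fn"
// (comma-separated numeric values, label first). Adjust to the real file format.
object LineParser {
  /** Parses one text line into (label, dense feature values). Throws on non-numeric fields. */
  def parseLine(line: String): (Double, Array[Double]) = {
    val parts = line.split(',').map(_.trim.toDouble)
    (parts.head, parts.tail)
  }
}

// Inside the Spark job, the parsed pairs could then be wrapped in LabeledPoint, e.g.:
//   import org.apache.spark.mllib.linalg.Vectors
//   import org.apache.spark.mllib.regression.LabeledPoint
//   val train_LP_RDD = sc.textFile(path)          // `path` is a placeholder
//     .map(LineParser.parseLine)
//     .map { case (label, feats) => LabeledPoint(label, Vectors.dense(feats)) }
```

Keeping the parser as a plain function makes it testable without a SparkContext. Blank or header lines would need to be filtered out before calling it.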
The output of this snippet differs between the two runs. For example, a diff of the two outputs shows:
```
sdiff -bBWs run1.debug run2.debug
If (feature 2 <= 15.96) | If (feature …
```
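To compare two models inside one program instead of diffing saved files, a small helper can list the lines where two `toDebugString` outputs disagree. This is a minimal sketch. `DebugDiff` and `differingLines` are hypothetical names, and unlike `sdiff` it compares lines by position without trying to realign inserted or deleted lines.

```scala
// Hypothetical helper: positional line-by-line comparison of two debug strings.
object DebugDiff {
  /** Returns (lineIndex, leftLine, rightLine) for every position where the lines differ.
    * A line missing on one side is reported as "". Whitespace is compared exactly,
    * because indentation in toDebugString encodes tree depth.
    */
  def differingLines(left: String, right: String): List[(Int, String, String)] = {
    val l = left.split("\n").toList
    val r = right.split("\n").toList
    l.zipAll(r, "", "").zipWithIndex.collect {
      case ((a, b), i) if a != b => (i, a, b)
    }
  }
}
```

For example, after training two models with the same seed, `DebugDiff.differingLines(model1.toDebugString, model2.toDebugString).foreach(println)` prints nothing if the trees match and prints each mismatching split otherwise.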