我试图使用SCALA中的随机森林分类器模型使用5倍交叉验证来找到准确度.但是我在运行时遇到以下错误:
java.lang.IllegalArgumentException:为RandomForestClassifier提供了带有无效标签列标签的输入,没有指定类的数量.请参见StringIndexer.
在行---> val cvModel = cv.fit(trainingData)获得上述错误
我用于使用随机森林进行数据集交叉验证的代码如下:
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.tuning.{ParamGridBuilder, CrossValidator}
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
val data = sc.textFile("exprogram/dataset.txt")
val parsedData = data.map { line =>
val parts = line.split(',')
LabeledPoint(parts(41).toDouble,
Vectors.dense(parts(0).split(',').map(_.toDouble)))
}
val splits = parsedData.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0)
val test = splits(1)
val trainingData = training.toDF()
val testData = test.toDF()
val nFolds: Int = 5
val NumTrees: Int = 5
val rf = new
RandomForestClassifier()
.setLabelCol("label")
.setFeaturesCol("features") …Run Code Online (Sandbox Code Playgroud) scala machine-learning random-forest apache-spark apache-spark-mllib