Sud*_*van 5 apache-spark apache-spark-mllib
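Can anyone help me resolve the following error? I am trying to convert a DataFrame to an RDD so that it can be used to build a regression model.
Spark version: 2.0.0
Error => ClassCastException: org.apache.spark.ml.linalg.DenseVector cannot be cast to org.apache.spark.mllib.linalg.Vector
Code =>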
import org.apache.spark.ml.feature.{Binarizer, VectorAssembler}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._
import org.apache.spark.sql.Row

// df is a previously loaded DataFrame, declared with var so it can be reassigned
val binarizer2: Binarizer = new Binarizer()
  .setInputCol("repay_amt").setOutputCol("label").setThreshold(20.00)
df = binarizer2.transform(df)

val assembler = new VectorAssembler()
  .setInputCols(Array("tot_txns", "avg_unpaiddue", "max_unpaiddue", "sale_txn", "max_amt", "tot_sale_amt"))
  .setOutputCol("features")
df = assembler.transform(df)

df.write.mode(SaveMode.Overwrite).parquet("lazpay_final_data.parquet")
val df2 = spark.read.parquet("lazpay_final_data.parquet/")

// "features" holds org.apache.spark.ml.linalg vectors, but the mllib LabeledPoint
// expects org.apache.spark.mllib.linalg.Vector, hence the ClassCastException
val df3 = df2.rdd.map(r => LabeledPoint(r.getDouble(0), r.getAs("features")))
Data =>
I solved this by first converting the ml SparseVector to a DenseVector and then converting that to an mllib Vector.
For example:
val denseVector = r.getAs[org.apache.spark.ml.linalg.SparseVector]("features").toDense
org.apache.spark.mllib.linalg.Vectors.fromML(denseVector)
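The snippet above reads the column as a SparseVector, while the error message in the question mentions a DenseVector, so a row-level cast to one concrete class may still fail depending on what VectorAssembler produced. A minimal sketch that avoids this, assuming the DataFrame has a Double column named "label" and an ml vector column named "features" as in the question, is to read the column as the general org.apache.spark.ml.linalg.Vector type and hand it to Vectors.fromML. The object name VectorConversion and both method names are hypothetical, made up for illustration.

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector, Vectors => MLVectors}
import org.apache.spark.mllib.linalg.{Vector => MLlibVector, Vectors => MLlibVectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

// Hypothetical helper object, not part of the original answer.
object VectorConversion {

  // Option 1: convert each row's ml vector (dense or sparse) with Vectors.fromML.
  // Assumes columns "label" (Double) and "features" (ml Vector).
  def toLabeledPoints(df: DataFrame): RDD[LabeledPoint] =
    df.select("label", "features").rdd.map { r =>
      val label = r.getAs[Double]("label")
      val features = r.getAs[MLVector]("features")
      LabeledPoint(label, MLlibVectors.fromML(features))
    }

  // Option 2: convert the whole column first with MLUtils, then read mllib vectors.
  def toLabeledPointsViaMLUtils(df: DataFrame): RDD[LabeledPoint] =
    MLUtils.convertVectorColumnsFromML(df, "features")
      .select("label", "features").rdd.map { r =>
        LabeledPoint(r.getAs[Double]("label"), r.getAs[MLlibVector]("features"))
      }
}
```

With the question's data this would be called as val df3 = VectorConversion.toLabeledPoints(df2). Selecting the columns by name, rather than using r.getDouble(0), also removes the dependence on where the label column sits in the schema.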