kgn*_*ete 8 machine-learning pyspark
While running a simple linear regression example with the pySpark ML API in version 2.0.0, I get an error from the new ML library.
The code is:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
from pyspark.mllib.linalg import Vectors
data = sc.parallelize(([1, 2], [2, 4], [3, 6], [4, 8]))
def f2Lp(inStr):
    return (float(inStr[0]), Vectors.dense(inStr[1]))
Lp = data.map(f2Lp)
testDF = sqlContext.createDataFrame(Lp, ["label", "features"])
(trainingData, testData) = testDF.randomSplit([0.8, 0.2])
from pyspark.ml.regression import LinearRegression
lr = LinearRegression()
model = lr.fit(trainingData)
And the error:
IllegalArgumentException: u'requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually org.apache.spark.mllib.linalg.VectorUDT@f71b0bce.'
How should I convert the vector features from the .mllib type to the .ml type?
Starting with Spark 2.0, use
from pyspark.ml.linalg import Vectors, VectorUDT
instead of
from pyspark.mllib.linalg import Vectors, VectorUDT