Posts by kau*_*mar

'KMeansModel' object has no attribute 'computeCost' in Apache PySpark

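I am experimenting with clustering models in PySpark. I am trying to get the mean squared cost of the clusters fitted for different values of K: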

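from pyspark.ml import Pipeline
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler

def meanScore(k, df):
  # Assemble the first 38 columns into a single feature vector
  inputCols = df.columns[:38]
  assembler = VectorAssembler(inputCols=inputCols, outputCol="features")
  kmeans = KMeans().setK(k)
  pipeModel2 = Pipeline(stages=[assembler, kmeans])
  kmeansModel = pipeModel2.fit(df).stages[-1]
  # This call raises the AttributeError described below
  return kmeansModel.computeCost(assembler.transform(df)) / df.count()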


When I call meanScore to compute the cost for different values of K on my DataFrame:

for k in range(20,100,20):
  sc = meanScore(k,numericOnly)
  print((k,sc))


I get an attribute error: AttributeError: 'KMeansModel' object has no attribute 'computeCost'
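For context: `computeCost` was deprecated in Spark 2.4 and removed from `KMeansModel` in Spark 3.0, which is consistent with this error. The k-means cost on the training data is exposed through `model.summary.trainingCost` instead. Below is a minimal sketch, assuming the model has just been fitted and so still carries its training summary. `kmeans_cost` is a hypothetical helper name, not part of PySpark.

```python
def kmeans_cost(model, features_df):
    """Return the k-means cost (sum of squared distances to the nearest centroid).

    Hypothetical helper: uses computeCost where the model still has it
    (Spark < 3.0) and falls back to the training summary otherwise.
    """
    if hasattr(model, "computeCost"):
        # Older Spark: evaluates the cost on the DataFrame passed in.
        return model.computeCost(features_df)
    # Spark 3.x: cost on the training data, recorded when the model was fitted.
    return model.summary.trainingCost
```

With this helper, the last line of meanScore could become `return kmeans_cost(kmeansModel, assembler.transform(df)) / df.count()`. The fallback reports the cost on the training data, which in this question is the same DataFrame.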

I am fairly new to PySpark and still learning, so I would sincerely appreciate any help with this. Thanks.
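For reference, meanScore is after the k-means cost divided by the number of rows. That quantity can be written out in plain Python, independent of Spark. This is only an illustrative sketch: `mean_squared_cost` and the sample points are made up for the example and do not come from the question's data.

```python
def mean_squared_cost(points, centers):
    """Average squared Euclidean distance from each point to its nearest center."""
    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    # k-means cost: each point contributes its squared distance to the closest center.
    total = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return total / len(points)


# Illustrative data: two tight clusters, each point exactly 1 unit from its center.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
print(mean_squared_cost(points, centers))  # 1.0
```

A lower value means points sit closer to their assigned centers. The cost generally decreases as K grows, so it is usually compared across K values rather than minimized outright.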

python cluster-analysis k-means apache-spark pyspark

kau*_*mar

4 votes
1 answer
7206 views
