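I am trying to use a random forest model to predict a set of examples, but it seems I cannot use the model to classify them. Here is the code I use in pyspark: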
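from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.mllib.tree import RandomForest

sc = SparkContext(appName="App")
# Train the model once on the driver.
model = RandomForest.trainClassifier(trainingData, numClasses=2, categoricalFeaturesInfo={}, impurity='gini', numTrees=150)
ssc = StreamingContext(sc, 1)
lines = ssc.socketTextStream(hostname, int(port))
parsedLines = lines.map(parse)
parsedLines.pprint()
# model.predict is called inside a transformation here.
predictions = parsedLines.map(lambda event: model.predict(event.features))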
Here is the error returned when I run it on the cluster:
Error : "It appears that you are attempting to reference SparkContext from a broadcast "
Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, …
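The error above is consistent with the documented limitation that, in PySpark MLlib, a tree model's `predict` cannot be called inside an RDD transformation or action and should be called on the RDD itself. One possible workaround, sketched here and not tested against this exact setup, is to apply the model per batch through `DStream.transform`, whose function runs on the driver. The helper name `predict_stream` is hypothetical, and the sketch assumes each parsed event exposes a `features` attribute, as in the question.

```python
def predict_stream(model, parsed_stream):
    """Return a stream of predictions, computed batch by batch.

    Hypothetical helper: `model` is assumed to accept a whole RDD of
    feature vectors in predict(), and `parsed_stream` is assumed to be
    a DStream whose elements have a `features` attribute.
    """
    def predict_batch(rdd):
        # Extract the feature vectors on the workers ...
        features = rdd.map(lambda event: event.features)
        # ... then hand the whole RDD to the model from the driver,
        # instead of calling model.predict inside a worker-side map().
        return model.predict(features)

    return parsed_stream.transform(predict_batch)
```

With the names from the question, this would replace the last line with `predictions = predict_stream(model, parsedLines)`. If checkpointing is enabled, the transform function has to be serialized, which may not work with the model captured in the closure; that case is not covered here.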
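I have been doing some geometric data analysis (GDA), such as principal component analysis (PCA). I would like to plot a correlation circle... these look a bit like this: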
![1](https://i.stack.imgur.com/2pjd8.png)
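Basically, it shows the extent to which each variable is correlated with the principal components (dimensions) of the dataset, based on the eigenvalues/eigenvectors of the analysis.
Does anyone know of a Python package that plots this kind of visualization?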
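In case no ready-made package fits, a correlation circle can be assembled by hand. The following is a minimal sketch, assuming NumPy for the computation and matplotlib for the drawing: it standardizes the variables, obtains the principal component scores from an SVD, and uses the correlation of each variable with the first two components as arrow coordinates. The function names and the synthetic data are illustrative only.

```python
import numpy as np


def correlation_circle_coords(X, n_components=2):
    """Correlation of each variable (rows) with the first PCs (columns)."""
    X = np.asarray(X, dtype=float)
    # Standardize the columns so the PCA is based on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # The SVD of the standardized data gives the principal component scores.
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    # Pearson correlation between every variable and every retained PC.
    return np.array([[np.corrcoef(Z[:, j], scores[:, k])[0, 1]
                      for k in range(n_components)]
                     for j in range(Z.shape[1])])


def plot_correlation_circle(coords, labels):
    """Draw one arrow per variable inside the unit circle."""
    import matplotlib.pyplot as plt  # imported here so the math works without it

    fig, ax = plt.subplots(figsize=(6, 6))
    ax.add_patch(plt.Circle((0, 0), 1.0, fill=False, color="gray"))
    for (x, y), label in zip(coords, labels):
        ax.arrow(0, 0, x, y, head_width=0.03, length_includes_head=True)
        ax.text(x * 1.08, y * 1.08, label, ha="center", va="center")
    ax.axhline(0, color="lightgray", linewidth=0.8)
    ax.axvline(0, color="lightgray", linewidth=0.8)
    ax.set_xlim(-1.1, 1.1)
    ax.set_ylim(-1.1, 1.1)
    ax.set_aspect("equal")
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    return ax


# Synthetic data: "a" and "b" are strongly related, "c" and "d" are independent.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.3 * rng.normal(size=200),
    base[:, 1],
    rng.normal(size=200),
])
coords = correlation_circle_coords(X)
# plot_correlation_circle(coords, ["a", "b", "c", "d"])  # then plt.show()
```

Each arrow stays inside the unit circle because, for standardized variables, the squared correlations with all components sum to 1. Arrows close to the circle are well represented by the two plotted components.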