I'm trying to run a PySpark script on my Windows 10 laptop. The script builds a linear regression model using PySpark and Spark MLlib.
My code is as follows:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
import pandas as pd
sc = SparkContext()
sqlContext = SQLContext(sc)
house_df = sqlContext.read.format('com.databricks.spark.csv').options(header='true', inferschema='true').load(
    'data/boston.csv')
house_df1 = house_df.drop('ID')
import six
for i in house_df1.columns:
    if not (isinstance(house_df1.select(i).take(1)[0][0], six.string_types)):
        print("Correlation to MEDV for ", i, house_df1.stat.corr('medv', i))
vectorAssembler = VectorAssembler(inputCols=['crim', 'zn', 'indus',
                                             'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax',
                                             'ptratio', 'black', 'lstat'], outputCol='features')
…