Posts by woo*_*r13

Requirement failed: Nothing has been added to this summarizer

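I'm trying to test whether pyspark runs correctly on my system, but when I call fit on my data I get the error "requirement failed: Nothing has been added to this summarizer".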

import findspark
import os
spark_location='/usr/local/spark/'
java8_location= '/usr/lib/jvm/java-8-openjdk-amd64'
os.environ['JAVA_HOME'] = java8_location
findspark.init(spark_home=spark_location)
import pyspark, itertools, string, datetime, math
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.sql import SparkSession
from pyspark.mllib.evaluation import RegressionMetrics
from pyspark.sql.functions import isnan, isnull, when, count, col

def main():
    spark = pyspark.sql.SparkSession.builder.appName("test").getOrCreate()
    sc = spark.sparkContext
    #data = spark.read.option("inferSchema", True).option("header", True).csv("ml-20m/ratings.csv").drop("timestamp")
    data = spark.read.option("inferSchema", True).option("header", True).csv("ml-20m/ratings_test.csv").drop("timestamp")
    train,test= data.randomSplit([0.8, 0.2])
    print("before als")
    als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating", coldStartStrategy="drop", …
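
One possible cause, not confirmed by the truncated code above: with coldStartStrategy="drop", ALS discards every prediction row whose user or movie was not seen during training. On a small file such as ratings_test.csv, an 80/20 split can leave no evaluable rows at all, and a regression evaluator run on an empty set of predictions fails with this summarizer message. The sketch below only imitates that drop rule in plain Python so the effect is easy to see. The function name and the toy ratings are hypothetical and are not part of Spark.

```python
from typing import Iterable, List, Tuple

Rating = Tuple[int, int, float]  # (userId, movieId, rating)


def rows_surviving_cold_start_drop(
    train: Iterable[Rating], test: Iterable[Rating]
) -> List[Rating]:
    """Return the test rows whose user AND movie both occur in train.

    This mimics, in plain Python, which rows would still have a
    prediction after ALS drops cold-start (unseen user or item) rows.
    It is an illustration only, not Spark's implementation.
    """
    train = list(train)
    seen_users = {user for user, _, _ in train}
    seen_movies = {movie for _, movie, _ in train}
    return [
        (user, movie, rating)
        for user, movie, rating in test
        if user in seen_users and movie in seen_movies
    ]


# Hypothetical toy split: every test row has an unseen user or movie.
train_rows = [(1, 10, 4.0), (2, 20, 3.5)]
test_rows = [(3, 10, 5.0), (1, 30, 2.0)]

survivors = rows_surviving_cold_start_drop(train_rows, test_rows)
print(len(survivors))  # nothing left for an evaluator to summarize
```

In PySpark, a comparable check would be to fit ALS on train alone and look at the row count of the transformed test set. If fit is being called on a CrossValidator (the source imports it, but the code is cut off before that point), the same emptiness could occur inside any fold, so a larger input file or fewer folds may be worth trying.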

python apache-spark pyspark

Votes: 4 · Answers: 1 · Views: 4500
