vae*_*r-k 7 hadoop-yarn apache-spark pyspark apache-spark-ml
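I have a dataframe of 2,818,615 rows, each consisting of a pyspark.ml.linalg.SparseVector of length 388 and a class label. I want to train a pyspark ml RandomForestClassifier on this dataset. Every time I try to train the model, Spark runs for about 30 minutes and then fails because the SparkContext was shut down. If I limit the dataset to only 25K rows, the model trains successfully, but I need to use the larger dataset.
What troubleshooting steps could I take here?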
print(df.rdd.getNumPartitions())   
8
df.show()
+--------------------+-----+
|            features|label|
+--------------------+-----+
|(388,[1,355,361,3...|    0|
|(388,[1,355,361,3...|    1|
|(388,[1,355,361,3...|    0|
|(388,[1,355,361,3...|    0|
|(388,[1,355,361,3...|    0|
|(388,[1,355,361,3...|    1|
|(388,[1,355,361,3...|    1|
|(388,[1,355,361,3...|    1|
|(388,[1,355,361,3...|    0|
|(388,[1,355,361,3...|    1|
|(388,[1,355,361,3...|    0|
|(388,[1,355,361,3...|    1|
|(388,[1,355,361,3...|    0|
|(388,[1,355,361,3...|    0|
|(388,[1,355,361,3...|    0|
|(388,[1,355,361,3...|    1|
|(388,[1,355,361,3...|    2|
|(388,[1,355,361,3...|    2|
|(388,[1,355,361,3...|    1|
|(388,[1,355,361,3...|    0|
+--------------------+-----+
only showing top 20 rows
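To put the partition count in perspective, here is a rough back-of-the-envelope sketch using only the numbers quoted above (2,818,615 rows, 8 partitions, and the 25K-row subset that trains successfully). It assumes rows are spread evenly across partitions, which the question does not confirm.

```python
import math

# Figures taken from the question text and the getNumPartitions() output.
total_rows = 2_818_615
subset_rows = 25_000
num_partitions = 8

# Assumption: rows are distributed evenly across partitions.
rows_per_partition_full = math.ceil(total_rows / num_partitions)
rows_per_partition_subset = math.ceil(subset_rows / num_partitions)

# How much larger the failing dataset is than the one that trains fine.
scale_factor = total_rows / subset_rows

print(f"full dataset:   ~{rows_per_partition_full} rows per partition")
print(f"25K subset:     ~{rows_per_partition_subset} rows per partition")
print(f"scale factor:   ~{scale_factor:.1f}x")
```

With only 8 partitions, each task on the full dataset handles roughly a hundred times more rows than in the run that succeeds.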
My hardware:
Here is how I (try to) train the model:
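from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
# Assumption: SparkCV is an alias for pyspark.ml.tuning.CrossValidator (the import is not shown in the question).
from pyspark.ml.tuning import CrossValidator as SparkCV, ParamGridBuilder
rf = RandomForestClassifier(featuresCol='features', labelCol='label')
grid = ParamGridBuilder().addGrid(rf.numTrees, [30, 50, 75]).addGrid(rf.maxDepth, [10, 20]).build()
evaluator = MulticlassClassificationEvaluator(metricName="f1")
cv = SparkCV(estimator=rf, estimatorParamMaps=grid, evaluator=evaluator, numFolds=3)
cvModel = cv.fit(df)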
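It may help to see how much work that single fit call implies. The sketch below only does the arithmetic for the grid and fold count shown above. It assumes SparkCV behaves like pyspark.ml.tuning.CrossValidator, which fits every parameter combination once per fold and then refits the best combination on the full dataset.

```python
from itertools import product

# Values copied from the ParamGridBuilder and numFolds settings in the question.
num_trees_options = [30, 50, 75]
max_depth_options = [10, 20]
num_folds = 3

# Every (numTrees, maxDepth) pair in the grid.
param_combinations = list(product(num_trees_options, max_depth_options))

# One forest per combination per fold during cross-validation.
fits_during_cv = len(param_combinations) * num_folds

# Assumption: one extra fit of the best combination on the full data,
# as CrossValidator does after selecting the best parameters.
total_fits = fits_during_cv + 1

# Individual trees grown during cross-validation alone.
trees_during_cv = sum(n for n, _ in param_combinations) * num_folds

print(f"grid size:           {len(param_combinations)}")
print(f"forests fit in CV:   {fits_during_cv}")
print(f"forests fit overall: {total_fits}")
print(f"trees grown in CV:   {trees_during_cv}")
```

Fitting a single RandomForestClassifier with fixed parameters first would separate "one forest does not fit in the cluster" from "the full grid search is too much work".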
The traceback claims the job failed because:
py4j.protocol.Py4JJavaError: An error occurred while calling o417.fit.
: org.apache.spark.SparkException: Job 76 cancelled because SparkContext was shut down
Here are the last few lines of the Spark log:
17/11/07 23:15:04 INFO ApplicationMaster$AMEndpoint: Driver requested to kill executor(s) 31.
17/11/07 23:15:04 INFO YarnAllocator: Driver requested a total number of 13 executor(s).
17/11/07 23:15:04 INFO ApplicationMaster$AMEndpoint: Driver requested to kill executor(s) 14.
17/11/07 23:15:04 INFO YarnAllocator: Driver requested a total number of 12 executor(s).
17/11/07 23:15:04 INFO ApplicationMaster$AMEndpoint: Driver requested to kill executor(s) 12.
17/11/07 23:16:21 INFO YarnAllocator: Driver requested a total number of 9 executor(s).
17/11/07 23:16:21 INFO ApplicationMaster$AMEndpoint: Driver requested to kill executor(s) 30, 18, 19.
17/11/07 23:20:07 ERROR ApplicationMaster: RECEIVED SIGNAL TERM
17/11/07 23:20:07 INFO ApplicationMaster: Final app status: UNDEFINED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
17/11/07 23:20:07 INFO ShutdownHookManager: Shutdown hook called
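One way to read the log excerpt above is to pull out the executor totals the driver asked for and the time of the final SIGTERM. This is a minimal sketch over the quoted lines only. The regular expression and variable names are illustrative, and it assumes the timestamps use the yy/MM/dd HH:mm:ss layout seen in the excerpt.

```python
import re
from datetime import datetime

# Lines copied from the log excerpt in the question.
log_lines = [
    "17/11/07 23:15:04 INFO YarnAllocator: Driver requested a total number of 13 executor(s).",
    "17/11/07 23:15:04 INFO YarnAllocator: Driver requested a total number of 12 executor(s).",
    "17/11/07 23:16:21 INFO YarnAllocator: Driver requested a total number of 9 executor(s).",
    "17/11/07 23:20:07 ERROR ApplicationMaster: RECEIVED SIGNAL TERM",
]

TS_FORMAT = "%y/%m/%d %H:%M:%S"  # assumed layout of the log timestamps
total_re = re.compile(
    r"^(\S+ \S+) INFO YarnAllocator: "
    r"Driver requested a total number of (\d+) executor\(s\)\."
)

requested = []       # (timestamp, requested executor total)
sigterm_time = None  # when the ApplicationMaster received SIGTERM

for line in log_lines:
    match = total_re.match(line)
    if match:
        stamp = datetime.strptime(match.group(1), TS_FORMAT)
        requested.append((stamp, int(match.group(2))))
    elif "RECEIVED SIGNAL TERM" in line:
        sigterm_time = datetime.strptime(line[:17], TS_FORMAT)

totals = [count for _, count in requested]
seconds_to_sigterm = (sigterm_time - requested[-1][0]).total_seconds()

print("requested executor totals:", totals)
print("seconds from last request to SIGTERM:", seconds_to_sigterm)
```

In this excerpt the requested total only goes down, and the ApplicationMaster is terminated a few minutes after the last request. The excerpt alone does not show what sent the SIGTERM, so the driver log and the YARN ResourceManager and NodeManager logs for the same time window are the next places to look.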