When I try to train a machine learning model with ALS in Spark's MLlib, I keep getting a StackOverflowError. Here is a small sample of the stack trace:
Traceback (most recent call last):
  File "/Users/user/Spark/imf.py", line 31, in <module>
    model = ALS.train(rdd, rank, numIterations)
  File "/usr/local/Cellar/apache-spark/1.3.1_1/libexec/python/pyspark/mllib/recommendation.py", line 140, in train
    lambda_, blocks, nonnegative, seed)
  File "/usr/local/Cellar/apache-spark/1.3.1_1/libexec/python/pyspark/mllib/common.py", line 120, in callMLlibFunc
    return callJavaFunc(sc, api, *args)
  File "/usr/local/Cellar/apache-spark/1.3.1_1/libexec/python/pyspark/mllib/common.py", line 113, in callJavaFunc
    return _java2py(sc, func(*args))
  File "/usr/local/Cellar/apache-spark/1.3.1_1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/usr/local/Cellar/apache-spark/1.3.1_1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o35.trainALSModel.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 40.0 failed 1 …
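For reference, a stripped-down PySpark program of the kind the traceback points to (the file name imf.py and the ALS.train(rdd, rank, numIterations) call come from the trace; the rating data, rank, and iteration count below are placeholder assumptions) might look like the following sketch. Setting a checkpoint directory before training is one commonly suggested way to keep the lineage built up across ALS iterations from getting too deep; this is only an illustration, not the original script.

from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

# Minimal sketch around the call seen at imf.py line 31; the data and
# parameter values are placeholders, not the original ones.
sc = SparkContext(appName="als-sketch")
sc.setCheckpointDir("checkpoint/")   # allows MLlib to checkpoint intermediate RDDs

rdd = sc.parallelize([
    Rating(1, 10, 4.0),
    Rating(1, 20, 3.0),
    Rating(2, 10, 5.0),
])

rank = 10             # placeholder
numIterations = 20    # placeholder
model = ALS.train(rdd, rank, numIterations)

print(model.predict(2, 20))
sc.stop()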
I am using Spark to calculate the PageRank of user reviews, but I keep getting a Spark java.lang.StackOverflowError when I run my code on a big dataset (40k entries). When running the code on a small number of entries it works fine, though.

Input example:
product/productId: B00004CK40 review/userId: A39IIHQF18YGZA review/profileName: C. A. M. Salas review/helpfulness: 0/0 review/score: 4.0 review/time: 1175817600 review/summary: Reliable comedy review/text: Nice script, well acted comedy, and a young Nicolette Sheridan. Cusak is in top form.
Code:
public void calculatePageRank() {
    sc.clearCallSite();
    sc.clearJobGroup();
    JavaRDD<String> rddFileData = sc.textFile(inputFileName).cache();
    sc.setCheckpointDir("pagerankCheckpoint/");
    JavaRDD<String> rddMovieData = rddFileData.map(new Function<String, String>() {
        @Override
        public String call(String arg0) throws Exception {
            String[] data = arg0.split("\t"); …