我正在使用spark来计算用户评论的页面,但是java.lang.StackOverflowError当我在大数据集上运行我的代码时,我会不断获得Spark (40k条目).当在少量条目上运行代码时,它工作正常.
输入示例:
product/productId: B00004CK40 review/userId: A39IIHQF18YGZA review/profileName: C. A. M. Salas review/helpfulness: 0/0 review/score: 4.0 review/time: 1175817600 review/summary: Reliable comedy review/text: Nice script, well acted comedy, and a young Nicolette Sheridan. Cusak is in top form.
Run Code Online (Sandbox Code Playgroud)
代码:
public void calculatePageRank() {
sc.clearCallSite();
sc.clearJobGroup();
JavaRDD < String > rddFileData = sc.textFile(inputFileName).cache();
sc.setCheckpointDir("pagerankCheckpoint/");
JavaRDD < String > rddMovieData = rddFileData.map(new Function < String, String > () {
@Override
public String call(String arg0) throws Exception {
String[] data = arg0.split("\t"); …Run Code Online (Sandbox Code Playgroud)