I am using Spark to compute the PageRank of user reviews, but I keep getting a java.lang.StackOverflowError when I run my code on a big dataset (40k entries). When running the code on a small number of entries it works fine.
Example input:
product/productId: B00004CK40 review/userId: A39IIHQF18YGZA review/profileName: C. A. M. Salas review/helpfulness: 0/0 review/score: 4.0 review/time: 1175817600 review/summary: Reliable comedy review/text: Nice script, well acted comedy, and a young Nicolette Sheridan. Cusak is in top form.
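The error on the large dataset is usually not about the data itself but about recursion depth on the driver: each iteration of an iterative job such as PageRank extends the RDD lineage, and a long enough lineage overflows the stack when Spark walks it. A JVM-only sketch of that failure mode, with nested function composition standing in for lineage (no Spark dependency; the class name, method names, and the depth of 1,000,000 are illustrative):

```java
import java.util.function.IntUnaryOperator;

public class LineageDepthDemo {

    // Compose (x -> x + 1) `layers` times. Applying the composed function
    // costs one stack frame per layer, mirroring how each transformation in
    // an iterative Spark job appends a node to the RDD lineage that later
    // has to be walked recursively.
    static String deepLineage(int layers) {
        IntUnaryOperator f = IntUnaryOperator.identity();
        for (int i = 0; i < layers; i++) {
            f = f.andThen(x -> x + 1);
        }
        try {
            return Integer.toString(f.applyAsInt(0));
        } catch (StackOverflowError e) {
            return "StackOverflowError";
        }
    }

    // "Checkpointing": materialize the intermediate result at every step so
    // the depth stays constant -- the effect rdd.checkpoint() has on lineage.
    static int checkpointed(int layers) {
        int acc = 0;
        for (int i = 0; i < layers; i++) {
            acc += 1;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(deepLineage(1_000_000));   // deep chain blows the stack
        System.out.println(checkpointed(1_000_000));  // constant depth
    }
}
```

With an actual RDD, the corresponding move is to call checkpoint() on the intermediate RDD every few iterations and force it with an action such as count(); the code below already sets a checkpoint directory via sc.setCheckpointDir, which is a prerequisite for that.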
The code:
public void calculatePageRank() {
    sc.clearCallSite();
    sc.clearJobGroup();
    JavaRDD<String> rddFileData = sc.textFile(inputFileName).cache();
    sc.setCheckpointDir("pagerankCheckpoint/");
    JavaRDD<String> rddMovieData = rddFileData.map(new Function<String, String>() {
        @Override
        public String call(String arg0) throws Exception {
            String[] data = arg0.split("\t"); …

I have a wide DataFrame (130,000 rows × 8,700 columns), and when I try to sum all of its columns I get the following error:
Exception in thread "main" java.lang.StackOverflowError
    at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:59)
    at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:59)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:183)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
    at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:49)
    at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:400)
    at org.apache.spark.sql.catalyst.trees.TreeNode.containsChild$lzycompute(TreeNode.scala:88)
    ...
Here is my Scala code:
val df = spark.read
.option("header", "false")
.option("delimiter", "\t")
.option("inferSchema", "true")
.csv("D:\\Documents\\Trabajo\\Fábregas\\matrizLuna\\matrizRelativa")
val arrayList = df.drop("cups").columns
var colsList = List[Column]()
arrayList.foreach { c => colsList :+= col(c) }
val df_suma = df.withColumn("consumo_total", colsList.reduce(_ + …
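The stack overflow here comes from the shape of the expression, not the size of the data: colsList.reduce(_ + _) builds a left-deep chain of +, one level per column, and Catalyst traverses that 8,700-deep tree recursively on the driver. A JVM-only sketch of the mechanism (the Expr/Leaf/Add classes are a toy stand-in for Catalyst expressions, not Spark API; the depth of 1,000,000 is chosen to overflow a default stack):

```java
public class ExprDepthDemo {

    // A toy binary "+" expression tree standing in for Catalyst's Add nodes.
    interface Expr { long eval(); }

    static final class Leaf implements Expr {
        final long v;
        Leaf(long v) { this.v = v; }
        public long eval() { return v; }
    }

    static final class Add implements Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        public long eval() { return left.eval() + right.eval(); }
    }

    // What reduce(_ + _) builds: a left-deep chain whose depth equals
    // the number of columns.
    static Expr leftDeep(int n) {
        Expr e = new Leaf(1);
        for (int i = 1; i < n; i++) {
            e = new Add(e, new Leaf(1));
        }
        return e;
    }

    // Pairwise reduction: the same sum, but the tree is only log2(n) deep.
    static Expr balanced(Expr[] nodes) {
        while (nodes.length > 1) {
            Expr[] next = new Expr[(nodes.length + 1) / 2];
            for (int i = 0; i < nodes.length; i += 2) {
                next[i / 2] = (i + 1 < nodes.length)
                        ? new Add(nodes[i], nodes[i + 1])
                        : nodes[i];
            }
            nodes = next;
        }
        return nodes[0];
    }

    static String tryEval(Expr e) {
        try {
            return Long.toString(e.eval());
        } catch (StackOverflowError err) {
            return "StackOverflowError";
        }
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println(tryEval(leftDeep(n)));      // recursion as deep as n
        Expr[] leaves = new Expr[n];
        for (int i = 0; i < n; i++) {
            leaves[i] = new Leaf(1);
        }
        System.out.println(tryEval(balanced(leaves))); // depth ~20, evaluates fine
    }
}
```

In Spark itself, the common workarounds follow the same idea: raise the driver stack size (for example -Xss4m via spark.driver.extraJavaOptions), or combine the columns pairwise instead of with a linear reduce so the resulting expression tree stays logarithmically deep.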