Related troubleshooting questions (0)

Spark java.lang.StackOverflowError

I am using Spark to compute the PageRank of user reviews, but I keep getting java.lang.StackOverflowError when I run my code on a large dataset (40k entries). When I run the code on a small number of entries, it works fine.

Sample input:

product/productId: B00004CK40   review/userId: A39IIHQF18YGZA   review/profileName: C. A. M. Salas  review/helpfulness: 0/0 review/score: 4.0   review/time: 1175817600 review/summary: Reliable comedy review/text: Nice script, well acted comedy, and a young Nicolette Sheridan. Cusak is in top form.

Code:

public void calculatePageRank() {
    sc.clearCallSite();
    sc.clearJobGroup();

    JavaRDD<String> rddFileData = sc.textFile(inputFileName).cache();
    sc.setCheckpointDir("pagerankCheckpoint/");

    JavaRDD<String> rddMovieData = rddFileData.map(new Function<String, String>() {

        @Override
        public String call(String arg0) throws Exception {
            String[] data = arg0.split("\t"); …
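The usual culprit here is RDD lineage: each PageRank iteration layers more transformations on the previous ranks RDD, and after enough iterations the recursive traversal of that DAG overflows the driver's stack. Since the code above already sets a checkpoint directory, the likely missing piece is a periodic checkpoint inside the loop. Below is a minimal Scala sketch of that idea, assuming the standard iterative PageRank loop; the names links and ranks and the 10-iteration interval are illustrative, not the asker's code:

import org.apache.spark.SparkContext

object PageRankCheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "PageRankSketch")
    sc.setCheckpointDir("pagerankCheckpoint/")

    // links: (page, outgoing neighbours); ranks: (page, current rank)
    val links = sc.parallelize(Seq(
      ("a", Seq("b", "c")), ("b", Seq("c")), ("c", Seq("a")))).cache()
    var ranks = links.mapValues(_ => 1.0)

    for (i <- 1 to 100) {
      val contribs = links.join(ranks).values.flatMap {
        case (neighbours, rank) => neighbours.map((_, rank / neighbours.size))
      }
      ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)

      // Cut the lineage every 10 iterations so the DAG (and the stack
      // needed to traverse it) stops growing without bound.
      if (i % 10 == 0) {
        ranks.cache()
        ranks.checkpoint()
        ranks.count() // an action forces the checkpoint to materialize now
      }
    }
    ranks.collect().foreach(println)
    sc.stop()
  }
}

Note that checkpoint() only takes effect at the next action, hence the count(); without it the lineage keeps accumulating until the final collect().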

java mapreduce apache-spark

8
Score
3
Answers
10k
Views

StackOverflowError when processing a large number of columns in Spark

I have a wide DataFrame (130,000 rows × 8,700 columns), and when I try to sum all of its columns I get the following error:

Exception in thread "main" java.lang.StackOverflowError
    at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:59)
    at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:59)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:183)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
    at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:49)
    at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:400)
    at org.apache.spark.sql.catalyst.trees.TreeNode.containsChild$lzycompute(TreeNode.scala:88)
    ...

This is my Scala code:

  val df = spark.read
    .option("header", "false")
    .option("delimiter", "\t")
    .option("inferSchema", "true")
    .csv("D:\\Documents\\Trabajo\\Fábregas\\matrizLuna\\matrizRelativa")


  val arrayList = df.drop("cups").columns
  var colsList = List[Column]()
  arrayList.foreach { c => colsList :+= col(c) }

  val df_suma = df.withColumn("consumo_total", colsList.reduce(_ + …
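The trace points at Catalyst, not at the data: colsList.reduce(_ + _) folds left-to-right, producing an expression tree roughly 8,700 levels deep, and Catalyst's recursive traversal of that tree blows the default thread stack. One workaround is to raise the stack size (e.g. spark-submit --driver-java-options -Xss64m); a cleaner one is to combine the columns as a balanced tree, whose depth is only about log2(8,700) ≈ 13. A sketch under that assumption follows; balancedSum is a hypothetical helper, not part of the Spark API:

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, lit}

// Combine columns pairwise instead of left-to-right: the resulting
// expression tree has logarithmic rather than linear depth.
def balancedSum(cols: Seq[Column]): Column = cols match {
  case Seq()       => lit(0)   // no columns: contribute zero
  case Seq(single) => single
  case _ =>
    val (left, right) = cols.splitAt(cols.length / 2)
    balancedSum(left) + balancedSum(right)
}

def withTotal(df: DataFrame): DataFrame =
  df.withColumn("consumo_total", balancedSum(df.drop("cups").columns.map(col).toSeq))

With the df read above, the sum then becomes: val df_suma = withTotal(df)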

stack-overflow scala mapreduce apache-spark spark-dataframe

5
Score
1
Answer
1,404
Views