Posts by Shy*_*yam

IllegalArgumentException: Unknown message type: 9 when reading Delta files

My project uses <spark.version>3.1.2</spark.version> with Delta Lake io.delta:delta-core_2.12:1.0.0.

While reading Delta files, I hit the following error: IllegalArgumentException: Unknown message type: 9

java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Job aborted due to stage failure: ShuffleMapStage 4 ($anonfun$apply$2 at DatabricksLogging.scala:77) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: java.lang.IllegalArgumentException: Unknown message type: 9
    at org.apache.spark.network.shuffle.protocol.BlockTransferMessage$Decoder.fromByteBuffer(BlockTransferMessage.java:71)
    at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:80)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    ... 1 more
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
    at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
    at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
    at org.apache.spark.sql.delta.DeltaLog$.apply(DeltaLog.scala:464)
    at org.apache.spark.sql.delta.DeltaLog$.forTable(DeltaLog.scala:401)
    at org.apache.spark.sql.delta.catalog.DeltaTableV2.deltaLog$lzycompute(DeltaTableV2.scala:73)
    at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:177)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:305) …
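For context, a minimal Java sketch of the kind of read that enters the failing code path (DataFrameReader.load → DeltaDataSource.createRelation in the trace above); the class name, session setup, and table path are placeholders, not from the original post:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DeltaReadSketch {
    public static void main(String[] args) {
        // Placeholder session; the original job runs on a cluster whose
        // external shuffle service appears in the stack trace above.
        SparkSession spark = SparkSession.builder()
                .appName("delta-read-sketch")
                .getOrCreate();

        // This load() is the call that resolves the Delta relation and
        // replays the DeltaLog, which is where the job above fails.
        Dataset<Row> df = spark.read()
                .format("delta")
                .load("/path/to/delta-table"); // hypothetical path

        df.show();
    }
}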

apache-spark apache-spark-sql databricks delta-lake

2 votes · 1 answer · 2519 views

How to log a Spark Dataset's printSchema output at info/debug level in a spark-java project

I'm trying to convert my Spark Scala project to a spark-java project. I have logging in Scala as follows:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

    class ClassName{
      val logger  = LoggerFactory.getLogger("ClassName")
      ...
      val dataframe1 = ....///read dataframe from text file.
      ...

      logger.debug("dataframe1.printSchema : \n " + dataframe1.printSchema) //this is working fine.
    }

Now I'm trying to write the same thing in Java 1.8, as follows:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ClassName{

    public static final Logger logger = LoggerFactory.getLogger("ClassName");
      ...
     Dataset<Row> dataframe1 = ....///read dataframe from text file.
     ...

     logger.debug("dataframe1.printSchema : \n " + dataframe1.printSchema()); //this is not working

}

I tried several approaches, but none of them logs the printSchema output at debug/info level.

dataframe1.printSchema() // this actually returns void …
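Since printSchema() writes to stdout and returns void, it cannot be concatenated into a log message; the Scala version only appears to work because Unit concatenates as "()" while the schema itself goes to the console. One workaround sketch (the helper class name is hypothetical): Dataset.schema() returns a StructType, and StructType.treeString() returns the same tree as a String, which can be handed to SLF4J at any level:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SchemaLogger {
    private static final Logger logger = LoggerFactory.getLogger("SchemaLogger");

    // Logs the same tree that dataframe.printSchema() would print to stdout,
    // but through SLF4J at debug level instead of the console.
    public static void logSchema(Dataset<Row> dataframe) {
        logger.debug("dataframe schema:\n{}", dataframe.schema().treeString());
    }
}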

java sql scala apache-spark apache-spark-sql

1 vote · 1 answer · 3666 views