我正在尝试从 Azure Data Lake Gen1 读取 avro 数据,这些数据是从 Azure EventHubs 生成的,在 Azure Databricks 中使用 pyspark 启用了 Azure Event Hubs Capture:
inputdata = "evenhubscapturepath/*/*"
rawData = spark.read.format("avro").load(inputdata)
Run Code Online (Sandbox Code Playgroud)
以下语句失败
rawData.count()
Run Code Online (Sandbox Code Playgroud)
和
org.apache.spark.SparkException: Job aborted due to stage failure: Task 162 in stage 48.0 failed 4 times, most recent failure: Lost task 162.3 in stage 48.0 (TID 2807, 10.3.2.4, executor 1): java.io.IOException: Not an Avro data file
Run Code Online (Sandbox Code Playgroud)
EventHub-Capture 是否正在写入非 Avro 数据?是否有使用 Spark 读取 EventHub 捕获数据的最佳实践?
azure azure-eventhub pyspark azure-eventhub-capture azure-databricks