Posts by ros*_*han

How do I read from and write to the same Parquet file in Spark?

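I am trying to read a Parquet file in Spark, union it with another RDD, and then write the result back to the same file I read from (essentially overwriting it). This throws the following error: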

 couldnt write parquet to file: An error occurred while calling o102.parquet.
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenExchange hashpartitioning(billID#42,200), None
+- Union
   :- Scan ParquetRelation[units#35,price#36,priceSold#37,orderingTime#38,itemID#39,storeID#40,customerID#41,billID#42,sourceRef#43] InputPaths: hdfs://master-wat:8020/user/root/dataFile/parquet/general/NPM61LKK1C/Billbody
   +- Project [units#22,price#23,priceSold#24,orderingTime#25,itemID#26,storeID#27,customerID#28,billID#29,2 AS sourceRef#30]
      +- Scan ExistingRDD[units#22,price#23,priceSold#24,orderingTime#25,itemID#26,storeID#27,customerID#28,billID#29] 

    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
    at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.Sort.doExecute(Sort.scala:64)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.Window.doExecute(Window.scala:245)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.Filter.doExecute(basicOperators.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) …
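
For reference, a commonly cited cause of failures in this read-then-overwrite pattern is that Spark evaluates the read lazily, so an overwrite of the input path can remove the files before the scan has consumed them. Whether that is the exact cause of the trace above is an assumption. The sketch below shows one hedged workaround under that assumption: write the union to a separate staging directory first, then overwrite the original path from the staged copy. The function names, the `sql_ctx` parameter (a `SQLContext` or `SparkSession`), and the `_staging` suffix are all hypothetical and not taken from the question.

```python
def staging_path_for(target_path, suffix="_staging"):
    """Return a sibling path used as a temporary write location.

    The "_staging" naming scheme is a hypothetical choice for this sketch.
    """
    return target_path.rstrip("/") + suffix


def union_and_overwrite(sql_ctx, target_path, new_rows_df):
    """Union the Parquet data at target_path with new_rows_df and write the
    result back to target_path, without reading and overwriting the same
    path inside a single job.

    sql_ctx is assumed to be a SQLContext or SparkSession (anything that
    exposes .read.parquet); new_rows_df is assumed to be a DataFrame whose
    columns are in the same order as the stored data.
    """
    staging = staging_path_for(target_path)

    existing_df = sql_ctx.read.parquet(target_path)
    # unionAll matches columns by position, not by name.
    combined_df = existing_df.unionAll(new_rows_df)

    # Job 1: materialise the union somewhere other than the input path.
    combined_df.write.mode("overwrite").parquet(staging)

    # Job 2: this plan reads only from the staging directory, so replacing
    # the original path no longer deletes data that is still being scanned.
    sql_ctx.read.parquet(staging).write.mode("overwrite").parquet(target_path)

    # The staging directory is left in place; the caller may delete it.
    return staging
```

This costs an extra write and read of the full dataset, and the staging directory has to be cleaned up separately. On Spark 2.x and later, `union` is the non-deprecated name for `unionAll`.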

overwrite apache-spark parquet

6 votes
3 answers
7337 views
