Post by ALu*_*unz

Hive won't read Spark-generated partitioned Parquet files

I'm having trouble reading Spark-generated partitioned Parquet files in Hive. I can create the external table in Hive, but when I try to select a few rows, Hive returns only an "OK" message with no rows.

I can read the partitioned Parquet files correctly in Spark, so I assume they were generated correctly. I can also read the files when I create the external table in Hive without partitions.
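One common explanation for this symptom (a hedged guess, not confirmed by the post): Hive does not automatically discover partition directories that Spark writes to S3; each partition must be registered in the metastore before a `SELECT` will return rows. A minimal sketch, using hypothetical table and column names derived from the S3 path above:

```sql
-- Hypothetical DDL matching the layout s3://staging-dev/test/ttfourfieldspart2/year=2013/month=11
-- (table name and data columns are illustrative, not from the original post).
CREATE EXTERNAL TABLE ttfourfieldspart2 (
  field1 STRING,
  field2 INT
)
PARTITIONED BY (year INT, month INT)
STORED AS PARQUET
LOCATION 's3://staging-dev/test/ttfourfieldspart2/';

-- Partitions written by Spark are invisible to Hive until they are
-- added to the metastore, either in bulk:
MSCK REPAIR TABLE ttfourfieldspart2;

-- or one at a time:
ALTER TABLE ttfourfieldspart2
  ADD PARTITION (year=2013, month=11);
```

If the partitions were never registered, a query against the partitioned table returns "OK" with zero rows even though the files are readable, which would match the behavior described above.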

Does anyone have any suggestions?

My environment is:

  • EMR 4.1.0 cluster
  • Hive 1.0.0
  • Spark 1.5.0
  • Hue 3.7.1
  • Parquet files stored in an S3 bucket (s3://staging-dev/test/ttfourfieldspart2/year=2013/month=11)

My Spark configuration file (/etc/spark/conf.dist/spark-defaults.conf) has the following parameters:

spark.master yarn
spark.driver.extraClassPath /etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
spark.driver.extraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
spark.executor.extraClassPath /etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
spark.executor.extraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
spark.eventLog.enabled true
spark.eventLog.dir hdfs:///var/log/spark/apps
spark.history.fs.logDirectory hdfs:///var/log/spark/apps
spark.yarn.historyServer.address ip-10-37-161-246.ec2.internal:18080
spark.history.ui.port 18080
spark.shuffle.service.enabled true
spark.driver.extraJavaOptions    -Dlog4j.configuration=file:///etc/spark/conf/log4j.properties -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=512M -XX:OnOutOfMemoryError='kill -9 %p'
spark.executor.extraJavaOptions  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'
spark.executor.memory 4G
spark.driver.memory 4G
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.maxExecutors 100
spark.dynamicAllocation.minExecutors 1

My Hive configuration file (/etc/hive/conf/hive-site.xml) has the following parameters:

<configuration>

<!-- Hive Configuration can either be stored …

hive partitioning partition apache-spark parquet

10 votes
1 answer
5675 views
