I'm currently on Cloudera 5.6, trying to create a Parquet-format table in Hive based on another table, but I'm running into an error.
create table sfdc_opportunities_sandbox_parquet like
sfdc_opportunities_sandbox STORED AS PARQUET
Error message:
Parquet does not support date. See HIVE-6384
I've read that Hive 1.2 has a fix for this issue, but Cloudera 5.6 and 5.7 don't ship with Hive 1.2. Has anyone found a way around this?
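One workaround that is often suggested (a sketch with hypothetical column names, since Parquet in Hive before 1.2 simply has no DATE type) is to declare the Parquet table explicitly, mapping each DATE column to STRING or TIMESTAMP, and cast on load:

-- Hypothetical schema: close_date is DATE in the source table.
CREATE TABLE sfdc_opportunities_sandbox_parquet (
  opportunity_id STRING,
  close_date     STRING   -- DATE mapped to STRING for Parquet
) STORED AS PARQUET;

INSERT OVERWRITE TABLE sfdc_opportunities_sandbox_parquet
SELECT opportunity_id, CAST(close_date AS STRING)
FROM sfdc_opportunities_sandbox;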
I've run into a strange problem. When I run a Hadoop job over a large dataset (> 1 TB of compressed text files), some of the reduce tasks fail with stack traces like this one:
java.io.IOException: Task: attempt_201104061411_0002_r_000044_0 - The reduce copier failed
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:385)
at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:234)
Caused by: java.io.IOException: Intermediate merge failed
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2714)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2639)
Caused by: java.lang.RuntimeException: java.io.EOFException
at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)
at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2698)
... 1 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at com.__.hadoop.pixel.segments.IpCookieCountFilter$IpAndIpCookieCount.readFields(IpCookieCountFilter.java:241)
at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:125)
... 8 more
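For what it's worth, an EOFException thrown from readFields() during the merge almost always means the custom Writable's write() and readFields() disagree on the byte layout of a record. A minimal sketch of that failure mode (a hypothetical class, not the actual IpAndIpCookieCount code):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class IpAndCount implements Writable {
    private final Text ip = new Text();
    private int count;

    @Override
    public void write(DataOutput out) throws IOException {
        ip.write(out);
        // BUG: out.writeInt(count) is missing, so each serialized record is
        // four bytes shorter than what readFields() expects.
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        ip.readFields(in);
        count = in.readInt(); // reads into the next record; at end of stream -> EOFException
    }
}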
Another reduce attempt fails the same way:

java.io.IOException: Task: attempt_201104061411_0002_r_000056_0 - The reduce copier failed
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:385)
at org.apache.hadoop.mapred.Child$4.run(Child.java:240) …

I'm trying to find out where the output of a Map task is saved to disk before a Reduce task can use it.
Note: the version in use is Hadoop 0.20.204 with the new API.
For example, when overriding the map method in the Mapper class:
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
        // word is a Text field and one is an IntWritable field of the Mapper
        word.set(tokenizer.nextToken());
        context.write(word, one);
    }
    // code that starts a new Job.
}
I'm interested in finding out where context.write() ultimately writes the data. So far I've come across:
FileOutputFormat.getWorkOutputPath(context);
which gives me the following location on HDFS:
hdfs://localhost:9000/tmp/outputs/1/_temporary/_attempt_201112221334_0001_m_000000_0
When I try to use that path as the input of another job, it gives me the following error:
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/tmp/outputs/1/_temporary/_attempt_201112221334_0001_m_000000_0
Note: the job is launched inside the Mapper, so technically the temporary folder that the Mapper task writes its output to does exist when the new job starts. Even so, it still says the input path does not exist.
Any ideas on where the temporary output is written? Or, in a job that has both a Map and a Reduce phase, where can I find the output of the Map tasks?
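For context, intermediate map output never lands on HDFS at all: it is spilled to the TaskTracker's local disks, in the directories listed by the mapred.local.dir property. A small probe Mapper to print those directories from inside a task (just a sketch, assuming the 0.20-era property name):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SpillDirProbe extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context) {
        // mapred.local.dir lists the local (non-HDFS) directories where the
        // map side spills its sorted output segments before the shuffle.
        System.err.println("map spill dirs: "
                + context.getConfiguration().get("mapred.local.dir"));
    }
}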
I get an error when I try to load rJava. The JDK is installed. (I'm running R on a CentOS VM, the Cloudera demo VM cdh3u4.)
> library(rJava)
Error : .onLoad failed in loadNamespace() for 'rJava', details:
call: dyn.load(file, DLLpath = DLLpath, ...)
error: unable to load shared object '/home/cloudera/R/x86_64-redhat-linux-gnu-library/2.15/rJava/libs/rJava.so':
libjvm.so: cannot open shared object file: No such file or directory
Error: package/namespace load failed for ‘rJava’
Is something wrong with my LD_LIBRARY_PATH setting? If so, how do I fix it? I need rJava working in order to install rhdfs.
More information (in case it's needed):
[cloudera@localhost ~]$ java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
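The usual fix for a libjvm.so load failure like this is to re-run R's Java configuration, or to put the JVM's server directory on LD_LIBRARY_PATH. A sketch, where the JDK path is an assumption to adjust to the local install:

# Re-register Java with R (rebuilds rJava's compile/link settings):
sudo R CMD javareconf

# Or export the JVM library path before starting R; the path below assumes
# a 64-bit Oracle JDK 1.6 install, so adjust it to your JAVA_HOME:
export JAVA_HOME=/usr/java/jdk1.6.0_31
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:$LD_LIBRARY_PATH
R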
I'm looking for the jar files needed to run the Hadoop jobs associated with the examples and test jars. They used to be under /usr/lib/hadoop, but apparently not any more. Pointers appreciated.
Note: this question was originally about CDH 4.2, but some of the answers include information for later versions.
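A quick way to hunt them down on any CDH layout (a sketch; on parcel installs they tend to sit under the parcel directory, as the hadoop-examples.jar path further down this page illustrates):

# Search the filesystem for the examples and test jars:
find / \( -name 'hadoop*examples*.jar' -o -name 'hadoop*test*.jar' \) 2>/dev/null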
I'm currently trying to test some security changes I've implemented, using Encrypted Shuffle in a Cloudera Hadoop environment.
I've created the certificates and keystores and put them in the appropriate locations.
I'm testing against the TaskTracker's HTTPS port, 50060.
When I curl that port, I get the following error response:
ubuntu@node2:~$ curl -v -k "https://10.0.10.90:50060"
* About to connect() to 10.0.10.90 port 50060 (#0)
* Trying 10.0.10.90... connected
* successfully set certificate verify locations:
* CAfile: none
CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol
* Closing connection #0
curl: (35) error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol
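(For what it's worth, curl prints this particular "unknown protocol" error when the server answers in plain HTTP rather than TLS, which is easy to check directly; a sketch:)

# If this succeeds, the port is still speaking unencrypted HTTP,
# i.e. encrypted shuffle is not actually active on the TaskTracker:
curl -v "http://10.0.10.90:50060"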
When I check with the OpenSSL client, I get the following response:
ubuntu@node2:~$ openssl s_client -connect 10.0.10.90:50060
CONNECTED(00000003)
139749924464288:error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol:s23_clnt.c:749:
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake …

I'm trying to establish a JDBC connection to Hive so that I can view and create tables and query Hive tables from Eclipse. I used the HiveClient sample code: https://cwiki.apache.org/confluence/display/Hive/HiveClient Then I added all the required jars to the Java build path in Eclipse and started the Hive Thrift server. Port 10000 is listening. I'm using the Cloudera QuickStart VM 4.6.1 and the Eclipse that ships with it. This is the error I get in the IDE when I try to run the code:
Exception in thread "main" java.sql.SQLException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:191)
at org.apache.hadoop.hive.jdbc.HiveStatement.execute(HiveStatement.java:127)
at org.apache.hadoop.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:108)
at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:103)
at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:104)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at jdbc.Hive.main(Hive.java:24)
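(A common cause of this exact "Connection reset" is pointing the old HiveServer1 driver at a HiveServer2 port; if the VM is running HiveServer2, the driver class and URL would look like the sketch below, where the cloudera/cloudera credentials are an assumption based on QuickStart VM defaults:)

import java.sql.Connection;
import java.sql.DriverManager;

public class Hive2Probe {
    public static void main(String[] args) throws Exception {
        // HiveServer2 driver; HiveServer1 instead wants
        // org.apache.hadoop.hive.jdbc.HiveDriver with jdbc:hive://host:10000
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "cloudera", "cloudera");
        System.out.println("Connected: " + !con.isClosed());
        con.close();
    }
}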
When I try to connect to Hive with Beeline, I get the same error. However, when I drop the hostname and port from the !connect command, it gets further and then fails with the following error:
beeline> !connect jdbc:hive:// "" ""
scan complete in 4ms
Connecting to jdbc:hive://
14/03/21 18:42:03 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
14/03/21 18:42:03 INFO metastore.HiveMetaStore: …

I'm trying to stream a Twitter feed into HDFS and then query it with Hive. But the first part, streaming the data and loading it into HDFS, isn't working: it throws a NullPointerException.
This is what I've tried:
1. Downloaded apache-flume-1.4.0-bin.tar and extracted it. Copied everything to /usr/lib/flume/ and, in /usr/lib/, changed the owner of the flume directory to my user. When I run the ls command in /usr/lib/flume/, it shows:
bin CHANGELOG conf DEVNOTES docs lib LICENSE logs NOTICE README RELEASE-NOTES tools
2. Moved to the conf/ directory. I copied the file flume-env.sh.template to flume-env.sh and edited JAVA_HOME to point to my Java path, /usr/lib/jvm/java-7-oracle, as sketched below.
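A sketch of that one-line edit (the JDK path is whatever the local install uses):

# conf/flume-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-oracle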
3. Next, I created a file named flume.conf in the same conf directory and added the following:
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <Twitter Application API key>
TwitterAgent.sources.Twitter.consumerSecret = <Twitter Application API secret>
TwitterAgent.sources.Twitter.accessToken = <Twitter Application Access token>
TwitterAgent.sources.Twitter.accessTokenSecret = …

I've been looking at Storm, which installs with Hortonworks 2.1, but to avoid installing Hortonworks alongside my Cloudera installation (which includes Spark), I'm trying to find a way to use Storm within Cloudera.
If Storm and Spark could both run on a single platform, it would save the extra resources needed to install both Cloudera and Hortonworks on the machine.
We enabled NameNode High Availability using Cloudera Manager:
Cloudera Manager >> HDFS >> Actions >> Enable High Availability >> selected the standby NameNode and JournalNodes, then entered nameservice1 as the nameservice name.
Once the whole process finished, we deployed the client configuration.
We tested from a client machine by listing an HDFS directory (hadoop fs -ls /), then manually failed over to the standby NameNode and listed the HDFS directory again (hadoop fs -ls /). This test worked perfectly.
But when I ran a Hadoop sleep job with the following command, it failed:
$ hadoop jar /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop-0.20-mapreduce/hadoop-examples.jar sleep -m 1 -r 0
java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:448)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:410)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:128)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2308)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:87)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2342)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2324)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:103)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:980)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:974)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:974) …
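For reference, a client can only resolve the logical name nameservice1 if the hdfs-site.xml it reads carries the HA mappings; the stack trace above ends up in createNonHAProxy, which suggests the configuration the job client sees lacks them. A sketch of the properties involved (host names are hypothetical):

<!-- hdfs-site.xml on the client (sketch; nn-host1/nn-host2 are placeholders) -->
<property>
  <name>dfs.nameservices</name>
  <value>nameservice1</value>
</property>
<property>
  <name>dfs.ha.namenodes.nameservice1</name>
  <value>namenode1,namenode2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode1</name>
  <value>nn-host1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode2</name>
  <value>nn-host2:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.nameservice1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>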