Tags: cloudera

Cloudera 5.6: Parquet does not support date. See HIVE-6384

I am currently on Cloudera 5.6, trying to create a Parquet-format table in Hive based on another table, but I am running into an error.

create table sfdc_opportunities_sandbox_parquet like 
sfdc_opportunities_sandbox STORED AS PARQUET

Error message:

Parquet does not support date. See HIVE-6384

I have read that Hive 1.2 has a fix for this issue, but Cloudera 5.6 and 5.7 do not ship with Hive 1.2. Has anyone found a way around this?
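A common workaround (an assumption on my part, not something the question confirms) is to copy the table with CTAS and cast the DATE column(s) to STRING on the way in, since CREATE TABLE ... LIKE carries the unsupported type over unchanged. A minimal sketch over JDBC, assuming HiveServer2 on the default port; opportunity_id and close_date are hypothetical column names:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ParquetDateWorkaround {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // HiveServer2 driver
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "cloudera", "");
             Statement stmt = conn.createStatement()) {
            // CTAS lets us change column types while copying; close_date is a
            // hypothetical DATE column in the source table.
            stmt.execute("CREATE TABLE sfdc_opportunities_sandbox_parquet "
                    + "STORED AS PARQUET AS "
                    + "SELECT opportunity_id, CAST(close_date AS STRING) AS close_date "
                    + "FROM sfdc_opportunities_sandbox");
        }
    }
}

The same statement can of course be run directly in the Hive shell; JDBC is used here only to keep the sketch self-contained.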

hive cloudera parquet

8 votes · 1 answer · 7430 views

Hadoop: Intermediate merge failed

I have run into a strange problem. When I run a Hadoop job over a large dataset (>1 TB of compressed text files), several of the reduce tasks fail with stack traces like these:

java.io.IOException: Task: attempt_201104061411_0002_r_000044_0 - The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:385)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:234)
Caused by: java.io.IOException: Intermediate merge failed
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2714)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2639)
Caused by: java.lang.RuntimeException: java.io.EOFException
    at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)
    at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
    at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
    at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
    at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
    at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
    at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2698)
    ... 1 more
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at com.__.hadoop.pixel.segments.IpCookieCountFilter$IpAndIpCookieCount.readFields(IpCookieCountFilter.java:241)
    at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:125)
    ... 8 more
java.io.IOException: Task: attempt_201104061411_0002_r_000056_0 - The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:385)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:240) …
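The trace alone does not prove it, but an EOFException thrown from a custom Writable's readFields() (here IpAndIpCookieCount.readFields) very often means that write() and readFields() do not serialize exactly the same fields in the same order, so the reducer-side merge runs off the end of a record. A sketch of the symmetric shape such a class needs; the field names are hypothetical, not the asker's:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class IpAndCount implements Writable {
    private String ip;
    private int count;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(ip);   // the write order here...
        out.writeInt(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        ip = in.readUTF();  // ...must match the read order exactly
        count = in.readInt();
    }
}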

hadoop mapreduce cloudera

7 votes · 1 answer · 3805 views

In Hadoop, where does the framework save the output of the Map task in a normal Map-Reduce application?

I am trying to figure out where the output of a Map task is saved to disk before it can be used by a Reduce task.

Note: the version used is Hadoop 0.20.204 with the new API.

For example, when overriding the map method in the Map class:

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
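    // (word and one are fields of the enclosing Mapper, as in the standard
    //  WordCount example: a Text and an IntWritable; they are not shown here)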
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        context.write(word, one);
    }

    // code that starts a new Job.

}

I am interested in finding out where context.write() ends up writing the data. So far I have come across:

FileOutputFormat.getWorkOutputPath(context);

which gives me the following location on HDFS:

hdfs://localhost:9000/tmp/outputs/1/_temporary/_attempt_201112221334_0001_m_000000_0

When I try to use it as the input for another job, it gives me the following error:

org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/tmp/outputs/1/_temporary/_attempt_201112221334_0001_m_000000_0

Note: the job is started inside the Mapper, so, technically speaking, the temporary folder that the Mapper task is writing its output to exists when the new job starts. Even so, it still says the input path does not exist.

Any ideas on where the temporary output is written, and where I can find the output of the Map task in a job that has both Map and Reduce phases?
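For what it is worth, the intermediate output of the map phase in a map-and-reduce job never touches HDFS at all: it is spilled to each node's local disk under the directories named by mapred.local.dir (the property name in the 0.20.x classic configuration), and the _temporary/_attempt_* path above is only FileOutputFormat's per-task work directory, which the output committer renames into the real output directory after the task commits (which is why it does not exist when the nested job looks for it). A small sketch that prints the relevant settings, assuming the cluster's configuration files are on the classpath:

import org.apache.hadoop.conf.Configuration;

public class ShowIntermediateDirs {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Per-node local directories where map output spills are written:
        System.out.println("mapred.local.dir = " + conf.get("mapred.local.dir"));
        // The usual default parent of the above is ${hadoop.tmp.dir}/mapred/local:
        System.out.println("hadoop.tmp.dir   = " + conf.get("hadoop.tmp.dir"));
    }
}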

java hadoop mapreduce cluster-computing cloudera

7 votes · 2 answers · 20k views

Error loading rJava

I get an error when I want to load rJava. The JDK is installed. (I am running R on a CentOS VM, the Cloudera demo VM cdh3u4.)

> library(rJava)

Error : .onLoad failed in loadNamespace() for 'rJava', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object '/home/cloudera/R/x86_64-redhat-linux-gnu-library/2.15/rJava/libs/rJava.so':
  libjvm.so: cannot open shared object file: No such file or directory
Error: package/namespace load failed for ‘rJava’

Is there a problem with my LD_LIBRARY_PATH setting? If so, how do I fix it? I need rJava running in order to install rhdfs.

More information, if needed:

[cloudera@localhost ~]$ java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
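A likely fix (an assumption; the question does not confirm the cause) is that the dynamic linker cannot find libjvm.so, which is usually resolved by running sudo R CMD javareconf, or by adding the JDK's server directory to LD_LIBRARY_PATH. The JVM itself can tell you where that directory is; a tiny sketch:

public class WhereIsLibjvm {
    public static void main(String[] args) {
        // On a 64-bit Linux JDK 6 layout, libjvm.so lives under
        // <java.home>/lib/amd64/server (java.home points at the jre/ directory).
        System.out.println(System.getProperty("java.home"));
    }
}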

java r centos rjava cloudera

7 votes · 2 answers · 10k views

Where are the hadoop-examples* and hadoop-test* jars in Cloudera CDH?

I am looking for the jar files needed to run the Hadoop jobs associated with the examples and tests. They used to be under /usr/lib/hadoop, but apparently not anymore. Pointers appreciated.

Note: this question was originally about CDH 4.2, but some of the answers include information about later versions.
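On any given node, one way to answer this is simply to walk the usual install roots; a sketch under the assumption of a typical layout, where /usr/lib holds package-based installs and /opt/cloudera/parcels holds parcel-based ones:

import java.io.File;

public class FindExampleJars {
    // Recursively print any hadoop-examples* / hadoop-test* jars under dir.
    static void scan(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) return; // unreadable, or not a directory
        for (File f : entries) {
            if (f.isDirectory()) {
                scan(f);
            } else if (f.getName().startsWith("hadoop-examples")
                    || f.getName().startsWith("hadoop-test")) {
                System.out.println(f.getAbsolutePath());
            }
        }
    }

    public static void main(String[] args) {
        scan(new File("/usr/lib"));              // package-based installs
        scan(new File("/opt/cloudera/parcels")); // parcel-based installs
    }
}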

hadoop mapreduce cloudera

7 votes · 2 answers · 20k views

error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol

I am currently trying to test changes I implemented for securing a Cloudera Hadoop environment with Encrypted Shuffle.

I have created the certificates and keystores and placed them in the appropriate locations.

I am testing against the TaskTracker's HTTPS port, 50060.

When I curl that port, I get the following error response:

ubuntu@node2:~$ curl -v -k "https://10.0.10.90:50060"
* About to connect() to 10.0.10.90 port 50060 (#0)
*   Trying 10.0.10.90... connected
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol
* Closing connection #0
curl: (35) error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol

When I check with the OpenSSL s_client, I get the following response:

 ubuntu@node2:~$ openssl s_client -connect 10.0.10.90:50060
CONNECTED(00000003)
139749924464288:error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol:s23_clnt.c:749:
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake …
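One plausible reading of SSL23_GET_SERVER_HELLO:unknown protocol (an assumption; the question does not settle it) is that the daemon answered the handshake in plaintext, i.e. encrypted shuffle is not actually active and port 50060 is still serving plain HTTP. A quick probe that would confirm this:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class PlainHttpProbe {
    public static void main(String[] args) throws Exception {
        // If this prints an HTML line, the TaskTracker is still speaking plain
        // HTTP on 50060, which would explain the failed SSL handshake.
        URL url = new URL("http://10.0.10.90:50060/");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream()))) {
            System.out.println(in.readLine());
        }
    }
}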

java ssl hadoop openssl cloudera

7 votes · 1 answer · 20k views

Cannot establish a JDBC connection to Hive from Eclipse

I am trying to establish a JDBC connection to Hive so that I can view and create tables, and query Hive tables, from Eclipse. I used the HiveClient sample code at https://cwiki.apache.org/confluence/display/Hive/HiveClient, added all the required jars to the Java build path in Eclipse, and started the Hive Thrift server. Port 10000 is listening. I am using the Cloudera QuickStart VM 4.6.1 and the Eclipse that ships with it. This is the error I get in the IDE when I try to run the code:

Exception in thread "main" java.sql.SQLException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
    at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:191)
    at org.apache.hadoop.hive.jdbc.HiveStatement.execute(HiveStatement.java:127)
    at org.apache.hadoop.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:108)
    at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:103)
    at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:104)
    at java.sql.DriverManager.getConnection(DriverManager.java:582)
    at java.sql.DriverManager.getConnection(DriverManager.java:185)
    at jdbc.Hive.main(Hive.java:24)

I get the same error when I try to connect to Hive with beeline. However, when I remove the host name and port from the !connect command, it gets further and produces the following:

beeline> !connect jdbc:hive:// "" ""                 
scan complete in 4ms
Connecting to jdbc:hive://
14/03/21 18:42:03 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
14/03/21 18:42:03 INFO metastore.HiveMetaStore: …
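One plausible culprit, though the question does not settle it: the wiki sample uses the original HiveServer driver (org.apache.hadoop.hive.jdbc.HiveDriver with jdbc:hive:// URLs), and pointing that driver at a HiveServer2 listening on 10000 typically ends in a connection reset. A minimal HiveServer2 sketch for comparison; host and credentials are assumptions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Hive2Smoke {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // HiveServer2 driver
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "cloudera", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}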

java hadoop hive jdbc cloudera

7 votes · 1 answer · 30k views

Apache Flume Twitter agent not streaming data

I am trying to stream the Twitter feed into HDFS and then query it with Hive. But the first part, streaming the data and loading it into HDFS, is not working and throws a NullPointerException.

This is what I have tried:

1. Downloaded apache-flume-1.4.0-bin.tar and extracted it. Copied all the contents to /usr/lib/flume/ and, in /usr/lib/, changed the owner of the flume directory to my user. When I run the ls command in /usr/lib/flume/, it shows:

bin  CHANGELOG  conf  DEVNOTES  docs  lib  LICENSE  logs  NOTICE  README  RELEASE-NOTES  tools

2. Moved to the conf/ directory. I copied the file flume-env.sh.template to flume-env.sh and edited JAVA_HOME to my Java path, /usr/lib/jvm/java-7-oracle.

3. Next, I created a file named flume.conf in the same conf directory and added the following content:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <Twitter Application API key>
TwitterAgent.sources.Twitter.consumerSecret = <Twitter Application API secret>
TwitterAgent.sources.Twitter.accessToken = <Twitter Application Access token>
TwitterAgent.sources.Twitter.accessTokenSecret = …
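One thing worth ruling out (an assumption, not a confirmed cause): the custom source class has to actually be on Flume's classpath, since com.cloudera.flume.source.TwitterSource ships in a separate flume-sources jar rather than in the Flume distribution itself. A fail-fast check, run with the same classpath you hand to the agent:

public class CheckTwitterSource {
    public static void main(String[] args) throws Exception {
        // Throws ClassNotFoundException if the jar providing the source is
        // missing from the classpath; succeeds quietly if it is visible.
        Class<?> c = Class.forName("com.cloudera.flume.source.TwitterSource");
        System.out.println("Found " + c.getName());
    }
}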

java twitter hadoop flume cloudera

7 votes · 1 answer · 6351 views

Using Storm with Cloudera

I have been looking at Storm, which installs with Hortonworks 2.1, but to avoid installing Hortonworks alongside my Cloudera installation (which includes Spark), I have been trying to find a way to use Storm within Cloudera.

Being able to use both Storm and Spark on a single platform would save the extra resources needed to run both Cloudera and Hortonworks on the machine.

cloudera apache-storm

7 votes · 1 answer · 6825 views

Namenode HA (UnknownHostException: nameservice1)

We enabled NameNode High Availability using Cloudera Manager:

Cloudera Manager >> HDFS >> Actions >> Enable High Availability >> selected the standby NameNode and JournalNodes, then set the nameservice name to nameservice1

Once the whole process was finished, I deployed the client configuration.

I tested from a client machine by listing HDFS directories (hadoop fs -ls /), then manually failing over to the standby NameNode and listing HDFS directories again (hadoop fs -ls /). This test worked perfectly.

But when I ran a Hadoop sleep job with the following command, it failed:

$ hadoop jar /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop-0.20-mapreduce/hadoop-examples.jar sleep -m 1 -r 0
java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:448)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:410)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:128)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2308)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:87)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2342)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2324)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:103)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:980)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:974)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:974) …
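UnknownHostException: nameservice1 generally means the process submitting the job never sees the HA client settings, so it treats the logical nameservice name as a plain hostname. A quick check, assuming the deployed client configuration should be on the job's classpath; all three values should come back non-null:

import org.apache.hadoop.conf.Configuration;

public class CheckHaClientConf {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        System.out.println("fs.defaultFS      = " + conf.get("fs.defaultFS"));
        System.out.println("dfs.nameservices  = " + conf.get("dfs.nameservices"));
        System.out.println("failover provider = "
                + conf.get("dfs.client.failover.proxy.provider.nameservice1"));
    }
}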

hadoop hdfs cloudera cloudera-manager cloudera-cdh

7 votes · 1 answer · 20k views