Tag: cloudera

Cloudera Hadoop: cannot run Hadoop fs commands, and HBase cannot create its directory on HDFS?

I have a 6-node Cloudera 5.0 beta cluster up and running,

but I cannot view the files and folders of Hadoop HDFS with the command:

sudo -u hdfs hadoop fs -ls /

The output instead shows the files and folders of the local Linux filesystem,

although the NameNode UI does show the HDFS files and folders.

And when I create a folder on HDFS I get this error:

sudo -u hdfs hadoop fs -mkdir /test
mkdir: `/test': Input/output error

Because of this error, HBase does not start and shuts down with the following error:

Unhandled exception. Starting shutdown.
java.io.IOException: Exception in makeDirOnFileSystem
at org.apache.hadoop.hbase.HBaseFileSystem.makeDirOnFileSystem(HBaseFileSystem.java:136)
at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:352)
at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:134)
at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:119)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:536)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:396)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:149)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4846)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4828)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4802)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3130)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3094)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3075)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:669)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:419) …
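Two separate issues seem to be in play here. `hadoop fs -ls /` falling back to the local Linux filesystem usually means the client has no `fs.defaultFS` configured, so it defaults to `file:///`. A sketch of the client-side core-site.xml entry that would be needed (the NameNode hostname is a placeholder, not taken from the question):

```xml
<!-- client-side core-site.xml; namenode.example.com:8020 is a placeholder -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>
```

Once the client actually talks to HDFS, the HBase stack trace above is a plain permission problem: user `hbase` cannot write to `/` (`drwxr-xr-x` owned by `hdfs`), so its root directory would typically be pre-created and chowned, e.g. `sudo -u hdfs hadoop fs -mkdir /hbase` followed by `sudo -u hdfs hadoop fs -chown hbase /hbase`.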

hadoop hdfs cloudera

5 votes · 1 answer · 9380 views

Hive ORC compression

I am running the following code on Hive v0.12.0. I expect the three tables to be compressed with different methods, so the file sizes and contents should differ:

--- Create table and compress it with ZLIB
create table zzz_test_szlib
  stored as orc
  tblproperties ("orc.compress"="ZLIB")
  as
select * from uk_pers_dev.orc_dib_trans limit 100000000;

--- Create table and compress it with SNAPPY
create table zzz_test_ssnap
  stored as orc
  tblproperties ("orc.compress"="SNAPPY")
  as
select * from uk_pers_dev.orc_dib_trans limit 100000000;

--- Create table and DO NOT compress it
create table zzz_test_snone
  stored as orc
  tblproperties ("orc.compress"="NONE")
  as
select * from uk_pers_dev.orc_dib_trans limit 100000000;

When I inspect the table metadata with DESCRIBE or via Hue, I get:

Name             Value                                            Value                                            Value
---------------- …

compression hadoop hive cloudera snappy

5 votes · 1 answer · 8255 views

Client cannot authenticate via: [TOKEN, KERBEROS]

I am using YarnClient to launch a job programmatically. The cluster I am running against is Kerberized.

Normal MapReduce jobs submitted via "yarn jar Examples.jar wordcount..." work.

The job I try to submit programmatically does not. I get this error:

14/09/04 21:14:29 ERROR client.ClientService: Error happened during application submission: Application application_1409863263326_0002 failed because the AM container for appattempt_1409863263326_0002_000002 exited with exitCode: -1000 due to: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details: local host is: "yarn-c1-n1.clouddev.snaplogic.com/10.184.28.108"; destination host is: "yarn-c1-cdh.clouddev.snaplogic.com":8020; . Failing this attempt.. Failing the application.
14/09/04 21:14:29 ERROR client.YClient: Application submission failed

The code looks like this:

ClientContext context = createContextFrom(args);
YarnConfiguration configuration = new YarnConfiguration();
YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(configuration);
ClientService client = new ClientService(context, yarnClient, new InstallManager(FileSystem.get(configuration)));
LOG.info(Messages.RUNNING_CLIENT_SERVICE);
boolean result = client.execute();

I thought that maybe adding something to the effect of:

yarnClient.getRMDelegationToken(new Text(InetAddress.getLocalHost().getHostAddress()));

might ease my pain, but that did not seem to help either. Any help would be greatly appreciated.
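One common cause (an assumption here, not something the log alone confirms) is that a freshly constructed YarnConfiguration never picks up the cluster's security settings, so the client silently falls back to SIMPLE authentication and the server rejects it. A sketch of the client-side properties that would need to be present, either via a core-site.xml on the classpath or set on the Configuration object before creating the YarnClient:

```xml
<!-- client-side configuration for a Kerberized cluster -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```

The JVM submitting the job also needs valid Kerberos credentials for the submitting user (e.g. a ticket obtained with `kinit`, or a programmatic keytab login) before any RPC to the NameNode or ResourceManager is made.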

hadoop cloudera hadoop-yarn kerberos-delegation

5 votes · 1 answer · 30k views

Installing the Cloudera Impala shell on Mac OS X and connecting to an Impala cluster

We have an Impala server in prod, and I need to connect to it with the impala-shell from my local MacBook running Mac OS X (10.8).

I downloaded Impala-cdh5.1.0-release.tar.gz, unpacked it, and tried buildall.sh, which failed with: .../bin/impala-config.sh: line 123: nproc: command not found

Trying impala-shell directly also failed:

$ shell/impala-shell 
ls: /Users/.../Impala-cdh5.1.0-release/shell/ext-py/*.egg: No such file or directory
 Traceback (most recent call last):
 File "/Users/.../Impala-cdh5.1.0-release/shell/impala_shell.py", line 20, in <module> 
 import prettytable
ImportError: No module named prettytable

I have the JDK installed and JAVA_HOME set. Cloudera Manager does not seem to support Mac OS, does it?

macos shell cloudera impala

5 votes · 1 answer · 1265 views

What is the correct way to run a Spark application on YARN with Oozie (via Hue)?

I wrote an application in Scala that uses Spark.
The application consists of two modules: an App module containing classes with the different logic, and an Env module containing environment and system initialization code, as well as utility functions.
The entry point is in Env; after initialization, it instantiates a class from App (according to args, using Class.forName) and executes the logic.
The modules are exported into two different JARs (namely env.jar and app.jar).

When I run the application locally, it executes fine. The next step was to deploy it to my server. I use Cloudera's CDH 5.4.

I used Hue to create a new Oozie workflow with a Spark task, using the following parameters:

  • Spark Master: yarn
  • Mode: cluster
  • App name: myApp
  • Jars/py files: lib/env.jar,lib/app.jar
  • Main class: env.Main (in the Env module)
  • Arguments: app.AggBlock1Task

I then placed the two JARs in the lib folder inside the workflow's folder (/user/hue/oozie/workspaces/hue-oozie-1439807802.48).

When I run the workflow, it throws a FileNotFoundException and the application does not execute:

java.io.FileNotFoundException: File file:/cloudera/yarn/nm/usercache/danny/appcache/application_1439823995861_0029/container_1439823995861_0029_01_000001/lib/app.jar,lib/env.jar does not exist

However, when I leave the Spark Master and Mode parameters empty, it all works fine, except that when I check programmatically, spark.master is set to local[*] instead of yarn. Also, when going through the logs, I came across this under the Oozie Spark action configuration:

--master
null
--name
myApp
--class
env.Main
--verbose
lib/env.jar,lib/app.jar
app.AggBlock1Task

I assume I am not doing something right by leaving the Spark master and mode parameters unset and running the application with spark.master set to local[*]. As far as I understand, SparkConf …
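For reference, a sketch of what the generated Oozie Spark action might look like once master and mode are filled in. The element values below are assumptions based on the Hue parameters above and the spark-action 0.1 schema. One plausible reading of the FileNotFoundException is that the comma-separated "Jars/py files" value lands in the single `<jar>` element and is treated as one (nonexistent) file name, so the second JAR would instead go through `--jars` in `<spark-opts>`:

```xml
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn</master>
    <mode>cluster</mode>
    <name>myApp</name>
    <class>env.Main</class>
    <jar>lib/env.jar</jar>
    <spark-opts>--jars lib/app.jar</spark-opts>
    <arg>app.AggBlock1Task</arg>
</spark>
```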

cloudera hue oozie hadoop-yarn apache-spark

5 votes · 1 answer · 1811 views

Exception running /etc/hadoop/conf.cloudera.yarn/topology.py

Whenever I try to run a Spark application on a Cloudera CDH 5.4.4 cluster in YARN client mode, I get the following exception (repeated several times in the stack trace). The process carries on regardless (it is a warning), but it makes it impossible to find anything in the logs. How can I fix it?

15/09/01 08:53:58 WARN net.ScriptBasedMapping: Exception running /etc/hadoop/conf.cloudera.yarn/topology.py 10.0.0.5 
java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "/home/azureuser/scripts/streaming"): error=13, Permission denied
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:485)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
    at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
    at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
    at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
    at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
    at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:38)
    at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:271)
    at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:263)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:263)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.makeOffers(CoarseGrainedSchedulerBackend.scala:167)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receiveWithLogging$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:131)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
    at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
    at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) …
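The `error=13, Permission denied` means YARN's ScriptBasedMapping tried to execute topology.py directly and was refused, typically because the execute bit is missing on the script, or the working directory it runs from is not accessible. A small local demonstration of the mechanism and the fix (on the cluster, the same `chmod` would be applied to the real /etc/hadoop/conf.cloudera.yarn/topology.py; the file name here is made up):

```shell
# Create a stand-in topology script WITHOUT the execute bit.
printf '#!/bin/sh\necho /default-rack\n' > topology-demo.sh
chmod 644 topology-demo.sh

# Executing it directly fails exactly like the warning in the logs (errno 13).
./topology-demo.sh 2>/dev/null || echo "error=13, Permission denied"

# The fix: make the script executable, then it resolves the rack normally.
chmod 755 topology-demo.sh
./topology-demo.sh
```

The demo prints the simulated "error=13, Permission denied" line first, then `/default-rack` after the `chmod`.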

cloudera hadoop-yarn apache-spark

5 votes · 1 answer · 7217 views

hive.metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect - triggers other errors on Hive

Sometimes, when developers run Hive queries, the following error shows up (first log entry below).

When I look at the Hive logs on the node, I see that the MetaStoreClient lost its connection immediately beforehand (second log entry below).

The problem seems to resolve itself.

Any idea what could be causing this?

Thanks!

hadoop-cmf-hive-HIVESERVER2-qn7bi02hdn001.compliant.disney.private.log.out.3:2016-04-27 07:17:20,092 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException org.apache.thrift.transport.TTransportException

[root@qn7bi02hdn001 rashm010]# grep lost *3

2016-04-27 07:16:54,449 WARN org.apache.hadoop.hive.metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect.
2016-04-27 07:17:20,114 WARN org.apache.hadoop.hive.metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect.
[root@qn7bi02hdn001 rashm010]#

Any help would be greatly appreciated.

hadoop hive cloudera hiveql

5 votes · 1 answer · 2003 views

hadoop, python, subprocess failed with code 127

I am trying to run a very simple task with MapReduce.

mapper.py:

#!/usr/bin/env python
import sys
for line in sys.stdin:
    print line

My txt file:

qwerty
asdfgh
zxc

The command line to run the job:

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar \
-input /user/cloudera/In/test.txt \
-output /user/cloudera/test \
-mapper /home/cloudera/Documents/map.py \
-file /home/cloudera/Documents/map.py

Error:

INFO mapreduce.Job: Task Id : attempt_1490617885665_0008_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
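Exit code 127 means the mapper process itself could not be started. With Hadoop Streaming the usual culprits are a missing execute bit on the script, Windows CRLF line endings (which corrupt the `#!/usr/bin/env python` shebang into `#!/usr/bin/env python\r`), or python being absent from the task nodes' PATH. A self-contained sketch of the cleanup, applied to a reproduction of the mapper (the CRLF content here simulates a file saved on Windows):

```shell
# Reproduce a mapper saved with Windows line endings.
printf '#!/usr/bin/env python\r\nimport sys\r\nfor line in sys.stdin:\r\n    print line\r\n' > map.py

sed -i 's/\r$//' map.py   # strip CRLF endings so the shebang is clean
chmod +x map.py           # streaming executes the script directly
head -1 map.py            # should now be exactly: #!/usr/bin/env python
```

After this, re-running the same `hadoop jar ... -mapper map.py -file map.py` command would pick up the fixed script; testing it locally first with `cat test.txt | ./map.py` (as the question hints at) confirms the mapper runs outside Hadoop.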

How can I fix this problem and get the code to run? When I use cat /home/cloudera/Documents/test.txt | python …

python hadoop mapreduce cloudera hadoop-streaming

5 votes · 2 answers · 3783 views

Oracle PGX on YARN - 404 on the web service

我在Oracle BDA X7-2上运行Yarn,规格为:

  • Cloudera Enterprise 5.14.3
  • Java 1.8.0_171
  • PGX 2.7.1

I am trying to run PGX on YARN following this manual: https://docs.oracle.com/cd/E56133_01/2.5.0/tutorials/yarn.html

I managed to run the install script, and completed the configuration file it provides as follows:

{
  "pgx_yarn_jar_hdfs_path": "hdfs:/user/pgx/pgx-yarn-2.7.1.jar",
  "pgx_war_hdfs_path": "hdfs:/user/pgx/pgx-webapp-2.7.1.war",
  "pgx_conf_hdfs_path": "hdfs:/user/pgx/pgx.conf",
  "pgx_log4j_conf_hdfs_path": "hdfs:/user/pgx/log4j2.xml",
  "pgx_dist_log4j_conf_hdfs_path": "hdfs:/user/pgx/dist_log4j.xml",
  "pgx_cluster_host_hdfs_path": "hdfs:/user/pgx/cluster-host.tgz",
  "zookeeper_connect_string": "bda1node05,bda1node06,bda1node07",
  "standard_library_path": "/usr/lib64/gcc/4.8.2",
  "min_heap_size": "512m",
  "max_heap_size": "12g",
  "container_cores": 9,
  "container_memory": 0,
  "container_priority": 0,
  "num_machines": 1
}

The pgx-service application is in the RUNNING state on YARN, stderr shows no errors, and the log shows the service running at this address:

http://bda1node06:7007

and the following command runs the Linux Java process:

/usr/java/default/bin/java -Xms512m -Xmx12g oracle.pgx.yarn.PgxService bda1node06 /u11/hadoop/yarn/nm/usercache/root/appcache/application_1539869144089_2070/container_e22_1539869144089_2070_01_000002/pgx-server.war 7007 bda1node05,bda1node06,bda1node07 /pgx-8eef44e2-1657-403a-8193-0102f5266680

For testing purposes, after running the PGX client:

$PGX_HOME/bin/pgx --base_url http://bda1node06:7007

I get:

java.util.concurrent.ExecutionException: java.lang.IllegalStateException: cannot connect to server; requested http://bda1node06:7007/version?extendedInfo=true and expected status 200, got …

oracle-spatial bigdata cloudera hadoop-yarn cloudera-manager

5 votes · 1 answer · 144 views

Which distribution - CDH vs HDP

I happen to have worked with CDH for quite a while (about a year) and now plan to start afresh. These days we have CDH and HDP, and Hortonworks has been acquired by Cloudera.

  1. Is HDP under active development, or is it CDH that is actively developed?
  2. Which distribution should I start with?

cloudera hortonworks-data-platform cloudera-cdh cloudera-quickstart-vm hortonworks-sandbox

5 votes · 1 answer · 1621 views