I have a six-node Cloudera 5.0 beta cluster up and running, but I cannot list the files and folders of Hadoop HDFS with the command
sudo -u hdfs hadoop fs -ls /
In the output it shows the files and folders of the local Linux directory, although the NameNode UI does show the HDFS files and folders. And I get an error when creating a folder on HDFS:
sudo -u hdfs hadoop fs -mkdir /test
mkdir: `/test': Input/output error
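A first diagnostic worth running (a sketch, not from the original post): listing local Linux files usually means the client's fs.defaultFS resolves to file:///, so the shell never reaches the NameNode at all. The NameNode host below is a placeholder.

# Check which filesystem the client actually resolves
hdfs getconf -confKey fs.defaultFS          # should print hdfs://<namenode>:8020, not file:///
# Address the NameNode explicitly, bypassing the default
sudo -u hdfs hadoop fs -ls hdfs://<namenode-host>:8020/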
Because of this mkdir error, HBase does not start and shuts down with the following error:
Unhandled exception. Starting shutdown.
java.io.IOException: Exception in makeDirOnFileSystem
at org.apache.hadoop.hbase.HBaseFileSystem.makeDirOnFileSystem(HBaseFileSystem.java:136)
at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:352)
at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:134)
at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:119)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:536)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:396)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:149)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4846)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4828)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4802)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3130)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3094)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3075)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:669)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:419) …

I am running the following code on Hive v0.12.0. I expect the three tables to be compressed with different methods, so the resulting files should differ in size and content.
--- Create table and compress it with ZLIB
create table zzz_test_szlib
  stored as orc
  tblproperties ("orc.compress"="ZLIB")
as
select * from uk_pers_dev.orc_dib_trans limit 100000000;

--- Create table and compress it with SNAPPY
create table zzz_test_ssnap
  stored as orc
  tblproperties ("orc.compress"="SNAPPY")
as
select * from uk_pers_dev.orc_dib_trans limit 100000000;

--- Create table and DO NOT compress it
create table zzz_test_snone
  stored as orc
  tblproperties ("orc.compress"="NONE")
as
select * from uk_pers_dev.orc_dib_trans limit 100000000;
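As a side check (a sketch, not from the original post; the file path is hypothetical), ORC's file dump utility reports the codec actually written to disk, independent of what the table properties say:

# Prints file metadata, including a "Compression: ZLIB|SNAPPY|NONE" line
hive --orcfiledump /user/hive/warehouse/zzz_test_szlib/000000_0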
When I check the table metadata with DESCRIBE, or through Hue, I get:

Name             Value    Value    Value
---------------- …

I am using YarnClient to launch a job programmatically. The cluster I am running against is Kerberized.
A normal MapReduce job submitted with "yarn jar Examples.jar wordcount..." works.
The job I try to submit programmatically does not. I get this error:
14/09/04 21:14:29 ERROR client.ClientService: Error happened during application submission: Application application_1409863263326_0002 failed due to the AM container for appattempt_1409863263326_0002_000002 exiting with exitCode: -1000, caused by: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "yarn-c1-n1.clouddev.snaplogic.com/10.184.28.108"; destination host is: "yarn-c1-cdh.clouddev.snaplogic.com":8020; . Failing this attempt.. Failing the application.
14/09/04 21:14:29 ERROR client.YClient: Application submission failed
The code looks like this:
ClientContext context = createContextFrom(args);
YarnConfiguration configuration = new YarnConfiguration();
YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(configuration);
ClientService client = new ClientService(context, yarnClient, new InstallManager(FileSystem.get(configuration)));
LOG.info(Messages.RUNNING_CLIENT_SERVICE);
boolean result = client.execute();
I had thought that maybe adding something along the lines of:
yarnClient.getRMDelegationToken(new Text(InetAddress.getLocalHost().getHostAddress()));
might ease my pain, but that does not seem to help either. Any help would be greatly appreciated.
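One thing worth verifying (a sketch, not from the original post; the principal and keytab names are hypothetical) is that the submitting process actually holds a Kerberos ticket before the YarnClient is started:

# Obtain a TGT from a keytab for the submitting user
kinit -kt /etc/security/keytabs/myuser.keytab myuser@EXAMPLE.COM
# Confirm a valid, unexpired ticket exists
klist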
We have Impala servers in prod, and I need to connect to them with impala-shell from my local MacBook running Mac OS X (10.8).
I downloaded Impala-cdh5.1.0-release.tar.gz, unpacked it, and tried buildall.sh, which failed with: .../bin/impala-config.sh: line 123: nproc: command not found
Trying impala-shell directly fails too:
$ shell/impala-shell
ls: /Users/.../Impala-cdh5.1.0-release/shell/ext-py/*.egg: No such file or directory
Traceback (most recent call last):
File "/Users/.../Impala-cdh5.1.0-release/shell/impala_shell.py", line 20, in <module>
import prettytable
ImportError: No module named prettytable
I have the JDK installed and JAVA_HOME set. Cloudera Manager does not seem to support Mac OS, does it?
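The traceback ends at a plain Python import, so one narrow workaround to try (a sketch addressing only the error shown; the release tarball normally bundles its dependencies as eggs under shell/ext-py, so further packages may turn out to be missing as well):

# Install the module whose import fails in the traceback
pip install prettytable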
I wrote an application in Scala that uses Spark.
The application consists of two modules: the App module, which contains classes with the different logic, and the Env module, which contains the environment and system initialization code as well as utility functions.
The entry point is located in Env; after initialization, it instantiates a class from App (according to args, using Class.forName) and executes the logic.
The modules are exported into two different JARs (namely env.jar and app.jar).
When I run the application locally, it executes just fine. The next step is to deploy it to my server. I use Cloudera's CDH 5.4.
I used Hue to create a new Oozie workflow with a Spark action, with the following parameters:
Spark Master: yarn
Mode: cluster
App name: myApp
Jars/py files: lib/env.jar,lib/app.jar
Main class: env.Main (in the Env module)
Arguments: app.AggBlock1Task

I then placed the two JARs in a lib folder inside the workflow folder (/user/hue/oozie/workspaces/hue-oozie-1439807802.48).
When I run the workflow, it throws a FileNotFoundException and the application does not execute:
java.io.FileNotFoundException: File file:/cloudera/yarn/nm/usercache/danny/appcache/application_1439823995861_0029/container_1439823995861_0029_01_000001/lib/app.jar,lib/env.jar does not exist
However, when I leave the Spark Master and Mode parameters empty, it all works fine, but then when I check spark.master programmatically it is set to local[*] and not yarn. Also, when looking at the logs, I came across this under the Oozie Spark action configuration:
--master
null
--name
myApp
--class
env.Main
--verbose
lib/env.jar,lib/app.jar
app.AggBlock1Task
I assume I am doing something wrong: not setting the Spark Master and Mode parameters yet running the application with spark.master set to local[*]. As far as I understand, SparkConf …
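For reference, the configuration above corresponds roughly to the following spark-submit invocation (a sketch assembled from the values in the post, not a confirmed fix; note that in the logged action --master ends up null, which is the visible symptom):

spark-submit --master yarn --deploy-mode cluster \
    --name myApp --class env.Main \
    --jars lib/app.jar \
    lib/env.jar app.AggBlock1Task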
Whenever I try to run a Spark application on a Cloudera CDH 5.4.4 cluster in YARN client mode, I get the following exception (repeated many times in the stack trace). The process carries on regardless (it is only a warning), but it makes finding anything in the logs nearly impossible. How can I fix it?
15/09/01 08:53:58 WARN net.ScriptBasedMapping: Exception running /etc/hadoop/conf.cloudera.yarn/topology.py 10.0.0.5
java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "/home/azureuser/scripts/streaming"): error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:485)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:38)
at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:271)
at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:263)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:263)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.makeOffers(CoarseGrainedSchedulerBackend.scala:167)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receiveWithLogging$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:131)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) …
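Since error=13 is EACCES, a quick thing to check (a sketch, not from the original post) is whether the user running the driver can execute the topology script and traverse its directory; note the error also names the working directory, which must be accessible:

# Verify execute permission on the script (and its parent directories)
ls -l /etc/hadoop/conf.cloudera.yarn/topology.py
# A commonly suggested remedy is making the script world-executable
sudo chmod 755 /etc/hadoop/conf.cloudera.yarn/topology.py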
Sometimes when a developer runs a Hive query, the following error appears (first log entry below). When I look at the Hive logs on the node, I see that the MetaStoreClient lost its connection immediately beforehand (second log entry below).
The problem seems to go away by itself. Any idea what could be causing this? Thanks!
hadoop-cmf-hive-HIVESERVER2-qn7bi02hdn001.compliant.disney.private.log.out.3:2016-04-27 07:17:20,092 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException org.apache.thrift.transport.TTransportException
[root@qn7bi02hdn001 rashm010]# grep lost *3
2016-04-27 07:16:54,449 WARN org.apache.hadoop.hive.metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect.
2016-04-27 07:17:20,114 WARN org.apache.hadoop.hive.metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect.
[root@qn7bi02hdn001 rashm010]#
Any help would be greatly appreciated.
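One setting worth ruling out (a sketch, not from the original post; the property name is real, the config path may differ per install) is a short metastore client socket timeout, which can surface as exactly this lost-connection/reconnect pattern during slow compilations:

# Check the effective value; it is often raised from the default for heavy workloads
grep -A1 "hive.metastore.client.socket.timeout" /etc/hive/conf/hive-site.xml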
I am trying to run a very simple task with MapReduce.
mapper.py:
#!/usr/bin/env python
import sys
for line in sys.stdin:
    print line
My txt file:
qwerty
asdfgh
zxc
Command line to run the job:
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar \
-input /user/cloudera/In/test.txt \
-output /user/cloudera/test \
-mapper /home/cloudera/Documents/map.py \
-file /home/cloudera/Documents/map.py
The error:
INFO mapreduce.Job: Task Id : attempt_1490617885665_0008_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
How can I fix this and get the code to run? When I use cat /home/cloudera/Documents/test.txt | python …
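Exit code 127 means the task node could not execute the mapper at all. A couple of things to try (a sketch, not from the original post): make the script executable, keep the Unix shebang, and refer to the mapper by basename, since -file ships it into each task's working directory:

# Make the mapper self-executable (and strip Windows line endings if present)
chmod +x /home/cloudera/Documents/map.py
# Resubmit, pointing -mapper at the shipped basename rather than a local path
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar \
    -input /user/cloudera/In/test.txt \
    -output /user/cloudera/test \
    -mapper map.py \
    -file /home/cloudera/Documents/map.py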
I am running YARN on an Oracle BDA X7-2, with these specs:
I am trying to run PGX on YARN following this manual: https://docs.oracle.com/cd/E56133_01/2.5.0/tutorials/yarn.html
I managed to run the install script, and completed the configuration file it provides as follows:
{
"pgx_yarn_jar_hdfs_path": "hdfs:/user/pgx/pgx-yarn-2.7.1.jar",
"pgx_war_hdfs_path": "hdfs:/user/pgx/pgx-webapp-2.7.1.war",
"pgx_conf_hdfs_path": "hdfs:/user/pgx/pgx.conf",
"pgx_log4j_conf_hdfs_path": "hdfs:/user/pgx/log4j2.xml",
"pgx_dist_log4j_conf_hdfs_path": "hdfs:/user/pgx/dist_log4j.xml",
"pgx_cluster_host_hdfs_path": "hdfs:/user/pgx/cluster-host.tgz",
"zookeeper_connect_string": "bda1node05,bda1node06,bda1node07",
"standard_library_path": "/usr/lib64/gcc/4.8.2",
"min_heap_size": "512m",
"max_heap_size": "12g",
"container_cores": 9,
"container_memory": 0,
"container_priority": 0,
"num_machines": 1
}
The pgx-service application is in the RUNNING state in YARN, there are no errors in stderr, and the log shows the service running at this address:
http://bda1node06:7007
and there is a Linux Java process running with the following command line:
/usr/java/default/bin/java -Xms512m -Xmx12g oracle.pgx.yarn.PgxService bda1node06 /u11/hadoop/yarn/nm/usercache/root/appcache/application_1539869144089_2070/container_e22_1539869144089_2070_01_000002/pgx-server.war 7007 bda1node05,bda1node06,bda1node07 /pgx-8eef44e2-1657-403a-8193-0102f5266680
For testing purposes, after executing the PGX client:
$PGX_HOME/bin/pgx --base_url http://bda1node06:7007
I get:
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: cannot connect to server; requested http://bda1node06:7007/version?extendedInfo=true and expected status 200, got …

oracle-spatial bigdata cloudera hadoop-yarn cloudera-manager
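For the connection failure above, probing the same endpoint directly (a sketch; the URL is taken verbatim from the error message) can help separate a server-side fault from a client or network issue:

# Expect HTTP 200 with version info if the service is reachable
curl -v "http://bda1node06:7007/version?extendedInfo=true"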
I happen to have worked with CDH for quite a while (about a year) and now plan to get back into it. These days we have CDH and HDP, and Hortonworks has been acquired by Cloudera.
cloudera hortonworks-data-platform cloudera-cdh cloudera-quickstart-vm hortonworks-sandbox
cloudera ×10
hadoop ×5
hadoop-yarn ×4
apache-spark ×2
hive ×2
bigdata ×1
cloudera-cdh ×1
compression ×1
hdfs ×1
hiveql ×1
hue ×1
impala ×1
macos ×1
mapreduce ×1
oozie ×1
python ×1
shell ×1
snappy ×1