Tag: oozie

How does Oozie handle dependencies?

I have a few questions about shared libraries in Oozie 2.3:

Currently, I define the shared libraries in coordinator.properties:

oozie.use.system.libpath=true 
oozie.libpath=<hdfs_path>

Here are my questions:

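  1. When the shared libraries are copied to other data nodes, how many data nodes receive them?

  2. Are the shared libraries copied to other data nodes once for every workflow run by the coordinator job, or only once per coordinator job?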

hadoop oozie oozie-coordinator

Score: 9 | Answers: 1 | Views: 5359

Oozie job error - java.io.IOException: configuration is not specified

I created an Oozie workflow for a Hive script that loads data into a table.

My workflow.xml contains:

<workflow-app xmlns="uri:oozie:workflow:0.4" name="Hive-Table-Insertion">
  <start to="InsertData"/>

  <action name="InsertData">
    <hive xmlns="uri:oozie:hive-action:0.4">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${workflowRoot}/output-data/hive"/>
        <mkdir path="${workflowRoot}/output-data"/>
      </prepare>
      <job-xml>${workflowRoot}/hive-site.xml</job-xml>
      <configuration>
        <property>
          <name>oozie.hive.defaults</name>
          <value>${workflowRoot}/hive-site.xml</value>
        </property>
      </configuration>
      <script>load_data.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>

My job.properties file contains:

nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
queueName=default
workflowRoot=HiveLoadData
oozie.libpath=${nameNode}/user/oozie/share/lib
oozie.wf.application.path=${nameNode}/user/${user.name}/${workflowRoot}

When I try to submit my job with the command "oozie job -oozie http://localhost:11000/oozie -config /user/oozie/HiveLoadData/job.properties -submit", I get the following error:

java.io.IOException: configuration is not specified
        at org.apache.oozie.cli.OozieCLI.getConfiguration(OozieCLI.java:729)
        at org.apache.oozie.cli.OozieCLI.jobCommand(OozieCLI.java:879)
        at org.apache.oozie.cli.OozieCLI.processCommand(OozieCLI.java:604)
        at org.apache.oozie.cli.OozieCLI.run(OozieCLI.java:577)
        at org.apache.oozie.cli.OozieCLI.main(OozieCLI.java:204)
configuration is not …

hadoop hdfs oozie

Score: 9 | Answers: 1 | Views: 10k

Oozie SSH action

Oozie SSH action problem:

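Problem: We are trying to run a few commands on a specific host in the cluster, and we chose the SSH action for this. We keep running into the SSH problem below. What could the real cause be? Please point me to a solution.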

Log:

AUTH_FAILED: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 USER@1.2.3.4 mkdir -p oozie-oozi/0000000-131008185935754-oozie-oozi-W/action1--ssh/] | ErrorStream: Warning: Permanently added '1.2.3.4' (RSA) to the list of known hosts. Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).

org.apache.oozie.action.ActionExecutorException: AUTH_FAILED: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 user@1.2.3.4 mkdir -p oozie-oozi/0000000-131008185935754-oozie-oozi-W/action1--ssh/] | ErrorStream: Warning: Permanently added '1.2.3.4,192.168.34.208' (RSA) to the list of known hosts. Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).

at org.apache.oozie.action.ssh.SshActionExecutor.execute(SshActionExecutor.java:589)
at org.apache.oozie.action.ssh.SshActionExecutor.start(SshActionExecutor.java:204)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:211)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:59)
at org.apache.oozie.command.XCommand.call(XCommand.java:277)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662) …
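The error stream shows the remote host refusing the login. Oozie runs ssh with PasswordAuthentication=no, so only key-based authentication can succeed. As far as I know, the SSH action connects as the OS user running the Oozie server. That account's public key therefore normally has to be in the target user's authorized_keys on 1.2.3.4.

For reference, a minimal SSH action is sketched below. The target directory is a placeholder. The ssh-action:0.1 schema (which uses `<args>`, while 0.2 uses `<arg>`) is an assumption about what the installed Oozie version supports.

```xml
<!-- Hypothetical SSH action: runs "mkdir -p /tmp/oozie-ssh-test" on the remote host.
     Assumes the OS user running the Oozie server can already log in to
     USER@1.2.3.4 with a key, without any password prompt. -->
<action name="action1">
  <ssh xmlns="uri:oozie:ssh-action:0.1">
    <host>USER@1.2.3.4</host>
    <command>mkdir</command>
    <args>-p</args>
    <args>/tmp/oozie-ssh-test</args>
  </ssh>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

Before resubmitting, you can quickly check whether key-based login works by running the same ssh command by hand as the Oozie server's OS user.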

hadoop oozie

Score: 8 | Answers: 1 | Views: 7923

Hadoop job fails; the ResourceManager does not recognize the AttemptID

I am trying to aggregate some data in an Oozie workflow, but the aggregation step fails.

I found two points of interest in the logs. The first is an error (?) that seems to occur repeatedly:

After a container finishes, it gets killed, but it exits with the non-zero exit code 143.

It finishes:

2015-05-04 15:35:12,013 INFO [IPC Server handler 7 on 49697] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1430730089455_0009_m_000048_0 is : 0.7231312
2015-05-04 15:35:12,015 INFO [IPC Server handler 19 on 49697] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1430730089455_0009_m_000048_0 is : 1.0

and is then killed by the ApplicationMaster:

2015-05-04 15:35:13,831 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1430730089455_0009_m_000048_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

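The second point of interest is the actual error that crashes the whole job. It happens in the reduce phase, and I'm not sure whether the two are related: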

2015-05-04 15:35:28,767 INFO [IPC Server handler 20 …

hadoop mapreduce oozie

Score: 8 | Answers: 1 | Views: 6573

What does "implement an advanced job control framework to help chain multiple Map-Reduce jobs" mean?

I am quite new to Hadoop, and I have been assigned the project:

"Implement an advanced job control framework to help chain multiple Map-Reduce jobs, i.e. investigate/improve the existing org.apache.hadoop.mapred.jobcontrol package."

The project is listed under the random ideas on the project suggestions page: http://wiki.apache.org/hadoop/ProjectSuggestions#research_projects

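My confusion is whether I have to build an advanced version of Oozie (which I understand to be a job control framework that chains multiple jobs) or something similar, or whether this means something entirely different.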

What am I missing?

hadoop mapreduce oozie

Score: 7 | Answers: 1 | Views: 537

IOException: Filesystem closed exception when running an Oozie workflow

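We are running a workflow in Oozie. It contains two actions. The first is a MapReduce job that generates files in HDFS. The second is a job that should copy the data from those files into a database.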

Both parts complete successfully, but Oozie throws an exception at the end and marks the process as failed.

Here is the exception:

2014-05-20 17:29:32,242 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:lpinsight (auth:SIMPLE) cause:java.io.IOException: Filesystem closed
2014-05-20 17:29:32,243 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565)
    at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:589)
    at java.io.FilterInputStream.close(FilterInputStream.java:155)
    at org.apache.hadoop.util.LineReader.close(LineReader.java:149)
    at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:243)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:222)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:421)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)

2014-05-20 17:29:32,256 INFO org.apache.hadoop.mapred.Task:Runnning cleanup for the task

Any ideas?
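"Filesystem closed" usually means that some code called close() on a FileSystem instance that Hadoop shares through its internal cache. FileSystem.get() returns a cached object, so any later user of that same instance fails; here that user is the record reader. This assumes the job's own code (for example the mapper, or the second action) closes such an instance.

One commonly suggested workaround is to disable the HDFS FileSystem cache for the affected action, sketched below as an action-level `<configuration>` block. The other option is to stop calling close() on instances obtained from FileSystem.get().

```xml
<!-- Goes inside the action (e.g. <map-reduce> or <java>) whose code closes the FileSystem.
     With fs.hdfs.impl.disable.cache=true, FileSystem.get() creates a new, uncached
     instance for hdfs:// URIs, so closing one instance does not affect other users. -->
<configuration>
  <property>
    <name>fs.hdfs.impl.disable.cache</name>
    <value>true</value>
  </property>
</configuration>
```

Disabling the cache means more FileSystem objects (and connections) get created, so fixing the premature close() in the code is usually the cleaner option.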

hadoop oozie

Score: 7 | Answers: 2 | Views: 10k

Error when running multiple workflows in OOZIE-4.1.0

I installed Oozie 4.1.0 on a Linux machine by following the steps at http://gauravkohli.com/2014/08/26/apache-oozie-installation-on-hadoop-2-4-1/

hadoop version - 2.6.0 
maven - 3.0.4 
pig - 0.12.0

Cluster setup:

MASTER NODE running: Namenode, Resourcemanager, proxyserver.

SLAVE NODE running: Datanode, Nodemanager.

When I run a single workflow job, it succeeds. But when I try to run multiple workflow jobs, both jobs remain in the ACCEPTED state.

Checking the error logs, I dug deeper into the problem:

2014-12-24 21:00:36,758 [JobControl] INFO  org.apache.hadoop.ipc.Client  - Retrying connect to server: 172.16.***.***/172.16.***.***:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-12-25 09:30:39,145 [communication thread] INFO  org.apache.hadoop.ipc.Client  - Retrying connect to server: 172.16.***.***/172.16.***.***:52406. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-12-25 09:30:39,199 [communication thread] INFO  org.apache.hadoop.mapred.Task  - Communication exception: …
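On a small cluster, each running workflow typically holds YARN resources for its launcher job (an ApplicationMaster plus the launcher map task). Meanwhile, the actual MapReduce or Pig job that the launcher starts waits for its own containers. With two workflows, the ApplicationMasters can use up everything the scheduler allows them, leaving the child jobs stuck in ACCEPTED.

Assuming the CapacityScheduler (the Hadoop 2.x default) is in use, one adjustment that is often suggested is to raise the share of resources ApplicationMasters may use, which defaults to 0.1. The value 0.5 below is illustrative, not a tuned recommendation for this cluster.

```xml
<!-- capacity-scheduler.xml on the ResourceManager host; apply with
     "yarn rmadmin -refreshQueues" or a ResourceManager restart.
     Lets ApplicationMasters use up to 50% of resources instead of the default 10%. -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>
```

This setting does not address the repeated connection retries to port 8032 (the default ResourceManager port). If those persist, they point to ResourceManager reachability from the slave node instead.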

java hadoop mapreduce oozie oozie-coordinator

Score: 7 | Answers: 1 | Views: 3566

Where does Oozie store the captured output values of a Java action (or any action)?

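I am using the capture-output option for my Java action, and I use the values in downstream actions. This works fine. When I run the Oozie job again, the framework also picks up the values without running the Java action a second time.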

I would like to know where these values are stored.

Thanks in advance.
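To my understanding, Oozie keeps captured output with the workflow action's record in the Oozie server database (the action's data field). That would explain why a rerun can reuse the values without executing the action again. Downstream nodes read the values through the wf:actionData EL function.

The producing Java program writes a Java properties file to the path given by the system property oozie.action.output.properties. The sketch below shows both sides. The node names, the classes com.example.ProduceValue and com.example.ConsumeValue, and the key outputKey are all hypothetical.

```xml
<!-- Hypothetical producer: ProduceValue is assumed to write a properties file
     (for example outputKey=someValue) to the path in the system property
     oozie.action.output.properties. -->
<action name="produce-value">
  <java>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <main-class>com.example.ProduceValue</main-class>
    <capture-output/>
  </java>
  <ok to="consume-value"/>
  <error to="fail"/>
</action>

<!-- Hypothetical consumer: receives the captured value as a program argument. -->
<action name="consume-value">
  <java>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <main-class>com.example.ConsumeValue</main-class>
    <arg>${wf:actionData('produce-value')['outputKey']}</arg>
  </java>
  <ok to="end"/>
  <error to="fail"/>
</action>
```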

hadoop oozie

Score: 7 | Answers: 1 | Views: 2543

Oozie shell action not running as the submitting user

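I wrote an Oozie workflow that runs a BASH shell script to execute some Hive queries and process the results. The script runs, but it throws a permission error when it accesses some HDFS data. The user who submits the Oozie workflow has the required permissions, but the script runs as the yarn user.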

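Is it possible to have Oozie execute the script as the user who submitted the workflow? The Hive and Java actions both execute as the submitting user; only the shell action behaves differently.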

Here is a rough outline of my Oozie action:

<action name="start_action"
        retry-max="12"
        retry-interval="600">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>${WorkflowRoot}/hive-site.xml</job-xml>
        <exec>script.sh</exec>
        <file>${WorkflowRoot}/script.sh</file>
        <capture-output />
    </shell>
    <ok to="next_action"/>
    <error to="send_email"/>
</action>

I am running Oozie 4.1.0 on HDP 2.1.
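Some clusters use simple (non-Kerberos) authentication together with the default container executor. On those clusters, YARN containers run as the yarn OS user, and HDFS calls made from the shell script run as that user too.

A commonly suggested workaround for that setup is to set HADOOP_USER_NAME, which Hadoop clients honor under simple authentication, through the shell action's `<env-var>` element. `wf:user()` returns the user who submitted the workflow. The sketch below is the action above with only that addition. It assumes simple authentication and does not apply to Kerberos-secured clusters.

```xml
<action name="start_action"
        retry-max="12"
        retry-interval="600">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>${WorkflowRoot}/hive-site.xml</job-xml>
        <exec>script.sh</exec>
        <!-- Makes Hadoop/Hive client calls in script.sh act as the submitting user.
             Effective only with simple (non-Kerberos) authentication. -->
        <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
        <file>${WorkflowRoot}/script.sh</file>
        <capture-output />
    </shell>
    <ok to="next_action"/>
    <error to="send_email"/>
</action>
```

In this schema, `<env-var>` has to come after `<exec>` (and any `<argument>` elements) and before `<file>`.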

shell hadoop oozie

Score: 7 | Answers: 1 | Views: 1306

Hive internal error: java.lang.ClassNotFoundException (org.apache.atlas.hive.hook.HiveHook)

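I am running a Hive query through Oozie using Hue.
I am creating a table through a Hue Oozie workflow.
My job fails, but when I check in Hive, the table has been created.
The log shows the following error: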

16157 [main] INFO  org.apache.hadoop.hive.ql.hooks.ATSHook  - Created ATS Hook
2015-09-24 11:05:35,801 INFO  [main] hooks.ATSHook (ATSHook.java:<init>(84)) - Created ATS Hook
16159 [main] ERROR org.apache.hadoop.hive.ql.Driver  - hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
2015-09-24 11:05:35,803 ERROR [main] ql.Driver (SessionState.java:printError(960)) - hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
16159 [main] ERROR org.apache.hadoop.hive.ql.Driver  - FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.atlas.hive.hook.HiveHook)
java.lang.ClassNotFoundException: org.apache.atlas.hive.hook.HiveHook
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)   

I can't identify the problem.
I am using HDP 2.3.1.
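The log shows Hive reading hive.exec.post.hooks, creating the ATS hook, and then failing to load org.apache.atlas.hive.hook.HiveHook. This suggests the hive-site.xml used by the action lists the Atlas hook, but the Atlas hook jars are not on the action's classpath. Post-execution hooks run after the query itself, which would also explain why the table exists even though the job is marked failed.

There are two possible fixes:
- Add the Atlas Hive hook jars to the workflow's lib directory or the share lib.
- If the Atlas hook is not needed here, override the property in the Hive action.

Values in an action's `<configuration>` override those loaded from `<job-xml>`. The override below assumes the cluster's value is the ATS hook plus the Atlas hook.

```xml
<!-- Inside the <hive> action. Keeps only the ATS post-execution hook for this
     action, so Hive no longer tries to load the Atlas hook class. -->
<configuration>
  <property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
  </property>
</configuration>
```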

hadoop hive hue oozie

Score: 7 | Answers: 1 | Views: 4154

Tag statistics

hadoop ×10

oozie ×10

mapreduce ×3

oozie-coordinator ×2

hdfs ×1

hive ×1

hue ×1

java ×1

shell ×1