我想开发一个网站,允许公司内的分析师运行Hadoop作业(从一组定义的工作中选择)并查看他们的工作状态\进度.
有没有一种简单的方法可以通过Ruby\Python做到这一点(获得正在运行的作业状态等)?您如何将Hadoop集群公开给公司的内部客户?
我找到了一种在JobTracker上获取有关工作的信息的方法.这是代码:
Configuration conf = new Configuration();
conf.set("mapred.job.tracker", "URL");
JobClient client = new JobClient(new JobConf(conf));
JobStatus[] jobStatuses = client.getAllJobs();
for (JobStatus jobStatus : jobStatuses) {
long lastTaskEndTime = 0L;
TaskReport[] mapReports = client.getMapTaskReports(jobStatus.getJobID());
for (TaskReport r : mapReports) {
if (lastTaskEndTime < r.getFinishTime()) {
lastTaskEndTime = r.getFinishTime();
}
}
TaskReport[] reduceReports = client.getReduceTaskReports(jobStatus.getJobID());
for (TaskReport r : reduceReports) {
if (lastTaskEndTime < r.getFinishTime()) {
lastTaskEndTime = r.getFinishTime();
}
}
client.getSetupTaskReports(jobStatus.getJobID());
client.getCleanupTaskReports(jobStatus.getJobID());
System.out.println("JobID: " + jobStatus.getJobID().toString() +
", username: " + jobStatus.getUsername() +
", startTime: " + jobStatus.getStartTime() +
", endTime: " + lastTaskEndTime +
", Durration: " + (lastTaskEndTime - jobStatus.getStartTime()));
}
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
4719 次 |
| 最近记录: |