Tracking Hadoop job status through a web interface? (Exposing Hadoop to internal clients in a company)

Era*_*mpf 3 hadoop

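I want to develop a website that lets analysts within the company run Hadoop jobs (chosen from a set of predefined jobs) and see the status and progress of their jobs.

Is there an easy way to do this (get the status of running jobs, etc.) from Ruby or Python? How do you expose your Hadoop cluster to internal clients at your company?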

din*_*eco 5

I found one way to get information about jobs from the JobTracker. Here is the code:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobStatus;
    import org.apache.hadoop.mapred.TaskReport;

    public class JobTrackerReport {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // Replace "URL" with the JobTracker address (host:port).
            conf.set("mapred.job.tracker", "URL");

            JobClient client = new JobClient(new JobConf(conf));

            for (JobStatus jobStatus : client.getAllJobs()) {
                // Take the job's end time to be the latest finish time
                // of any of its map or reduce tasks.
                long lastTaskEndTime = 0L;
                for (TaskReport r : client.getMapTaskReports(jobStatus.getJobID())) {
                    lastTaskEndTime = Math.max(lastTaskEndTime, r.getFinishTime());
                }
                for (TaskReport r : client.getReduceTaskReports(jobStatus.getJobID())) {
                    lastTaskEndTime = Math.max(lastTaskEndTime, r.getFinishTime());
                }
                // Setup and cleanup task reports are available the same way via
                // client.getSetupTaskReports(...) and client.getCleanupTaskReports(...).

                // Note: while no task has finished, endTime stays 0 and the
                // duration printed below is negative.
                System.out.println("JobID: " + jobStatus.getJobID() +
                                   ", username: " + jobStatus.getUsername() +
                                   ", startTime: " + jobStatus.getStartTime() +
                                   ", endTime: " + lastTaskEndTime +
                                   ", Duration: " + (lastTaskEndTime - jobStatus.getStartTime()));
            }
        }
    }
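
The timing logic in the code above (the job's end time is the latest task finish time, and the duration is that minus the start time) can be tried out without a cluster. The following is only a minimal sketch: the class `JobTiming`, its methods, and the timestamps are hypothetical and are not part of Hadoop. It assumes that a task which has not finished reports a finish time of 0, and it returns -1 instead of a negative duration in that case.

```java
import java.util.Arrays;
import java.util.stream.LongStream;

// Hypothetical helper, not part of Hadoop: it mirrors the end-time logic
// of the JobClient code on plain arrays of task finish times (milliseconds).
public class JobTiming {

    /** Latest finish time across map and reduce tasks; 0 if there are no tasks. */
    public static long lastFinishTime(long[] mapFinishTimes, long[] reduceFinishTimes) {
        return LongStream.concat(Arrays.stream(mapFinishTimes),
                                 Arrays.stream(reduceFinishTimes))
                         .max()
                         .orElse(0L);
    }

    /** Duration in milliseconds, or -1 if the finish time precedes the start time. */
    public static long durationMillis(long startTime, long lastFinishTime) {
        // Assumption: an unfinished job yields a finish time of 0,
        // which would otherwise produce a negative duration.
        return lastFinishTime >= startTime ? lastFinishTime - startTime : -1L;
    }

    public static void main(String[] args) {
        long start = 1_000L;                  // made-up job start time
        long[] maps = {4_000L, 6_500L};       // made-up map task finish times
        long[] reduces = {9_000L, 0L};        // 0 = this reduce task has not finished
        long end = lastFinishTime(maps, reduces);
        // prints "endTime: 9000, duration: 8000"
        System.out.println("endTime: " + end + ", duration: " + durationMillis(start, end));
    }
}
```

A web front end could apply the same check before showing a duration, so that jobs still in progress are displayed as "running" rather than with a negative number.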