Rav*_*edi 21 java hadoop mapreduce
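I have been trying to call a MapReduce job from a simple Java program in the same package. I referenced the MapReduce jar file in my Java program and invoked it with the runJar(String args[]) method, passing the input and output paths for the MapReduce job, but the program does not work.
How do I run such a program, where I just pass the input, output, and jar paths to its main method? Is it possible to run a MapReduce job (jar) that way? I want to do this because I would like to run several MapReduce jobs one after another, with my Java program calling each job by referring to its jar file. If this is possible, I might as well use a simple servlet to make the calls and read the output files for graphing purposes.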
/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/
/**
*
* @author root
*/
import org.apache.hadoop.util.RunJar;
import java.util.*;
public class callOther {
public static void main(String args[])throws Throwable
{
List<String> arg = new ArrayList<>();
String output = "/root/Desktp/output";
// jar path first; RunJar then expects the main class name unless the jar manifest declares one
arg.add("/root/NetBeansProjects/wordTool/dist/wordTool.jar");
arg.add("/root/Desktop/input");
arg.add(output);
RunJar.main(arg.toArray(new String[0]));
}
}
Tho*_*lut 31
Oh, please don't do it with runJar; the Java API is very good.
See how you can start a job from normal code:
// create a configuration
Configuration conf = new Configuration();
// create a new job based on the configuration
Job job = new Job(conf);
// here you have to put your mapper class
job.setMapperClass(Mapper.class);
// here you have to put your reducer class
job.setReducerClass(Reducer.class);
// here you have to set the jar which is containing your
// map/reduce class, so you can use the mapper class
job.setJarByClass(Mapper.class);
// key/value of your reducer output
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// this is setting the format of your input, can be TextInputFormat
job.setInputFormatClass(SequenceFileInputFormat.class);
// same with output
job.setOutputFormatClass(TextOutputFormat.class);
// here you can set the path of your input
SequenceFileInputFormat.addInputPath(job, new Path("files/toMap/"));
// this deletes possible output paths to prevent job failures
FileSystem fs = FileSystem.get(conf);
Path out = new Path("files/out/processed/");
fs.delete(out, true);
// finally set the empty out path
TextOutputFormat.setOutputPath(job, out);
// this waits until the job completes and prints debug out to STDOUT or whatever
// has been configured in your log4j properties.
job.waitForCompletion(true);
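Since the question is about running several jobs one after another, here is a minimal sketch of how the blocking call above could be chained. It uses only the JDK: `JobChain` and its `Step` interface are hypothetical names introduced for illustration, and each `Step` is assumed to wrap one configured job and return the boolean that `job.waitForCompletion(true)` returns.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Runs named steps in insertion order and stops at the first one that fails. */
final class JobChain {

    /** Hypothetical stand-in for one job; run() mirrors the result of job.waitForCompletion(true). */
    interface Step {
        boolean run() throws Exception;
    }

    // LinkedHashMap keeps the order in which steps were added
    private final Map<String, Step> steps = new LinkedHashMap<>();

    JobChain add(String name, Step step) {
        steps.put(name, step);
        return this;
    }

    /** Returns the name of the first step that failed or threw, or null if all succeeded. */
    String runAll() {
        for (Map.Entry<String, Step> entry : steps.entrySet()) {
            boolean ok;
            try {
                ok = entry.getValue().run();
            } catch (Exception e) {
                System.err.println(entry.getKey() + " threw " + e);
                ok = false;
            }
            if (!ok) {
                return entry.getKey(); // later steps are skipped
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String failed = new JobChain()
                .add("first-job", () -> true)   // placeholder for a real job
                .add("second-job", () -> true)  // placeholder for a real job
                .runAll();
        System.out.println(failed == null ? "all steps succeeded" : "failed at: " + failed);
    }
}
```

With real jobs, each lambda would build its own `Job` and end with `return job.waitForCompletion(true);`, so a later job only starts once the earlier one has written its output.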
If you are using an external cluster, you have to put the following information into your configuration:
// this should be like defined in your mapred-site.xml
conf.set("mapred.job.tracker", "jobtracker.com:50001");
// like defined in hdfs-site.xml
conf.set("fs.default.name", "hdfs://namenode.com:9000");
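This should work without problems as long as hadoop-core.jar is on your application container's classpath. But I think you should put some kind of progress indicator on your web page, because a Hadoop job can take anywhere from minutes to hours to complete ;)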
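One way to get such an indicator is to start the blocking call on a worker thread and let the page poll a status. The following is a rough sketch using only the JDK, not Hadoop-specific code: `BackgroundJob` is a hypothetical helper, and the `Callable` passed to `start` is assumed to wrap something like `job.waitForCompletion(true)`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Runs one long blocking task in the background and exposes a coarse status. */
final class BackgroundJob {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private volatile Future<Boolean> result;

    /** Starts the blocking work on a worker thread and returns immediately. */
    void start(Callable<Boolean> blockingJob) {
        result = pool.submit(blockingJob);
        pool.shutdown(); // accept no further tasks; the submitted one still runs
    }

    /** One of NOT_STARTED, RUNNING, SUCCEEDED, FAILED; cheap to call from a status request. */
    String status() {
        Future<Boolean> f = result;
        if (f == null) return "NOT_STARTED";
        if (!f.isDone()) return "RUNNING";
        try {
            return f.get() ? "SUCCEEDED" : "FAILED";
        } catch (Exception e) {
            return "FAILED"; // the task threw instead of returning
        }
    }

    /** Waits up to the given time for the task to finish; returns true if it did. */
    boolean awaitDone(long millis) {
        try {
            return pool.awaitTermination(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        BackgroundJob job = new BackgroundJob();
        job.start(() -> {
            Thread.sleep(300); // placeholder for the real blocking job call
            return true;
        });
        System.out.println("right after start: " + job.status());
        job.awaitDone(5000);
        System.out.println("after waiting: " + job.status());
    }
}
```

A servlet could keep one such object per submitted job and return `status()` to the page on each poll. Hadoop's `Job` also has `submit()`, `isComplete()`, `mapProgress()` and `reduceProgress()`, which could replace the coarse status with an actual percentage.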
For YARN (Hadoop 2 and later)
For YARN, the following configuration needs to be set.
// this should be like defined in your yarn-site.xml
conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001");
// framework is now "yarn", should be defined like this in mapred-site.xml
conf.set("mapreduce.framework.name", "yarn");
// like defined in hdfs-site.xml
conf.set("fs.default.name", "hdfs://namenode.com:9000");
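To keep the two variants side by side, the property names from the snippets above can be collected in one place. This is only a sketch: `ClusterSettings` is a hypothetical helper, and the host names in `main` are the placeholder values already used in this answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Collects the connection properties for a classic (JobTracker) or a YARN cluster. */
final class ClusterSettings {

    /** Properties for a Hadoop 1 style cluster with a JobTracker. */
    static Map<String, String> classic(String jobTracker, String nameNodeUri) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("mapred.job.tracker", jobTracker);
        settings.put("fs.default.name", nameNodeUri);
        return settings;
    }

    /** Properties for a YARN cluster. */
    static Map<String, String> yarn(String resourceManager, String nameNodeUri) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("yarn.resourcemanager.address", resourceManager);
        settings.put("mapreduce.framework.name", "yarn");
        settings.put("fs.default.name", nameNodeUri);
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder hosts taken from the snippets above
        Map<String, String> settings = yarn("yarn-manager.com:50001", "hdfs://namenode.com:9000");
        for (Map.Entry<String, String> e : settings.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Each entry would then be applied with `conf.set(key, value)` before the job is created.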
小智 7
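Calling a MapReduce job from a Java web application (servlet)
You can call a MapReduce job from a web application using the Java API. Here is a small example of calling a MapReduce job from a servlet. The steps are as follows:
Step 1: First create a MapReduce driver servlet class, and also write the mapper and reducer classes. Here is a sample code snippet: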
CallJobFromServlet.java
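import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class CallJobFromServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Job Name");
        // Replace CallJobFromServlet.class with your servlet class
        job.setJarByClass(CallJobFromServlet.class);
        job.setMapperClass(Map.class);      // Replace Map.class with your Mapper class
        job.setReducerClass(Reduce.class);  // Replace Reduce.class with your Reducer class
        job.setNumReduceTasks(30);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // Job input path
        FileInputFormat.addInputPath(job, new Path("hdfs://localhost:54310/user/hduser/input/"));
        // Job output path
        FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:54310/user/hduser/output"));
        try {
            // Blocks until the job finishes
            job.waitForCompletion(true);
        } catch (InterruptedException | ClassNotFoundException e) {
            // waitForCompletion declares these checked exceptions; doPost cannot throw them directly
            throw new ServletException("MapReduce job did not complete", e);
        }
    }
}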
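Step 2: Place all related jar files (Hadoop and application-specific jars) in the lib folder of the web server (e.g. Tomcat). This is required for accessing the Hadoop configuration (the Hadoop 'conf' folder holds the configuration XML files, i.e. core-site.xml, hdfs-site.xml, etc.). Just copy the jars from the Hadoop lib folder to the web server (Tomcat) lib directory. The list of jar names is as follows: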
1. commons-beanutils-1.7.0.jar
2. commons-beanutils-core-1.8.0.jar
3. commons-cli-1.2.jar
4. commons-collections-3.2.1.jar
5. commons-configuration-1.6.jar
6. commons-httpclient-3.0.1.jar
7. commons-io-2.1.jar
8. commons-lang-2.4.jar
9. commons-logging-1.1.1.jar
10. hadoop-client-1.0.4.jar
11. hadoop-core-1.0.4.jar
12. jackson-core-asl-1.8.8.jar
13. jackson-mapper-asl-1.8.8.jar
14. jersey-core-1.8.jar
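If a jar is missing, the servlet typically fails only when the form is submitted. A quick way to catch that earlier is to probe for a few Hadoop classes at startup. This is a small sketch using only the JDK: `ClasspathCheck` is a hypothetical helper, and the two class names probed in `main` are the Hadoop classes used by the servlet above.

```java
/** Checks whether classes can be found by the class loader that loaded this helper. */
final class ClasspathCheck {

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.mapreduce.Job"
        };
        for (String name : required) {
            System.out.println(name + (isLoadable(name) ? " found" : " MISSING"));
        }
    }
}
```

In a web application this check would have to live inside the deployed app (for example in the servlet's `init()` method) so that it sees the same class loader as the servlet.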
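Step 3: Deploy your web application to the web server (in Tomcat's 'webapps' folder).
Step 4: Create a JSP file and link the servlet class (CallJobFromServlet.java) in the form's action attribute. Here is a sample code snippet:
index.jsp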
<form id="trigger_hadoop" name="trigger_hadoop" action="./CallJobFromServlet" method="post">
<span class="back">Trigger Hadoop Job from Web Page </span>
<input type="submit" name="submit" value="Trigger Job" />
</form>