Rav*_*rma 4 java hadoop mapreduce
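I'm new to Hadoop. Until now I have been running my Hadoop application in standalone mode, and it worked well. I have now decided to move it to pseudo-distributed mode, and I made the configuration changes as mentioned. Here are snippets of my XML files:
My core-site.xml looks like this: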
<property>
<name>fs.default.name</name>
<value>hdfs://localhost/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-onur</value>
<description>A base for other temporary directories.</description>
</property>
My hdfs-site.xml is:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
My mapred.xml is:
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
I ran the start-dfs.sh and start-mapred.sh scripts, and everything started fine:
root@vissu-desktop:/home/vissu/Raveesh/Hadoop# start-dfs.sh
starting namenode, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-vissu-desktop.out
localhost: starting datanode, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-vissu-desktop.out
localhost: starting secondarynamenode, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-vissu-desktop.out
root@vissu-desktop:/home/vissu/Raveesh/Hadoop# start-mapred.sh
starting jobtracker, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-vissu-desktop.out
localhost: starting tasktracker, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-vissu-desktop.out
root@vissu-desktop:/home/vissu/Raveesh/Hadoop#
Now I tried to run my application, but I get the following error:
root@vissu-desktop:/home/vissu/Raveesh/Hadoop/hadoop-0.20.2# hadoop jar ResultAgg_plainjar.jar ProcessInputFile /home/vissu/Raveesh/VotingConfiguration/sample.txt
ARG 0 obtained = ProcessInputFile
12/07/17 17:43:33 INFO preprocessing.ProcessInputFile: Modified File Name is /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf
Going to process map reduce jobs
12/07/17 17:43:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/07/17 17:43:34 ERROR preprocessing.ProcessInputFile: Input path does not exist: hdfs://localhost/home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf
root@vissu-desktop:/home/vissu/Raveesh/Hadoop/hadoop-0.20.2#
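The application first takes a file from a given path, modifies it, and creates sample.txt_modf; this file must then be consumed by the MapReduce framework. In standalone mode I supplied the absolute path, so it worked fine. But I cannot figure out which path I should pass to the Path API for Hadoop. If I give the file, hdfs://localhost/ gets prepended to it, so I'm not sure how to specify the path in pseudo-distributed mode. Should I simply make sure the modified file is created at that location?
My question is about how to specify the path.
The snippet containing the paths is: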
KeyValueTextInputFormat.addInputPath(conf,
new Path(System.getProperty("user.dir")+File.separator+inputFileofhits.getName()));
FileOutputFormat.setOutputPath(
conf,
new Path(ProcessInputFile.resultAggProps
.getProperty("OUTPUT_DIRECTORY")));
Thanks.
Does this file exist in HDFS? It looks like you have supplied a local path to the file (user directories in HDFS are usually rooted at /user, not /home).
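To see why the error message shows hdfs://localhost/home/..., here is a minimal sketch using only java.net.URI. It is a rough model of how a scheme-less path gets qualified against the default filesystem from fs.default.name, not Hadoop's actual code; the class name PathResolutionSketch and the helper qualify are hypothetical names made up for this illustration.

```java
import java.net.URI;

public class PathResolutionSketch {
    // Hypothetical helper: approximates how a path without a scheme
    // is qualified against the default filesystem URI.
    public static URI qualify(URI defaultFs, String path) {
        URI candidate = URI.create(path);
        // A path that already carries a scheme (file:, hdfs:) is left alone.
        if (candidate.getScheme() != null) {
            return candidate;
        }
        // Otherwise it inherits the scheme and authority of the default filesystem.
        return defaultFs.resolve(candidate);
    }

    public static void main(String[] args) {
        // Value taken from the core-site.xml in the question.
        URI defaultFs = URI.create("hdfs://localhost/");
        String local = "/home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf";

        // Scheme-less absolute path: ends up pointing into HDFS.
        System.out.println(qualify(defaultFs, local));
        // Explicit file: scheme: still refers to the local filesystem.
        System.out.println(qualify(defaultFs, "file://" + local));
    }
}
```

The first line printed is the same hdfs://localhost/home/... location reported in the error, which suggests the job is looking in HDFS for a file that was only ever written to the local disk.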
You can check whether the file exists in HDFS by typing:
#> hadoop fs -ls hdfs://localhost/home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf
If nothing is returned, i.e. the file is not in HDFS, you can copy it into HDFS with the hadoop fs command:
#> hadoop fs -put /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf hdfs://localhost/user/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf
Note that the path in HDFS is rooted at /user, not /home.