I have found several options for running MapReduce programs. Can anyone explain the differences between the following commands, and what effect, if any, each has on a MapReduce job?
java -jar MyMapReduce.jar [args]
hadoop jar MyMapReduce.jar [args]
yarn jar MyMapReduce.jar [args]
Which of these commands is best, or is there another option?
Is there a configuration that lets the command below report all job information to the YARN web UI (default port 8088) and to the Job History server, the way the hadoop and yarn commands do?
java -jar MyMapReduce.jar [args]
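One difference that matters here: hadoop jar and yarn jar put the cluster's configuration directory ($HADOOP_CONF_DIR) and the Hadoop libraries on the classpath before invoking your main class, while plain java -jar uses only what is inside the jar, so the job typically falls back to the local runner and never reaches the ResourceManager or the history server. A minimal sketch of one workaround, assuming the config files live under /etc/hadoop/conf (adjust to wherever $HADOOP_CONF_DIR points on your cluster), is to load them explicitly in the driver:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Sketch: load the cluster's own XML config so that a job submitted via
// plain "java -jar" still goes to YARN and shows up in the web UI on
// port 8088. The /etc/hadoop/conf paths are an assumption.
Configuration conf = new Configuration();
conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
conf.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));

Even with the configuration loaded, the MapReduce client jars must still be on the classpath for a YARN submission to succeed (see the last stack trace below).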
I am using Spark 1.2.1, HBase 0.98.10, and Hadoop 2.6.0. I get a null pointer exception when retrieving data from HBase. The stack trace is below.
[sparkDriver-akka.actor.default-dispatcher-2] DEBUG NewHadoopRDD - Failed to use InputSplit#getLocationInfo.
java.lang.NullPointerException: null
at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114) ~[scala-library-2.10.4.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:114) ~[scala-library-2.10.4.jar:na]
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:32) ~[scala-library-2.10.4.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.4.jar:na]
at org.apache.spark.rdd.HadoopRDD$.convertSplitLocationInfo(HadoopRDD.scala:401) ~[spark-core_2.10-1.2.1.jar:1.2.1]
at org.apache.spark.rdd.NewHadoopRDD.getPreferredLocations(NewHadoopRDD.scala:215) ~[spark-core_2.10-1.2.1.jar:1.2.1]
at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:234) [spark-core_2.10-1.2.1.jar:1.2.1]
at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:234) [spark-core_2.10-1.2.1.jar:1.2.1]
at scala.Option.getOrElse(Option.scala:120) [scala-library-2.10.4.jar:na]
at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:233) [spark-core_2.10-1.2.1.jar:1.2.1]
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1326) [spark-core_2.10-1.2.1.jar:1.2.1]
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI …
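Note that the message above is logged at DEBUG level: Spark catches the exception while computing preferred (data-local) locations for the input splits and simply proceeds without locality information, so by itself it should not fail the job. For reference, a minimal Java sketch of the usual HBase-from-Spark read path that exercises this code; the app name and table name are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseReadSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("hbase-read"));
        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table"); // hypothetical table name
        // Reading via the new Hadoop API is what triggers the
        // getPreferredLocations code path seen in the trace above.
        JavaPairRDD<ImmutableBytesWritable, Result> rows = sc.newAPIHadoopRDD(
                hbaseConf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class);
        System.out.println("rows read: " + rows.count());
        sc.stop();
    }
}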
I am writing a MapReduce job in Java and setting the configuration like this:
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://127.0.0.1:9000");
configuration.set("mapreduce.job.tracker", "localhost:54311");
configuration.set("mapreduce.framework.name", "yarn");
configuration.set("yarn.resourcemanager.address", "localhost:8032");
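For context, here is a minimal sketch of the driver these settings would sit in; the class name and the Text/IntWritable output types are placeholders for the real job's mapper and reducer. (As an aside, mapreduce.job.tracker is a JobTracker-era property and should have no effect when the framework is yarn.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapReduceDriver {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        configuration.set("fs.defaultFS", "hdfs://127.0.0.1:9000");
        configuration.set("mapreduce.framework.name", "yarn");
        configuration.set("yarn.resourcemanager.address", "localhost:8032");

        Job job = Job.getInstance(configuration, "my-mapreduce-job");
        job.setJarByClass(MyMapReduceDriver.class); // lets YARN ship the jar to the cluster
        // job.setMapperClass(...); job.setReducerClass(...); // as in the real job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}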
I ran it in several ways:
Case 1: using the hadoop and yarn commands: the job completes successfully.
Case 2: running from Eclipse: the job completes successfully.
Case 3: using java -jar after removing all of the configuration.set() calls:
Configuration configuration = new Configuration();
The run succeeds, but the job status is not shown in the YARN web UI (default port 8088).
Case 4: using java -jar with the configuration shown above: error. Stack trace:

Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at …
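The contrast between cases 3 and 4 is telling. With an empty Configuration (case 3), mapreduce.framework.name defaults to local, so the job runs in-process via the local job runner; that is why it succeeds but never appears in the YARN web UI on port 8088. With the framework set to yarn (case 4), Cluster.initialize() must find a matching ClientProtocolProvider on the classpath, and the YARN provider ships in the hadoop-mapreduce-client-jobclient jar; a bare java -jar classpath that lacks it fails with exactly this "Cannot initialize Cluster" message. A common workaround, assuming a hypothetical main class com.example.MyDriver, is to run with the full Hadoop classpath, e.g. java -cp "MyMapReduce.jar:$(hadoop classpath)" com.example.MyDriver [args] (note that -jar ignores -cp, so the main class must be named), or simply to submit with hadoop jar / yarn jar as in case 1.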
Tags: hadoop ×3 · mapreduce ×3 · hadoop-yarn ×2 · apache-spark ×1 · eclipse ×1 · hadoop2 ×1 · hbase ×1 · hdfs ×1 · java ×1 · scala ×1