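I am using Maven for my project. When I run my program I get the following log4j warnings, so I cannot see the program's execution progress, although the program does produce the expected output.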
srimanth@srimanth-Inspiron-N5110:~/CCHD&CCHA/mangoes$ mvn exec:java -q -Dexec.mainClass=bananas.MapReduceColorCount -Dexec.args="hdfs://localhost:9000/users.avrofile hdfs://localhost:9000/pleaseatleastnow6"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
srimanth@srimanth-Inspiron-N5110:~/CCHD&CCHA/mangoes$
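The warnings above are what log4j 1.2 prints when it finds no configuration at all, so no appender is attached and log output is dropped. As a rough sketch (not a confirmed fix for this project), the snippet below assembles a minimal console configuration as plain `java.util.Properties`. The class name `Log4jConsoleConfig`, the method `consoleConfig`, and the appender name `console` are made up for illustration; the property keys are the standard log4j 1.2 ones.

```java
import java.util.Properties;
import java.util.TreeSet;

// Sketch only: builds a minimal log4j 1.2 console configuration in memory.
// Class, method and appender names are hypothetical.
class Log4jConsoleConfig {

    static Properties consoleConfig(String level) {
        Properties props = new Properties();
        // Root logger at the given level, attached to an appender named "console".
        props.setProperty("log4j.rootLogger", level + ", console");
        props.setProperty("log4j.appender.console", "org.apache.log4j.ConsoleAppender");
        props.setProperty("log4j.appender.console.layout", "org.apache.log4j.PatternLayout");
        props.setProperty("log4j.appender.console.layout.ConversionPattern",
                "%d{ISO8601} %-5p %c{1}: %m%n");
        return props;
    }

    public static void main(String[] args) {
        Properties props = consoleConfig("INFO");
        // Assumption: with log4j 1.2 on the classpath, these properties could be
        // applied via org.apache.log4j.PropertyConfigurator.configure(props).
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```

The printed key/value pairs could equally be saved as a `log4j.properties` file on the classpath (in a Maven project, `src/main/resources` is the usual place), assuming the job really is logging through log4j 1.2.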
Here is my pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>fruits</groupId>
<artifactId>mangoes</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>Hadoop</name>
<description>Hadoop
Hadoop</description>
<dependencies>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>1.7.6</version>
</dependency>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-mapred</artifactId>
<version>1.7.6</version>
<classifier>hadoop2</classifier>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-api</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-app</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-common</artifactId>
<version>2.6.0</version>
…

I installed Hadoop 2.6.0 on my laptop running Ubuntu 14.04 LTS. I started the Hadoop daemons by running start-all.sh, but when I type jps, only four entries are listed:
10545 SecondaryNameNode
10703 ResourceManager
11568 Jps
10831 NodeManager
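Previously only the DataNode was not running, so I deleted the tmp folder and created it again. Now neither the NameNode nor the DataNode is running. I also checked whether ports 50070 and 50075 are in use by any other process, but no process is using them.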
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN 1000 52304 6129/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1000 70108 10545/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1000 50441 6129/java
tcp6 0 0 :::8033 :::* LISTEN 1000 70199 10703/java
tcp6 0 0 :::8040 :::* LISTEN 1000 74863 10831/java
tcp6 0 0 :::8042 :::* LISTEN 1000 71055 10831/java
tcp6 0 0 :::46573 :::* LISTEN 1000 74854 10831/java
tcp6 0 0 :::8088 :::* …

I know that the SortComparator is used to sort the map output by key. I wrote a custom SortComparator to understand the MapReduce framework better. Here is my WordCount class with the custom SortComparator class:
package bananas;
import java.io.FileWriter;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) …

What I know is that YARN replaces the JobTracker and TaskTracker.
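I have seen some Hadoop 2.6.0/2.7.0 installation tutorials that configure mapreduce.framework.name as yarn and the mapred.job.tracker property as local or host:port.
The description of the mapred.job.tracker property is:
"The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task."
My doubt is why this should be configured at all if we are using YARN. I mean, the JobTracker should not be running in that case, right?
Please forgive me if my question is silly.
Edit: These are the tutorials I am talking about.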
http://chaalpritam.blogspot.in/2015/01/hadoop-260-multi-node-cluster-setup-on.html
http://pingax.com/install-apache-hadoop-ubuntu-cluster-setup/
https://chawlasumit.wordpress.com/2015/03/09/install-a-multi-node-hadoop-cluster-on-ubuntu-14-04/
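
So I wanted to try out custom mapper and reducer queries using Hive's MAP and REDUCE.
I wrote the custom mapper and reducer, exported them to a jar file, and tried to add it from the Hive CLI. No matter where I copy the jar, I get a "does not exist" error. I tried the following.
I copied the file to /usr/local/hive/lib/, /usr/local/hive/conf/ and /tmp/, and then in HDFS I also copied it to /, /user/hive/ and /user/hive/warehouse/.
I tried giving the full path, and then I get a URI syntax exception: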
hive> add jar 'hdfs://srimanthpc:9000/SpaceTravel.jar';
Illegal character in scheme name at index 0: 'hdfs://srimanthpc:9000/SpaceTravel.jar'
Query returned non-zero code: 1, cause: java.net.URISyntaxException: Illegal character in scheme name at index 0: 'hdfs://srimanthpc:9000/SpaceTravel.jar'
hive> add jar 'file:///home/anil/Desktop/SpaceTravel.jar';
Illegal character in scheme name at index 0: 'file:///home/anil/Desktop/SpaceTravel.jar'
Query returned non-zero code: 1, cause: java.net.URISyntaxException: Illegal character in scheme name at index 0: 'file:///home/anil/Desktop/SpaceTravel.jar'
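The exception text above has the shape of a `java.net.URISyntaxException` raised when the very first character of the input is not a letter, which suggests the single quotes are being parsed as part of the path. The sketch below reproduces that with plain `java.net.URI`, outside Hive. The class name `AddJarPathCheck` and the helper `uriError` are hypothetical, and whether dropping the quotes is enough for Hive's `add jar` to find the file is an assumption not confirmed here.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch only: shows how java.net.URI treats a path with and without
// surrounding single quotes. Names are hypothetical.
class AddJarPathCheck {

    // Returns null if the text parses as a URI, otherwise the parser's message.
    static String uriError(String text) {
        try {
            new URI(text);
            return null;
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String quoted = "'hdfs://srimanthpc:9000/SpaceTravel.jar'";
        String bare = "hdfs://srimanthpc:9000/SpaceTravel.jar";
        System.out.println("quoted -> " + uriError(quoted));
        System.out.println("bare   -> " + uriError(bare));
    }
}
```

The quoted string is rejected because the scheme name must start with a letter and the quote sits at index 0; the unquoted string parses without error.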
If I give the path without any scheme, it says the file does not exist. I also tried add file instead of add jar. …