Maven dependencies for Hadoop: MiniDFSCluster and MiniMRCluster

Amr*_*Amr 7 unit-testing hadoop hadoop2

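I want to set up a Maven project that lets me unit test Hadoop MapReduce jobs. My biggest problem is defining the Maven dependencies needed to use the test classes MiniDFSCluster and MiniMRCluster.

I am using Hadoop 2.4.1. Any ideas?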

ucs*_*nil 6

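In case anyone else is still searching for an answer:

MiniMRCluster is now deprecated.

You can get both MiniDFSCluster and MiniMRCluster from the following dependency (shown in Gradle syntax):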

compile group: 'org.apache.hadoop', name: 'hadoop-minicluster', version: '2.7.2'

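This dependency is essentially just a POM file that lists the dependencies bundled in this package. For anyone who wants to look, MiniDFSCluster lives in the hadoop-hdfs:tests artifact.

You do not need to use the dependencies from the Cloudera repository.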
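
For Maven users, the Gradle line above would translate to roughly the following. This is only a sketch: the `test` scope is my assumption (it suits a dependency used solely by unit tests), and the version should be set to match the Hadoop release you are targeting.

```xml
<!-- Maven form of the hadoop-minicluster dependency shown above for Gradle -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-minicluster</artifactId>
    <version>2.7.2</version>
    <!-- assumption: the mini clusters are only needed on the test classpath -->
    <scope>test</scope>
</dependency>
```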


Amr*_*Amr 3

I think I have figured it out. In the Maven pom file, first add a new repository:

<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>

Then add the following to your project's dependencies:

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.1</version>
</dependency>
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-auth</artifactId>
    <version>2.0.0-cdh4.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-test</artifactId>
    <version>2.0.0-mr1-cdh4.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.0.0-cdh4.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.0.0-cdh4.3.0</version>
    <classifier>tests</classifier>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.0.0-cdh4.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.0.0-cdh4.3.0</version>
    <classifier>tests</classifier>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>2.0.0-mr1-cdh4.3.0</version>
</dependency>

If anyone is interested in the whole project (unit tests for the well-known WordCount MapReduce job), I am happy to share it.

  • Including hadoop-minicluster alone is enough: `<dependency><groupId>org.apache.hadoop</groupId><artifactId>hadoop-minicluster</artifactId><version>2.7.0</version></dependency>` (4 upvotes)