Posts by ali*_*n01

Hadoop java.io.IOException: Mkdirs failed to create /some/path

When I try to run my job, I get the following exception:

Exception in thread "main" java.io.IOException: Mkdirs failed to create /some/path
    at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:106)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:150)

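/some/path is the location of hadoop.tmp.dir. However, when I issue the dfs -ls command on /some/path, I can see that it exists and that the dataset file is present (it was copied there beforehand). The path is also correctly defined in the Hadoop configuration. Any suggestions would be appreciated. I am using Hadoop 0.21.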
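The stack trace ends in RunJar.ensureDirectory. As far as the Hadoop source is concerned, that method appears to create the directory through java.io.File on the local filesystem of the machine running `hadoop jar`, not in HDFS, so a `dfs -ls` listing may not reflect what RunJar sees. The following is a minimal sketch under that assumption; it repeats the same kind of check locally. The class name `LocalDirCheck` and its method are hypothetical and not part of Hadoop.

```java
import java.io.File;
import java.io.IOException;

public class LocalDirCheck {
    // Assumed to mirror the check behind the exception: try to create the
    // directory locally and fail if it neither gets created nor already exists.
    public static void ensureLocalDirectory(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Mkdirs failed to create " + dir);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the value of hadoop.tmp.dir as the first argument;
        // falls back to the JVM temp directory for a quick smoke test.
        File dir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        ensureLocalDirectory(dir);
        System.out.println("Local directory is usable: " + dir.getAbsolutePath());
    }
}
```

Running this as the same OS user that submits the job, with the configured hadoop.tmp.dir as the argument, would show whether local permissions or a missing parent directory are the cause.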

hadoop microsoft-distributed-file-system ioexception

41
votes
5
answers
50k
views

Distributed local clustering coefficient algorithm (MapReduce/Hadoop)

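I have implemented a local clustering coefficient algorithm based on the MapReduce paradigm. However, I have run into serious problems with larger datasets and with certain datasets in particular (those with a high average node degree). I tried tuning my Hadoop platform and my code, but the results were unsatisfactory (to say the least). Now I have turned my attention to actually changing/improving the algorithm. Below is my current algorithm (pseudocode):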

foreach(Node in Graph) {
  //Job1
  /* Transform edge-based input dataset to node-based dataset */

  //Job2
  map() {
   emit(this.Node, this.Node.neighbours) //emit myself data to all my neighbours
   emit(this.Node, this.Node) //emit myself to myself
  }

  reduce() {
    NodeNeighbourhood nodeNeighbourhood;
    while(values.hasNext) {
      if(myself)
        this.nodeNeighbourhood.setCentralNode(values.next) //store myself data
      else
        this.nodeNeighbourhood.addNeighbour(values.next)  //store neighbour data
    }

    emit(null, this.nodeNeighbourhood)
  }

  //Job3
  map() {
    float lcc = calculateLocalCC(this.nodeNeighbourhood)
    emit(0, lcc) //emit all lcc to specific key, combiners are used
  }

  reduce() {
    float combinedLCC;
    int …
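The pseudocode calls calculateLocalCC without showing it. For reference, here is a minimal single-machine sketch of the usual local clustering coefficient of an undirected graph, 2·E / (k·(k−1)), where k is the number of neighbours of a node and E is the number of edges among those neighbours. The adjacency-map representation and the class name are assumptions for illustration, not the poster's NodeNeighbourhood type, and the code needs Java 9+ for `Set.of`/`Map.of`.

```java
import java.util.Map;
import java.util.Set;

public class LocalClusteringCoefficient {
    // adjacency maps each node to its neighbour set (undirected, no self-loops).
    // Returns 2 * links / (k * (k - 1)), where k is the neighbour count of `node`
    // and links is the number of edges between those neighbours.
    public static double calculateLocalCC(int node, Map<Integer, Set<Integer>> adjacency) {
        Set<Integer> neighbours = adjacency.getOrDefault(node, Set.of());
        int k = neighbours.size();
        if (k < 2) {
            return 0.0; // conventionally 0 when fewer than two neighbours exist
        }
        int links = 0;
        for (int u : neighbours) {
            for (int v : adjacency.getOrDefault(u, Set.of())) {
                // u < v counts each undirected edge between neighbours once
                if (u < v && neighbours.contains(v)) {
                    links++;
                }
            }
        }
        return 2.0 * links / (k * (double) (k - 1));
    }

    public static void main(String[] args) {
        // Triangle 1-2-3 plus a pendant node 4 attached to node 1.
        Map<Integer, Set<Integer>> graph = Map.of(
            1, Set.of(2, 3, 4),
            2, Set.of(1, 3),
            3, Set.of(1, 2),
            4, Set.of(1));
        System.out.println(calculateLocalCC(1, graph)); // one link among three neighbours
        System.out.println(calculateLocalCC(2, graph)); // both neighbours are connected
    }
}
```

The inner loop touches every neighbour's adjacency list, so the work per node grows roughly with the square of its degree. That is consistent with the slowdown described for datasets with a high average degree.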

algorithm performance hadoop mapreduce graph

19
votes
1
answer
1717
views

PushbackInputStream: Push back buffer is full

Why am I getting the following exception:

Exception in thread "main" java.io.IOException: Push back buffer is full
    at java.io.PushbackInputStream.unread(PushbackInputStream.java:232)
    at java.io.PushbackInputStream.unread(PushbackInputStream.java:252)
    at org.tests.io.PushBackStream_FUN.read(PushBackStream_FUN.java:32)
    at org.tests.io.PushBackStream_FUN.main(PushBackStream_FUN.java:43)

in this code:

public class PushBackStream_FUN {
    public int write(String outFile) throws Exception {
        FileOutputStream outputStream = new FileOutputStream(new File(outFile));
        String str = new String("Hello World");
        byte[] data = str.getBytes();
        outputStream.write(data);
        outputStream.close();

        return data.length;
    }

    public void read(String inFile, int ln) throws Exception {
        PushbackInputStream inputStream = new PushbackInputStream(new FileInputStream(new File(inFile)));
        byte[] data = new byte[ln];
        String str;

        // read
        inputStream.read(data);
        str = …
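The trace shows unread(byte[]) delegating to unread(byte[], int, int) and failing there. The single-argument PushbackInputStream constructor used in the code allocates a pushback buffer of one byte, so pushing back a longer array raises this exception. The sketch below assumes the truncated part of read() pushes the whole array back; the class name `PushbackSizeDemo` and the method `readTwice` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class PushbackSizeDemo {
    // Reads all of `source`, pushes the bytes back, and reads them again.
    // bufferSize is the pushback capacity handed to the constructor.
    public static String readTwice(byte[] source, int bufferSize) throws IOException {
        try (PushbackInputStream in =
                 new PushbackInputStream(new ByteArrayInputStream(source), bufferSize)) {
            byte[] data = new byte[source.length];
            int n = in.read(data);
            in.unread(data, 0, n); // throws IOException when n exceeds the free pushback space
            byte[] again = new byte[n];
            int m = in.read(again);
            return new String(again, 0, m, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] hello = "Hello World".getBytes(StandardCharsets.UTF_8);
        try {
            readTwice(hello, 1); // same capacity as the one-argument constructor
        } catch (IOException e) {
            System.out.println("capacity 1 failed: " + e.getMessage());
        }
        // A capacity at least as large as the pushed-back data works.
        System.out.println(readTwice(hello, hello.length));
    }
}
```

In the original code, the equivalent change would be to pass `ln` as the second constructor argument, again assuming the full array is what gets pushed back.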

java buffer stream ioexception pushbackinputstream

11
votes
1
answer
10k
views

Java object that occupies the least memory

This may be a silly question, but here goes.

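I have a multi-threaded program with a "global" collection of unique elements. For performance reasons I rejected the synchronized Set implementations in favour of ConcurrentHashMap. I don't really need the value part of the Map, so I wanted to use the smallest Object in Java in terms of memory usage. I solved the problem a different way (a single Boolean object referenced many times in the Map), but I am still curious what the smallest object in Java is. I always thought it was Boolean, but I don't think that is true (Java - boolean primitive type - size, Primitive Data Types).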
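When only the keys matter, the JDK can expose a ConcurrentHashMap as a Set through Collections.newSetFromMap. The wrapper stores the shared Boolean.TRUE constant for every entry, which is the same workaround as described above, just built in. A minimal sketch follows; the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetDemo {
    // Thread-safe Set view backed by a ConcurrentHashMap; the map must be
    // empty when it is wrapped and must not be used directly afterwards.
    public static <T> Set<T> newConcurrentSet() {
        return Collections.newSetFromMap(new ConcurrentHashMap<T, Boolean>());
    }

    public static void main(String[] args) throws InterruptedException {
        Set<Integer> seen = newConcurrentSet();
        // Both threads insert the same 1000 values; duplicates collapse.
        Runnable adder = () -> {
            for (int i = 0; i < 1000; i++) {
                seen.add(i);
            }
        };
        Thread a = new Thread(adder);
        Thread b = new Thread(adder);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(seen.size());
    }
}
```

On Java 8 and later, `ConcurrentHashMap.newKeySet()` gives a similar concurrent set without the explicit wrapper.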

java memory collections performance

7
votes
2
answers
2718
views

Giraph Shortest Paths example ClassNotFoundException

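I am trying to run the shortest paths example from incubator Giraph (https://cwiki.apache.org/confluence/display/GIRAPH/Shortest+Paths+Example). However, instead of running the example from giraph-*-dependencies.jar, I created my own job jar. When I created a job file as shown in the example, I got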

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.test.giraph.Test$SimpleShortestPathsVertexInputFormat

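I then moved the inner classes (SimpleShortestPathsVertexInputFormat and SimpleShortestPathsVertexOutputFormat) into separate files and renamed them just in case (SimpleShortestPathsVertexInputFormat_v2, SimpleShortestPathsVertexOutputFormat_v2); these classes are no longer static. This solved the class-not-found problem for SimpleShortestPathsVertexInputFormat_v2, but I still get the same error for SimpleShortestPathsVertexOutputFormat_v2. Below is my stack trace.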

INFO mapred.JobClient: Running job: job_201205221101_0003
INFO mapred.JobClient:  map 0% reduce 0%
INFO mapred.JobClient: Task Id : attempt_201205221101_0003_m_000005_0, Status : FAILED
    java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.test.giraph.utils.SimpleShortestPathsVertexOutputFormat_v2
            at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:898)
            at org.apache.giraph.graph.BspUtils.getVertexOutputFormatClass(BspUtils.java:134)
            at org.apache.giraph.bsp.BspOutputFormat.getOutputCommitter(BspOutputFormat.java:56)
            at org.apache.hadoop.mapred.Task.initialize(Task.java:490)
            at org.apache.hadoop.mapred.MapTask.run(MapTask.java:352)
            at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:415)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
            at org.apache.hadoop.mapred.Child.main(Child.java:253)
    Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.test.giraph.utils.SimpleShortestPathsVertexOutputFormat_v2
            at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:866)
            at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:890)
            ... 9 more

I checked my job jar and all the classes are there. I am using Hadoop 0.20.203 in pseudo-distributed mode. I launch my job as shown below.

hadoop jar giraphJobs.jar org.test.giraph.Test -libjars /path/to/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar …

hadoop shortest-path classnotfoundexception giraph

5
votes
1
answer
1801
views

Maven: build a jar for the parent module

My project has the following structure:

Project -
        |- Parent // bunch of abstract classes which are used by children
                |- child_A // depend on abstract classes from Parent
                |- child_B // depend on abstract classes from Parent
                |- child_C // depend on abstract classes from Parent

I want to build jars for both the parent and the children, so that I end up with parent.jar and child_*.jar. How can I do this in Maven?

module jar parent maven

3
votes
1
answer
1700
views

Strange global variable behaviour: the problem disappears once the variable is renamed

In a university exercise, I ran into strange behaviour of a variable.

/* Main parameters                                                          */
double sizeX, sizeY;      /* Size of the global domain                      */
int nPartX, nPartY;       /* Particle number in x, y direction              */
int nPart;                /* Total number of particles                      */
int nCellX, nCellY;       /* (Global) number of cells in x, y direction     */
int steps;                /* Number of timesteps                            */
double dt;                /* Stepsize for timesteps                         */
int logs;                 /* Whether or not we want to keep logfiles        */

void ReadInput(const char *fname)
{
  FILE *fp; …

c variables overwrite mpi name-conflict

1
vote
1
answer
976
views