When I try to run my Job I get the following exception:
Exception in thread "main" java.io.IOException: Mkdirs failed to create /some/path
at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:106)
at org.apache.hadoop.util.RunJar.main(RunJar.java:150)
The location /some/path is hadoop.tmp.dir. But when I issue a dfs -ls command on /some/path I can see that it exists and that the dataset file is present (it was copied there before launching the job). Also, the path is defined correctly in the hadoop configuration. Any suggestions will be greatly appreciated. I am using hadoop 0.21.
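For reference, RunJar.ensureDirectory operates on the node's local filesystem, not on HDFS, so a successful dfs -ls on /some/path does not rule this error out; local permissions or a full disk are the usual culprits. Below is a minimal sketch of that kind of local check (the probe class and its names are mine, modelled on the mkdirs/isDirectory test RunJar performs):

import java.io.File;
import java.io.IOException;

public class LocalDirProbe {
    // Roughly what RunJar.ensureDirectory does: create the directory on the
    // *local* disk and fail if that is impossible.
    static void ensureDirectory(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Mkdirs failed to create " + dir);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the value of hadoop.tmp.dir; this throws on the local node
        // even when HDFS happily lists the same path.
        ensureDirectory(new File(args[0]));
        System.out.println(args[0] + " is usable locally");
    }
}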
I have implemented a local clustering coefficient algorithm based on the MapReduce paradigm. However, I have run into serious trouble with bigger datasets or with specific datasets (high average degree of a node). I tried to tune my hadoop platform and the code, but the results were unsatisfactory (to say the least). Now I have turned my attention to actually changing/improving the algorithm. Below is my current algorithm (pseudocode):
foreach(Node in Graph) {

    //Job1
    /* Transform edge-based input dataset to node-based dataset */

    //Job2
    map() {
        emit(this.Node, this.Node.neighbours) //emit my data to all my neighbours
        emit(this.Node, this.Node)            //emit myself to myself
    }
    reduce() {
        NodeNeighbourhood nodeNeighbourhood;
        while(values.hasNext) {
            if(myself)
                this.nodeNeighbourhood.setCentralNode(values.next) //store my own data
            else
                this.nodeNeighbourhood.addNeighbour(values.next)   //store neighbour data
        }
        emit(null, this.nodeNeighbourhood)
    }

    //Job3
    map() {
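        // assumption: calculateLocalCC computes lcc(v) = 2*e(N(v)) / (k*(k-1)) for an
        // undirected graph, where k = deg(v) and e(N(v)) counts edges among v's neighbours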
        float lcc = calculateLocalCC(this.nodeNeighbourhood)
        emit(0, lcc) //emit all lcc to one specific key, combiners are used
    }
    reduce() {
        float combinedLCC;
        int …

Why do I get the following exception:
Exception in thread "main" java.io.IOException: Push back buffer is full
at java.io.PushbackInputStream.unread(PushbackInputStream.java:232)
at java.io.PushbackInputStream.unread(PushbackInputStream.java:252)
at org.tests.io.PushBackStream_FUN.read(PushBackStream_FUN.java:32)
at org.tests.io.PushBackStream_FUN.main(PushBackStream_FUN.java:43)
In this code:
public class PushBackStream_FUN {

    public int write(String outFile) throws Exception {
        FileOutputStream outputStream = new FileOutputStream(new File(outFile));
        String str = new String("Hello World");
        byte[] data = str.getBytes();
        outputStream.write(data);
        outputStream.close();
        return data.length;
    }

    public void read(String inFile, int ln) throws Exception {
        PushbackInputStream inputStream = new PushbackInputStream(new FileInputStream(new File(inFile)));
        byte[] data = new byte[ln];
        String str;
        // read
        inputStream.read(data);
        str = …
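For the record: the single-argument PushbackInputStream(InputStream) constructor allocates a pushback buffer of exactly one byte, so unread(byte[]) with a longer array overflows it; the two-argument constructor takes an explicit buffer size. A minimal, self-contained sketch of the difference (the demo class and variable names are mine):

import java.io.ByteArrayInputStream;
import java.io.PushbackInputStream;

public class PushbackBufferDemo {
    public static void main(String[] args) throws Exception {
        byte[] data = "Hello World".getBytes();
        byte[] buf = new byte[data.length];

        // Default constructor: the pushback buffer holds a single byte,
        // so unread()ing a whole array throws "Push back buffer is full".
        PushbackInputStream tiny = new PushbackInputStream(new ByteArrayInputStream(data));
        tiny.read(buf);
        // tiny.unread(buf); // would throw java.io.IOException

        // Sized constructor: room for the whole array.
        PushbackInputStream roomy = new PushbackInputStream(new ByteArrayInputStream(data), data.length);
        roomy.read(buf);
        roomy.unread(buf);                          // fits: buffer is data.length bytes
        System.out.println(roomy.read() == buf[0]); // prints true ('H' again)
    }
}

Applied to the code above, that would mean constructing the stream as new PushbackInputStream(new FileInputStream(new File(inFile)), ln) before calling unread(data).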
This is a silly question, but here it goes. I have a multithreaded program and a "global" collection of unique elements. I rejected the synchronized Set implementations in favour of ConcurrentHashMap for performance reasons. I don't really need the Value part of the Map, so I would like to use the Object with the smallest memory footprint that Java offers. I ended up solving it differently (one Boolean object referenced many times as the value in the Map), but I am still curious what the smallest object in Java is. I always thought it was Boolean, but that does not seem to be true (Java - boolean primitive type - size, primitive data types).
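As far as I know, the smallest heap object is a bare java.lang.Object instance (typically 16 bytes on a 64-bit HotSpot VM, object header plus padding), and a Boolean wrapper is no smaller, which is why sharing one canonical instance costs nothing extra per entry. A sketch of the two usual set-over-ConcurrentHashMap patterns, assuming String elements (all names here are mine):

import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetDemo {
    // Pattern 1: one shared dummy value; every entry references the same object.
    private static final Boolean PRESENT = Boolean.TRUE;
    private final ConcurrentHashMap<String, Boolean> elements =
            new ConcurrentHashMap<String, Boolean>();

    public boolean add(String e) {
        // putIfAbsent returns null only when the element was not present yet
        return elements.putIfAbsent(e, PRESENT) == null;
    }

    // Pattern 2 (JDK 6+): wrap the map as a Set and ignore values entirely.
    private final Set<String> asSet =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
}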
I am trying to run an example from giraph incubating (the shortest paths example, https://cwiki.apache.org/confluence/display/GIRAPH/Shortest+Paths+Example). However, instead of executing the example from giraph-*-dependencies.jar, I created my own job jar. When I created a single Job file as presented in the example, I got
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.test.giraph.Test$SimpleShortestPathsVertexInputFormat
Then I moved the inner classes (SimpleShortestPathsVertexInputFormat and SimpleShortestPathsVertexOutputFormat) to separate files and renamed them just in case (SimpleShortestPathsVertexInputFormat_v2, SimpleShortestPathsVertexOutputFormat_v2); these classes are no longer static. This solved the class-not-found problem for SimpleShortestPathsVertexInputFormat_v2, but I still get the same error for SimpleShortestPathsVertexOutputFormat_v2. My stack trace is below.
INFO mapred.JobClient: Running job: job_201205221101_0003
INFO mapred.JobClient: map 0% reduce 0%
INFO mapred.JobClient: Task Id : attempt_201205221101_0003_m_000005_0, Status : FAILED
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.test.giraph.utils.SimpleShortestPathsVertexOutputFormat_v2
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:898)
at org.apache.giraph.graph.BspUtils.getVertexOutputFormatClass(BspUtils.java:134)
at org.apache.giraph.bsp.BspOutputFormat.getOutputCommitter(BspOutputFormat.java:56)
at org.apache.hadoop.mapred.Task.initialize(Task.java:490)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:352)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.test.giraph.utils.SimpleShortestPathsVertexOutputFormat_v2
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:866)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:890)
... 9 more
I have inspected my job jar and all the classes are there. Also, I am using hadoop 0.20.203 in pseudo-distributed mode. The way I launch my job is shown below.
hadoop jar giraphJobs.jar org.test.giraph.Test -libjars /path/to/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar …
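One detail that may matter here: -libjars is only interpreted when the driver class runs its arguments through GenericOptionsParser, e.g. via ToolRunner. A minimal sketch of such a driver, assuming org.test.giraph.Test is the class from the command line above (the actual Giraph job wiring is elided):

package org.test.giraph;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Test extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects the -libjars/-D options parsed by
        // GenericOptionsParser; configure and submit the Giraph job here.
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-libjars, ...) before
        // handing the remaining args to run().
        System.exit(ToolRunner.run(new Test(), args));
    }
}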
My project has the following structure:

Project -
|- Parent // bunch of abstract classes which are used by children
|- child_A // depend on abstract classes from Parent
|- child_B // depend on abstract classes from Parent
|- child_C // depend on abstract classes from Parent
I also want to build jars for the parent and for each child, so I would end up with parent.jar and child_*.jar. How can I do that in maven?
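For what it's worth, the usual layout is a root aggregator POM with pom packaging that lists the modules, while each module keeps the default jar packaging; mvn package at the root then produces one jar per module. A minimal sketch (groupId org.example and the version are placeholders):

<!-- Project/pom.xml : root aggregator, packaging must be "pom" -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>project-root</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <modules>
    <module>Parent</module>
    <module>child_A</module>
    <module>child_B</module>
    <module>child_C</module>
  </modules>
</project>

<!-- child_A/pom.xml : a jar module that depends on the Parent classes -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.example</groupId>
    <artifactId>project-root</artifactId>
    <version>1.0-SNAPSHOT</version>
  </parent>
  <artifactId>child_A</artifactId> <!-- packaging defaults to jar -->
  <dependencies>
    <dependency>
      <groupId>org.example</groupId>
      <artifactId>Parent</artifactId>
      <version>1.0-SNAPSHOT</version>
    </dependency>
  </dependencies>
</project>

Running mvn package from the root then builds Parent-1.0-SNAPSHOT.jar and child_*-1.0-SNAPSHOT.jar in the modules' respective target/ directories.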
In one of my university exercises I came across strange behaviour of a variable.
/* Main parameters */
double sizeX, sizeY; /* Size of the global domain */
int nPartX, nPartY; /* Particle number in x, y direction */
int nPart; /* Total number of particles */
int nCellX, nCellY; /* (Global) number of cells in x, y direction */
int steps; /* Number of timesteps */
double dt; /* Stepsize for timesteps */
int logs; /* Whether or not we want to keep logfiles */
void ReadInput(const char *fname)
{
    FILE *fp; …