Sometimes my MR job complains that the MyMapper class cannot be found, and I have to call job.setJarByClass(MyMapper.class); to tell it to load the class from my jar file.
cloudera@cloudera-vm:/tmp/translator$ hadoop jar MapReduceJobs.jar translator/input/Portuguese.txt translator/output
13/06/13 03:36:57 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/06/13 03:36:57 INFO input.FileInputFormat: Total input paths to process : 1
13/06/13 03:36:57 INFO mapred.JobClient: Running job: job_201305100422_0043
13/06/13 03:36:58 INFO mapred.JobClient: map 0% reduce 0%
13/06/13 03:37:03 INFO mapred.JobClient: Task Id : attempt_201305100422_0043_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: com.mapreduce.variousformats.keyvaluetextinputformat.MyMapper
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996)
    at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:
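Question: Why does this happen? Why doesn't Hadoop always load the class from my jar file without being told to? Are there any best practices for this kind of problem? Also, if I use third-party libraries, do I have to do the same for them?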
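First, please forgive what may be a very naive question. My task is to identify the right NoSQL database for my project. I am inserting and updating records in a table (column family) in a highly concurrent fashion.
Then I ran into this.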
INFO 11:55:20,924 Writing Memtable-scan_request@314832703(496750/1048576 serialized/live bytes, 8204 ops)
INFO 11:55:21,084 Completed flushing /var/lib/cassandra/data/mykey/scan_request/mykey-scan_request-ic-14-Data.db (115527 bytes) for commitlog position ReplayPosition(segmentId=1372313109304, position=24665321)
INFO 11:55:21,085 Writing Memtable-scan_request@721424982(1300975/2097152 serialized/live bytes, 21494 ops)
INFO 11:55:21,191 Completed flushing /var/lib/cassandra/data/mykey/scan_request/mykey-scan_request-ic-15-Data.db (304269 bytes) for commitlog position ReplayPosition(segmentId=1372313109304, position=26554523)
WARN 11:55:21,268 Heap is 0.829968311377531 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want …
How does a node running a mapper know that it must send some of its key-value output to node A (running one reducer) and some to node B (running another reducer)? Is there a list of reducer nodes maintained by the JobTracker? If so, how does it choose the nodes on which to run the reducers?
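As background for the mapper-to-reducer question: as far as I understand it, a map task does not address reducer nodes directly. It only assigns each output record a partition number, and each reducer later fetches its own partition from every map output. The sketch below reproduces the arithmetic used by Hadoop's default HashPartitioner in plain Java, without any Hadoop dependency. The class name PartitionSketch, the method partitionFor and the sample keys are made up for illustration.

```java
class PartitionSketch {
    // Same arithmetic as Hadoop's default HashPartitioner:
    // clear the sign bit of the hash, then take the remainder by the reducer count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 2; // hypothetical job configured with two reducers
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            // Every mapper computes the same partition for the same key,
            // so all values for one key end up at the same reducer.
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Because the partition depends only on the key and the number of reducers, the mapper needs no knowledge of which machines the reducers run on.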
I am trying to read an RTF file and extract the characters in it.
For example, below is the RTF version of ф
{\rtf1\ansi\ansicpg1252\fromtext \fbidis \deff0{\fonttbl
{\f0\fswiss\fcharset0 Arial;}
{\f1\fmodern Courier New;}
{\f2\fnil\fcharset2 Symbol;}
{\f3\fmodern\fcharset0 Courier New;}
{\f4\fswiss\fcharset204 Arial;}}
{\colortbl \red0\green0\blue0;\red0\green0\blue255;}
\uc1\pard\plain\deftab360 \f0\fs20 \htmlrtf{\f4\fs20\htmlrtf0
\'f4\htmlrtf\f0}\htmlrtf0 \par }
As you can see, the encoding here is Windows-1252
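One thing that may be worth checking: the \'f4 escape appears inside a group that switches to font \f4, and that font is declared with \fcharset204, the RTF character set for Russian (Cyrillic), which corresponds to code page 1251 and not 1252. A minimal Java sketch of decoding that single byte under both code pages follows. The class name RtfByteDecode and the helper decodeHexByte are hypothetical, and it assumes the JDK in use provides the windows-1251 and windows-1252 charsets.

```java
import java.nio.charset.Charset;

class RtfByteDecode {
    // Decode one RTF \'hh escape (passed as its two hex digits) using the named charset.
    static String decodeHexByte(String hex, String charsetName) {
        byte[] single = { (byte) Integer.parseInt(hex, 16) }; // exactly one byte, no padding
        return new String(single, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String asCyrillic = decodeHexByte("f4", "windows-1251");
        String asLatin = decodeHexByte("f4", "windows-1252");
        // Print code points so the result does not depend on the console encoding.
        System.out.printf("windows-1251: U+%04X%n", (int) asCyrillic.charAt(0));
        System.out.printf("windows-1252: U+%04X%n", (int) asLatin.charAt(0));
    }
}
```

Note that the sketch decodes a single byte. Packing the value as a 32-bit integer, as the Perl attempt that follows does with pack("N", ...), yields four bytes (00 00 00 F4), so the decoded string would also contain three NUL characters.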
#!/usr/bin/perl
use strict;
use utf8;
use Encode qw(decode encode);

binmode(STDOUT, ":utf8");
my $runtime = chr(0x0444);
print "theta || ".$runtime." ||";

my $hexstr = "0xF4";
my $num = hex $hexstr;
my $be_num = pack("N", $num);
$runtime = decode( "cp1252",$be_num);
print "\n".$runtime."\n";

$runtime = decode( "cp1251",$be_num);
print "\n".$runtime."\n"
…
Question 1:
I was reading "Hard-core multithreading in Java" and came across the semaphore example below.
package com.dswgroup.conferences.borcon.threading;

public class ResourceGovernor {
    private int count;
    private int max;

    public ResourceGovernor(int max) {
        count = 0;
        this.max = max;
    }

    public synchronized void getResource(int numberof) {
        while (true) {
            if ((count + numberof) <= max) {
                count += numberof;
                break;
            }
            try {
                wait();
            } catch (Exception ignored) {}
        }
    }

    public synchronized void freeResource(int numberof) {
        count -= numberof;
        notifyAll();
    }
}
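I think this can lead to a deadlock in the following situation:
All resources are in use, and a new thread asks for resources that are not available. Since it waits inside the synchronized method, the other threads that hold resources cannot release them, because freeResource is also synchronized and they cannot enter it while the waiting thread holds the object-level lock on ResourceGovernor.
There is another problem: nothing checks whether a thread tries to release more resources than it acquired. But this issue is secondary and can easily be fixed with a synchronized map of thread name to resource count.
But can I safely say that I have diagnosed the first problem correctly? (I want to double-check, given that the article has been posted on embarcadero.com for a long time.)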
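One way to test the first diagnosis is to run the scenario. Object.wait() is documented to release the monitor of the object it is called on while the thread waits, so a second thread should still be able to enter freeResource. The sketch below restates the governor logic compactly and checks whether a blocked getResource call completes once another thread frees a resource. The names WaitReleasesLock, Governor and waiterGetsUnblocked are made up for this illustration, and the sleep is only a crude way to let the waiter reach wait().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class WaitReleasesLock {
    // Same logic as the ResourceGovernor in the question, restated compactly.
    static class Governor {
        private int count = 0;
        private final int max;

        Governor(int max) { this.max = max; }

        synchronized void getResource(int n) throws InterruptedException {
            while (count + n > max) {
                wait(); // gives up this object's monitor until notified
            }
            count += n;
        }

        synchronized void freeResource(int n) {
            count -= n;
            notifyAll();
        }
    }

    // Returns true if a blocked getResource() completes after another thread frees a resource.
    static boolean waiterGetsUnblocked() {
        Governor governor = new Governor(1);
        CountDownLatch acquired = new CountDownLatch(1);
        try {
            governor.getResource(1); // the calling thread takes the only resource
            Thread waiter = new Thread(() -> {
                try {
                    governor.getResource(1); // blocks inside the synchronized method
                    acquired.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            waiter.setDaemon(true);
            waiter.start();
            Thread.sleep(200);        // crude: give the waiter time to reach wait()
            governor.freeResource(1); // would block here if wait() kept the lock
            return acquired.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter unblocked: " + waiterGetsUnblocked());
    }
}
```

If the deadlock described above were real, the call to freeResource would never return. Under the documented wait() semantics it should return and the waiter should proceed.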
Question 2:
Can I safely say that a semaphore with only one resource behaves the same as a mutex?
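For Question 2, one difference that can be checked in code is ownership. The sketch below compares java.util.concurrent.Semaphore with a single permit against java.util.concurrent.locks.ReentrantLock, used here as a stand-in for a mutex (that choice is an assumption of this example, not something from the article). The class and method names are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

class SemaphoreVsMutex {
    // A permit acquired by one thread can be released by a different thread.
    static boolean semaphoreReleasedByOtherThread() {
        Semaphore semaphore = new Semaphore(1);
        try {
            semaphore.acquire();                  // calling thread takes the permit
            Thread other = new Thread(semaphore::release);
            other.start();
            other.join();                         // a different thread gave it back
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return semaphore.availablePermits() == 1;
    }

    // A ReentrantLock held by one thread cannot be unlocked by another thread.
    static boolean lockRejectsUnlockByOtherThread() {
        ReentrantLock lock = new ReentrantLock();
        boolean[] rejected = {false};
        lock.lock();
        try {
            Thread other = new Thread(() -> {
                try {
                    lock.unlock();                // this thread is not the owner
                } catch (IllegalMonitorStateException e) {
                    rejected[0] = true;
                }
            });
            other.start();
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.unlock();                        // the owner releases the lock
        }
        return rejected[0];
    }

    public static void main(String[] args) {
        System.out.println("semaphore released by other thread: " + semaphoreReleasedByOtherThread());
        System.out.println("lock rejects unlock by other thread: " + lockRejectsUnlockByOtherThread());
    }
}
```

Both methods are expected to return true, which suggests the two are similar for simple mutual exclusion but not identical: a one-permit semaphore has no notion of an owning thread. It is also not reentrant, so a thread that acquires it twice without releasing would block itself.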
I am running on Ubuntu 14.04. I was able to install the DataStax cpp driver on the machine.
But I cannot run any of the examples.
Install the project...
-- Install configuration: ""
-- Up-to-date: /usr/local/include/cassandra.h
-- Up-to-date: /usr/local/lib/libcassandra.so.0.7.0
-- Up-to-date: /usr/local/lib/libcassandra.so.0
-- Up-to-date: /usr/local/lib/libcassandra.so
-- Up-to-date: /usr/local/lib/libcassandra_static.a
root@ubuntu-cassandra:~/cpp-driver# cd -
/root/cpp-driver/examples/simple
root@ubuntu-cassandra:~/cpp-driver/examples/simple# strace -s 1024 -f -e execve gcc -I /usr/local/include/ -L /usr/local/lib/ -lcassandra simple.c
execve("/usr/bin/gcc", ["gcc", "-I", "/usr/local/include/", "-L", "/usr/local/lib/", "-lcassandra", "simple.c"], [/* 23 vars */]) = 0
Process 30450 attached
[pid 30450] execve("/usr/lib/gcc/x86_64-linux-gnu/4.8/cc1", ["/usr/lib/gcc/x86_64-linux-gnu/4.8/cc1", "-quiet", "-I", "/usr/local/include/", "-imultiarch", "x86_64-linux-gnu", "simple.c", "-quiet", "-dumpbase", "simple.c", "-mtune=generic", "-march=x86-64", "-auxbase", "simple", "-fstack-protector", "-Wformat", "-Wformat-security", "-o", …