I have this code in Pig (win, request and response are just tables loaded directly from the filesystem):
win_request = JOIN win BY bid_id, request BY bid_id;
win_request_response = JOIN win_request BY win.bid_id, response BY bid_id;
win_group = GROUP win_request_response BY (win.campaign_id);
win_count = FOREACH win_group GENERATE group, SUM(win.bid_price);
Basically, I want to sum bid_price after the joins and the grouping, but I get an error:
Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.
My guess is that I am not referencing win.bid_price correctly.
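I want to test whether a directory exists in the Hadoop Distributed File System (HDFS). If it does not exist, I want to create it; otherwise, do nothing.
I adapted the code from http://jugnu-life.blogspot.com/2012/10/hadoop-fs-test-example.html as follows: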
#!/bin/bash
directory=/raw/tool/
if hadoop fs -test –d $directory ; then
echo "Directory exists"
else
hadoop fs -mkdir $directory
echo "Creating directory"
fi
I get this error:
-test: Too many arguments: expected 1 but got 2
Usage: hadoop fs [generic options] -test -[defsz] <path>
What exactly am I doing wrong?
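One thing worth checking, offered as a likely diagnosis rather than a confirmed answer: in the `if` line of the script, the character before `d` in `–d` appears to be an en dash (U+2013), not an ASCII hyphen-minus. This often happens when code is copied from a blog. If `hadoop fs -test` does not recognize `–d` as an option, it would treat it as a second path argument, which would match "expected 1 but got 2". The small Java helper below flags look-alike dash characters in a command line. The class and method names are hypothetical, and the set of characters it checks (en dash, em dash, minus sign) is an assumption, not an exhaustive list.

```java
import java.util.ArrayList;
import java.util.List;

public class DashCheck {
    // Returns the zero-based positions of characters that look like an ASCII
    // hyphen-minus but are not: en dash (U+2013), em dash (U+2014) and
    // minus sign (U+2212). The choice of characters is an assumption.
    static List<Integer> findLookalikeDashes(String line) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\u2013' || c == '\u2014' || c == '\u2212') {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // The first string contains an en dash before "d", as in the script above.
        String copied = "hadoop fs -test \u2013d $directory";
        String typed = "hadoop fs -test -d $directory";
        System.out.println(findLookalikeDashes(copied)); // [16]
        System.out.println(findLookalikeDashes(typed));  // []
    }
}
```

If the helper reports a position on the `if` line, retyping `-d` by hand with an ASCII hyphen is the corresponding fix to try.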
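Where can I find more information or examples about LoadFunc? Apart from http://web.archive.org/web/20130701024312/http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html, I have not seen any examples that use the new LoadFunc APIs. Can anyone point me to examples of writing a Load UDF?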
Here is some code I have been looking at:
public static long getUnsignedInt(ByteBuffer buff) {
return (long) (buff.getInt() & 0xffffffffL);
}
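Is there any reason to write buff.getInt() & 0xffffffffL (0xffffffffL has all 32 of its least significant bits set to 1)? It seems to me that the result will always be equal to buff.getInt().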
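The difference appears when the 32-bit value has its high bit set. `ByteBuffer.getInt()` returns a signed `int`. In `buff.getInt() & 0xffffffffL`, that `int` is first widened to `long`, which sign-extends it, so the bytes FF FF FF FE become -2L. The mask then clears the upper 32 bits and leaves 4294967294L, the unsigned reading of the same bytes. The sketch below compares the two methods and `Integer.toUnsignedLong` (Java 8+), which performs the same conversion. The class name and the `getSignedAsLong` helper are illustrative only.

```java
import java.nio.ByteBuffer;

public class UnsignedIntDemo {
    // Same approach as the snippet in the question: read a signed 32-bit int,
    // widen it to long, and keep only the low 32 bits.
    public static long getUnsignedInt(ByteBuffer buff) {
        return buff.getInt() & 0xffffffffL;
    }

    // Plain widening for comparison: the int is sign-extended to 64 bits.
    public static long getSignedAsLong(ByteBuffer buff) {
        return buff.getInt();
    }

    public static void main(String[] args) {
        // Bytes FF FF FF FE, read big-endian (ByteBuffer's default byte order).
        byte[] bytes = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE};

        System.out.println(getSignedAsLong(ByteBuffer.wrap(bytes))); // -2
        System.out.println(getUnsignedInt(ByteBuffer.wrap(bytes)));  // 4294967294
        // Integer.toUnsignedLong gives the same result as the mask.
        System.out.println(Integer.toUnsignedLong(ByteBuffer.wrap(bytes).getInt())); // 4294967294

        // When the high bit is clear (here the value 256), both methods agree.
        byte[] small = {0x00, 0x00, 0x01, 0x00};
        System.out.println(getUnsignedInt(ByteBuffer.wrap(small))
                == getSignedAsLong(ByteBuffer.wrap(small))); // true
    }
}
```

So the masked result equals `buff.getInt()` only for values from 0 to 2^31 - 1. The mask is needed because Java has no unsigned 32-bit primitive type, and widening an `int` to `long` always sign-extends.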