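Is there a way to keep duplicates in a collect set in Hive, or to simulate the kind of aggregate collection that Hive provides using some other method? I want to aggregate all of the items in a column that share the same key into an array, with duplicates.
For example: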
hash_id | num_of_cats
=====================
ad3jkfk 4
ad3jkfk 4
ad3jkfk 2
fkjh43f 1
fkjh43f 8
fkjh43f 8
rjkhd93 7
rjkhd93 4
rjkhd93 7
Should return:
hash_agg | cats_aggregate
===========================
ad3jkfk Array<int>(4,4,2)
fkjh43f Array<int>(1,8,8)
rjkhd93 Array<int>(7,4,7)
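
The aggregation being asked for can be illustrated outside Hive. The following is a minimal Python sketch, not Hive code, of grouping values by key while keeping duplicates; the function name `collect_with_duplicates` is hypothetical, and the rows are the ones from the example above. Later Hive releases added a built-in `collect_list` aggregate that keeps duplicates; whether it is available depends on the Hive version in use.

```python
from collections import defaultdict

def collect_with_duplicates(rows):
    """Group values by key, keeping duplicates in arrival order."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)  # a list keeps repeats; a set would drop them
    return dict(groups)

# Rows copied from the example table (hash_id, num_of_cats).
rows = [
    ("ad3jkfk", 4), ("ad3jkfk", 4), ("ad3jkfk", 2),
    ("fkjh43f", 1), ("fkjh43f", 8), ("fkjh43f", 8),
    ("rjkhd93", 7), ("rjkhd93", 4), ("rjkhd93", 7),
]

result = collect_with_duplicates(rows)
for key, values in result.items():
    print(key, values)
```

Note that arrival order is well defined here only because the input is a Python list; in a distributed engine the order of elements inside each array is generally not guaranteed.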

How do I do a subselect in Hive? I think I may be making a really obvious mistake that isn't so obvious to me...
The error I'm getting: FAILED: Parse Error: line 4:8 cannot recognize input 'SELECT' in expression specification
Here are my three source tables:
aaa_hit -> [SESSION_KEY, HIT_KEY, URL]
aaa_event-> [SESSION_KEY,HIT_KEY,EVENT_ID]
aaa_session->[SESSION_KEY,REMOTE_ADDRESS]
...and what I want to do is insert the results into a result table that looks like this:
result -> [url, num_url, event_id, num_event_id, remote_address, num_remote_address]
...where column 1 is the URL, column 3 is the top 1 "event" for each URL, and column 5 is the top 1 REMOTE_ADDRESS that visited that URL. (The even columns are the "count" of the preceding column.)
Soooooo... what am I doing wrong here?
INSERT OVERWRITE TABLE result2
SELECT url,
COUNT(url) AS access_url,
(SELECT events.event_id as evt,
COUNT(events.event_id) as access_evt
FROM aaa_event events
LEFT OUTER JOIN aaa_hit hits
ON ( events.hit_key = hit_key )
ORDER BY access_evt DESC LIMIT 1),
(SELECT sessions.remote_address as remote_address,
COUNT(sessions.remote_address) as access_addr
FROM aaa_session …

I'm trying to compile the NIF test from the Erlang documentation (http://www.erlang.org/doc/man/erl_nif.html) on Mac OS X Lion, but I can't get it to compile. Am I missing a compiler flag? Here is the error I get:
Computer:~ me $ gcc -fPIC -shared -o niftest.so niftest.c -I /usr/local/Cellar/erlang/R14B02/lib/erlang/usr/include/
Undefined symbols for architecture x86_64:
"_enif_make_string", referenced from:
_hello in ccXfh0oG.o
ld: symbol(s) not found for architecture x86_64
collect2: ld returned 1 exit status
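I also tried this with -m32, but it says there is no i386 architecture.
Thanks!

Can I partition a Hive table on existing fields when inserting?
I have a 10 GB file that contains a date field and an hour field. Can I load this file into a table, then INSERT OVERWRITE into another partitioned table that uses those fields as partitions? Would something like the following work?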
INSERT OVERWRITE TABLE tealeaf_event PARTITION(dt=evt.datestring,hour=evt.hour)
SELECT * FROM staging_event evt;
Thanks!
Travis
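
What the question describes is usually called a dynamic-partition insert: each row is routed to a partition chosen by the values of its own fields. The Python sketch below only illustrates that routing idea and is not Hive code. The function name `route_to_partitions`, the sample rows and the `event_id` column are hypothetical; the `dt=.../hour=...` path style mirrors Hive's key=value partition directory naming.

```python
from collections import defaultdict

def route_to_partitions(rows, partition_map):
    """Bucket rows by partition path built from their own field values.

    partition_map maps a partition column name to the source field that
    supplies its value, e.g. {"dt": "datestring", "hour": "hour"}.
    """
    partitions = defaultdict(list)
    source_fields = set(partition_map.values())
    for row in rows:
        path = "/".join(f"{part}={row[src]}" for part, src in partition_map.items())
        # Partition values live in the path, so drop them from the stored row.
        payload = {k: v for k, v in row.items() if k not in source_fields}
        partitions[path].append(payload)
    return dict(partitions)

# Hypothetical staging rows with a date field and an hour field.
staging_event = [
    {"event_id": 1, "datestring": "2011-05-01", "hour": "13"},
    {"event_id": 2, "datestring": "2011-05-01", "hour": "13"},
    {"event_id": 3, "datestring": "2011-05-01", "hour": "14"},
]

partitions = route_to_partitions(staging_event, {"dt": "datestring", "hour": "hour"})
for path, payload in partitions.items():
    print(path, payload)
```

In Hive itself, dynamic partitioning is typically written with bare column names in the PARTITION clause, with the partition columns selected last, and it has to be enabled through the `hive.exec.dynamic.partition` settings; the details depend on the Hive version.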
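
I have an external exe program that reads from standard input and produces a result. It works like the wc program, reading until EOF. (Or rather, end of stream.)
Update: let me add one more piece of explanation: I'm basically trying to write an Erlang pipe.
I can invoke the program from a batch file with echo 339371249625 | LookupProj.exe, but I want to be able to pass data to it from an Erlang gen_server.
I've looked into Erlang ports, but I'm having trouble getting them to play nicely. Here is what I have: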
test(InputText) ->
P = open_port({spawn, "/ExternEvent/LookupProj.exe"}, [stream, exit_status, use_stdio,
stderr_to_stdout, in, out]),
IBin = list_to_binary(InputText),
%% io:format("~p~n",[I2]),
P ! {self(), {command, <<IBin/binary, <<26>>/binary>>}}, %% ASCII 26 = EOF
P ! {self(), {eof}}, %% ERROR -- how to close stdin of the cat process?
receive B -> io:format("~p",[B]) end.
I tried using the eof flag in open_port, which didn't help. (Not sure whether that's the right flag?)
Where am I going wrong? Thanks!
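
For comparison, here is a minimal Python sketch of the pipe behaviour the question is after: write the input to a child's stdin, close stdin so the child sees end of stream, then read its output. It is an illustration in another language, not an Erlang solution. The child is a stand-in one-liner that counts words the way wc would, because LookupProj.exe is not available here, and `run_filter` is a hypothetical name.

```python
import subprocess
import sys

# Stand-in for the external program: reads stdin until end of stream,
# then prints how many whitespace-separated words it saw.
CHILD_SOURCE = "import sys; data = sys.stdin.read(); print(len(data.split()))"

def run_filter(input_text):
    """Feed input_text to the child, close its stdin, and return its output."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=input_text,      # written to the child's stdin, which is then closed
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()

word_count = run_filter("339371249625\n")
print(word_count)
```

In this sketch the child's read returns only because the parent closes the write end of the pipe; no in-band marker byte is sent. Whether an Erlang port can close the child's stdin while still reading its output is exactly what the question leaves open.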

OK, this one is beyond me; it must have been a long day. Why does (13! mod 10) come out as 4, when the number ends in two 0s?
Try this:
<?php $thirteen_fac = 6227020800;
echo $thirteen_fac % 10; ?>
The result is 4. Expected 0.
I must be forgetting something really obvious...
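
One possible explanation, assuming the script runs on a PHP build with 32-bit integers, is that 6227020800 does not fit in 32 bits and is wrapped before the modulo is taken. The Python sketch below reproduces that arithmetic; `wrap_int32` is a hypothetical helper that emulates two's-complement truncation, and it does not claim to model PHP's conversion rules exactly.

```python
import math

def wrap_int32(n):
    """Emulate truncating an integer to a signed 32-bit value."""
    n &= 0xFFFFFFFF                      # keep the low 32 bits
    return n - 0x100000000 if n >= 0x80000000 else n

thirteen_fac = math.factorial(13)        # 6227020800, larger than 2**31 - 1
wrapped = wrap_int32(thirteen_fac)       # 6227020800 - 2**32 = 1932053504

print(thirteen_fac % 10)                 # arbitrary-precision result
print(wrapped % 10)                      # result after 32-bit truncation
```

With arbitrary-precision integers the remainder is 0, while the truncated value 1932053504 leaves a remainder of 4, which matches the output reported above.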

I've seen how to use tMap in TOS to map different fields in an SQL-like JOIN. How do I aggregate based on certain fields?
If I have two tables:
[ A, B, C, D ]
and that are tMap'ped to [ B, C, F, G ]
[ B, E, F, G]
How can I aggregate the result so that, instead of many entries for each non-unique B, I get something like the following:
[ B, count(B), avg(C), avg(F), avg(G) ]
Thanks!
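
The target shape [ B, count(B), avg(C), avg(F), avg(G) ] is a group-by on B with a count and three averages. The Python sketch below shows that computation on made-up rows; it is not Talend code, and `aggregate_by_b` and the sample values are hypothetical. In Talend Open Studio, an aggregation component such as tAggregateRow placed after the tMap may be where this is expressed.

```python
from collections import defaultdict
from statistics import mean

def aggregate_by_b(rows):
    """Return one (B, count, avg C, avg F, avg G) tuple per distinct B."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["B"]].append(row)
    return [
        (
            b,
            len(members),
            mean([r["C"] for r in members]),
            mean([r["F"] for r in members]),
            mean([r["G"] for r in members]),
        )
        for b, members in groups.items()
    ]

# Hypothetical rows in the joined [ B, C, F, G ] layout.
joined = [
    {"B": "x", "C": 2, "F": 10, "G": 1},
    {"B": "x", "C": 4, "F": 20, "G": 3},
    {"B": "y", "C": 5, "F": 7, "G": 9},
]

summary = aggregate_by_b(joined)
for line in summary:
    print(line)
```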
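
Is anyone good at debugging Erlang? I can't for the life of me figure out what's wrong. No matter where I put the Fields variable, Erlang says there is an error before that line...
Compile messages: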
./eventbus.erl:6: syntax error before: FieldPositions
./eventbus.erl:24: variable 'FieldPositions' is unbound
./eventbus.erl:28: Warning: variable 'Ref' is unused
./eventbus.erl:30: Warning: variable 'List' is unused
error
And then the code itself.
-module(secret).
-export([listen/1, send/1]).
-define(TCP_OPTIONS, [binary, {packet, 0}, {active, false}, {reuseaddr, true}]).
FieldPositions = ["A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","AA","BB","CC","DD","EE","FF","GG","HH","II","JJ","KK","LL","MM","NN","OO","PP"].
listen(Port) ->
{ok, LSocket} = gen_tcp:listen(Port, ?TCP_OPTIONS),
accept(LSocket).
accept(LSocket) ->
{ok, Socket} = gen_tcp:accept(LSocket),
spawn(fun() -> loop(Socket) end),
accept(LSocket).
discrim(<<>>) ->
ok;
discrim([]) ->
ok;
discrim(Info) ->
EventsList = string:tokens(Info,"|"),
process_events(EventsList, FieldPositions).
process_events([],[]) ->
ok;
process_events([],Ref) ->
ok;
process_events(List,[]) -> …