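I have a setup on an EC2 instance that uses Whirr to launch new Hadoop instances. I've been trying to get Hive working with this setup. Hive should be configured to use MySQL as a local metastore. The problem is that every time I try to run a query such as "CREATE TABLE tester (foo INT, bark STRING);" through the Hive interface, it just hangs there and doesn't seem to do anything.
Any help would be greatly appreciated.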
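I want to analyze data stored in a database (MS SQL Server). How can I bring this data onto HDFS with the help of Sqoop/Hive? Is this possible with Hive/Sqoop? Please suggest how to do it.
Thanks.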
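A minimal sketch of a Sqoop import from SQL Server into Hive, written in Python only to assemble the command line. The host, database, table, and user names are hypothetical, and it assumes Sqoop 1 with Microsoft's SQL Server JDBC driver jar available on Sqoop's classpath. Dropping --hive-import (and adding --target-dir) would leave the data as plain files in HDFS instead of loading it into a Hive table.

```python
def build_sqoop_import(host, database, table, username, hive_table,
                       num_mappers=4, port=1433):
    """Assemble a Sqoop 1 command that copies one SQL Server table into Hive."""
    # URL format of Microsoft's SQL Server JDBC driver (assumed to be installed for Sqoop).
    jdbc_url = f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                        # prompt for the password instead of passing it inline
        "--table", table,
        "--hive-import",             # load the data into a Hive table after the HDFS copy
        "--hive-table", hive_table,
        "--num-mappers", str(num_mappers),
    ]


# Hypothetical names, for illustration only.
argv = build_sqoop_import("dbhost", "sales", "orders", "analyst", "orders")
print(" ".join(argv))
```

The list can be handed to subprocess.run(argv, check=True). With more than one mapper, Sqoop needs either a primary key on the source table or an explicit --split-by column.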
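I'm new to the world of Unix shell scripting. I want to run a simple SQL query from a Unix shell script, write the result to a .txt file, and then send that .txt file as an email attachment.
The SQL query, with its output redirected to a txt file: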
SELECT count(*) from pds_table > a.txt;
How can I do this from a shell script, writing the output to a txt file and then sending that txt file as an email attachment?
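The question asks for a shell script; as a sketch of the same three steps (run the count, write a.txt, attach it to an email), here is a Python version using only the standard library. sqlite3 stands in for whatever database actually holds pds_table, and the sample rows and email addresses are placeholders. The message is built but not sent; sending would typically go through smtplib.SMTP(...).send_message(msg) against a reachable mail server.

```python
import sqlite3
import tempfile
from email.message import EmailMessage
from pathlib import Path


def write_count_report(conn, out_path):
    """Run the count query and write the single number to a text file."""
    (count,) = conn.execute("SELECT count(*) FROM pds_table").fetchone()
    Path(out_path).write_text(f"{count}\n")
    return count


def build_report_email(txt_path, sender, recipient):
    """Build an email with the text file attached (not sent here)."""
    path = Path(txt_path)
    msg = EmailMessage()
    msg["Subject"] = "pds_table row count"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The query result is attached.")
    # A str payload with a filename becomes a text/plain attachment.
    msg.add_attachment(path.read_text(), filename=path.name)
    return msg


def demo():
    # Hypothetical stand-in data: an in-memory pds_table with three rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pds_table (id INTEGER)")
    conn.executemany("INSERT INTO pds_table VALUES (?)", [(1,), (2,), (3,)])
    out_path = Path(tempfile.mkdtemp()) / "a.txt"
    count = write_count_report(conn, out_path)
    msg = build_report_email(out_path, "reports@example.com", "me@example.com")
    conn.close()
    return count, msg
```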
Hive provides an abstraction layer over Java MapReduce jobs, so it should carry some performance overhead compared with hand-written Java MapReduce jobs.
Is there any benchmark comparing the performance of Hive queries and Java MapReduce jobs?
A real use-case scenario with runtime data would be a great help.
Thanks
I am importing a table from MySQL into Hive. The table has 2115584 rows. During the import I see:
13/03/20 18:34:31 INFO mapreduce.ImportJobBase: Retrieved 2115584 records.
But when I run count(*) on the imported table, it reports 49262250 rows. What is going on?
Update: the import works correctly when --direct is specified.
I'm not clear on the difference between partitioning and bucketing in Hive. I'd really appreciate some details with examples.
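A conceptual sketch of the difference, in Python rather than Hive, assuming a table declared with something like PARTITIONED BY (country STRING) CLUSTERED BY (id) INTO 4 BUCKETS. Partitioning creates one directory per distinct value, so a filter on country can skip whole directories; bucketing spreads rows over a fixed number of files by hashing the bucketing column, which keeps the file count bounded and helps sampling and joins on that column. The column names are hypothetical, and zlib.crc32 is a deterministic stand-in, not the hash function Hive actually uses.

```python
import zlib
from collections import defaultdict


def partition_rows(rows, partition_col):
    """Partitioning: one directory per distinct value, e.g. .../country=US/."""
    layout = defaultdict(list)
    for row in rows:
        # Hive keeps the partition value in the directory name rather than in
        # the data files; rows are kept whole here for simplicity.
        layout[f"{partition_col}={row[partition_col]}"].append(row)
    return dict(layout)


def bucket_rows(rows, bucket_col, num_buckets):
    """Bucketing: a fixed number of files; a row goes to hash(value) mod N."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        # Stand-in hash (NOT Hive's): same value always lands in the same bucket.
        h = zlib.crc32(str(row[bucket_col]).encode("utf-8"))
        buckets[h % num_buckets].append(row)
    return buckets


rows = [{"id": 1, "country": "US"}, {"id": 2, "country": "IN"},
        {"id": 3, "country": "US"}, {"id": 1, "country": "IN"}]
print(sorted(partition_rows(rows, "country")))
print([len(b) for b in bucket_rows(rows, "id", 4)])
```

The number of partitions grows with the number of distinct values, while the number of buckets is fixed at table creation; that is the main practical reason to bucket a high-cardinality column instead of partitioning on it.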
I have a table with four columns.
C1    C2    C3    C4
--------------------
x1    y1    z1    d1
x2    y2    z2    d2
Now I want to convert each row into a map data type of key-value pairs and load it into a separate table.
create table test
(
   level map<string,string>
)
row format delimited
COLLECTION ITEMS TERMINATED BY '&'
map keys terminated by '=';
I then load the data with the SQL below.
insert overwrite table test
select str_to_map(concat('level1=',c1,'&','level2=',c2,'&','level3=',c3,'&','level4=',c4) from input;
Running a select query on the table:
select * from test;
{"level1":"x1","level2":"y1","level3":"z1","level4":"d1=\\"}
{"level1":"x2","level2":"y2","level3":"z2","level4":"d2=\\"}
I don't understand why I get an extra "=\\" at the end of the last value.
I double-checked the data, but the problem persists.
Can you help?
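Hive's str_to_map(text, delimiter1, delimiter2) falls back to ',' and ':' when the delimiters are omitted, and the insert query above passes only the concat(...) text. The Python emulation below (not Hive's actual code; it uses literal splits where Hive uses Java regex splits, and assumes a pair without the second delimiter maps to NULL) shows what that implies: with the default delimiters the whole string becomes a single key with a null value, while passing '&' and '=' yields four entries. Whether this accounts for the exact "=\\" suffix depends on how the text SerDe writes that null, so treat it as something to check rather than a confirmed diagnosis.

```python
def str_to_map(text, delimiter1=",", delimiter2=":"):
    """Emulate Hive's str_to_map: split into pairs on delimiter1, then split
    each pair on the first delimiter2; a pair without delimiter2 maps to None."""
    result = {}
    for pair in text.split(delimiter1):
        key_value = pair.split(delimiter2, 1)
        if len(key_value) < 2:
            result[key_value[0]] = None
        else:
            result[key_value[0]] = key_value[1]
    return result


s = "level1=x1&level2=y1&level3=z1&level4=d1"
print(str_to_map(s))            # {'level1=x1&level2=y1&level3=z1&level4=d1': None}
print(str_to_map(s, "&", "="))  # {'level1': 'x1', 'level2': 'y1', 'level3': 'z1', 'level4': 'd1'}
```

Passing the delimiters explicitly, i.e. str_to_map(concat(...), '&', '='), would be the first thing to try.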
I'm trying to run this query in Hive to return only the top 10 URLs that appear most often in the adimpression table.
select
        ranked_mytable.url,
        ranked_mytable.cnt
from
        ( select iq.url, iq.cnt, rank() over (partition by iq.url order by iq.cnt desc) rnk
        from
                ( select url, count(*) cnt
                from store.adimpression ai
                        inner join zuppa.adgroupcreativesubscription agcs
                                on agcs.id = ai.adgroupcreativesubscriptionid
                        inner join zuppa.adgroup ag
                                on ag.id = agcs.adgroupid
                where ai.datehour >= '2014-05-15 00:00:00'
                        and ag.siteid = 1240
                group by url
                ) iq
        ) ranked_mytable
where
      ranked_mytable.rnk <= 10
order by
        ranked_mytable.url,
        ranked_mytable.rnk desc
;
Unfortunately, I get this error message:
FAILED: SemanticException [Error 10002]: Line 26:23 Invalid column reference …
I have a Hive table A with the following columns:
USER   ITEM    SCORE
U1      I1       S1
U1      I2       S2
...................
What I want is a table B in this format:
USER    ITEMS    #ITEMS is an array
 U1     [I2,I3,...]   # items sorted by score in descending order, at most 5
For users with fewer than 5 items, just list all of their items in descending order of score.
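A sketch of the intended A-to-B transformation in plain Python (not HiveQL), assuming rows arrive as (user, item, score) tuples with numeric scores; the sample scores are hypothetical stand-ins for S1, S2, and so on. Ties keep their input order because Python's sort is stable. In HiveQL, ranking within each user is commonly done with a window function such as row_number() over (partition by ... order by score desc) followed by a filter on the row number.

```python
from collections import defaultdict


def top_items_per_user(rows, limit=5):
    """rows: iterable of (user, item, score) tuples, as in table A.
    Returns {user: [items sorted by score, highest first, at most `limit`]}."""
    by_user = defaultdict(list)
    for user, item, score in rows:
        by_user[user].append((score, item))
    result = {}
    for user, scored in by_user.items():
        # Stable sort on score only, so tied items keep their original order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Users with fewer than `limit` items simply keep all of them.
        result[user] = [item for _, item in scored[:limit]]
    return result


rows = [("U1", "I1", 0.1), ("U1", "I2", 0.9), ("U1", "I3", 0.5),
        ("U1", "I4", 0.7), ("U1", "I5", 0.3), ("U1", "I6", 0.8),
        ("U2", "I7", 0.2), ("U2", "I8", 0.6)]
print(top_items_per_user(rows))
# {'U1': ['I2', 'I6', 'I4', 'I3', 'I5'], 'U2': ['I8', 'I7']}
```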
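I wrote an MR script that should load data from HBase and dump it into Hive. Connecting to HBase works, but when I try to save the data to the Hive table I get the following error message: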
 Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.JavaMain], main() threw exception, org.apache.hive.hcatalog.common.HCatException : 2004 : HCatOutputFormat not initialized, setOutput has to be called
  org.apache.oozie.action.hadoop.JavaMainException: org.apache.hive.hcatalog.common.HCatException : 2004 : HCatOutputFormat not initialized, setOutput has to be called
  at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:58)
  at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:38)
  at org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:36)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
  Caused by: org.apache.hive.hcatalog.common.HCatException : 2004 : HCatOutputFormat not initialized, setOutput has …