Posts by Sij*_*eph

Failed with exception java.io.IOException: org.apache.avro.AvroTypeException: Found long, expecting union in Hive

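Need help!

I am using Flume to stream a Twitter feed into HDFS, and then loading it into Hive for analysis.

The steps are as follows:

I described the Avro schema in an .avsc file and put it into Hadoop: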

 {"type":"record",
 "name":"Doc",
 "doc":"adoc",
 "fields":[{"name":"id","type":"string"},
       {"name":"user_friends_count","type":["int","null"]},
       {"name":"user_location","type":["string","null"]},
       {"name":"user_description","type":["string","null"]},
       {"name":"user_statuses_count","type":["int","null"]},
       {"name":"user_followers_count","type":["int","null"]},
       {"name":"user_name","type":["string","null"]},
       {"name":"user_screen_name","type":["string","null"]},
       {"name":"created_at","type":["string","null"]},
       {"name":"text","type":["string","null"]},
       {"name":"retweet_count","type":["boolean","null"]},
       {"name":"retweeted","type":["boolean","null"]},
       {"name":"in_reply_to_user_id","type":["long","null"]},
       {"name":"source","type":["string","null"]},
       {"name":"in_reply_to_status_id","type":["long","null"]},
       {"name":"media_url_https","type":["string","null"]},
       {"name":"expanded_url","type":["string","null"]}]}
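
As a quick sanity check on a schema like the one above, here is a minimal Python sketch (standard library only) that lists the non-null type declared for each field. The SCHEMA string is an abbreviated copy of a few fields from the question, and the helper names non_null_types and summarize are made up for illustration; it only handles primitive types and unions of primitives.

```python
import json

# A few fields copied from the schema in the question (abbreviated for brevity).
SCHEMA = """
{"type":"record","name":"Doc","doc":"adoc",
 "fields":[{"name":"id","type":"string"},
           {"name":"user_friends_count","type":["int","null"]},
           {"name":"retweet_count","type":["boolean","null"]},
           {"name":"retweeted","type":["boolean","null"]},
           {"name":"in_reply_to_user_id","type":["long","null"]}]}
"""


def non_null_types(field_type):
    """Return the non-null type names of a primitive or a union of primitives."""
    branches = field_type if isinstance(field_type, list) else [field_type]
    return [b for b in branches if b != "null"]


def summarize(schema_text):
    """Map each field name to its list of non-null type names."""
    schema = json.loads(schema_text)
    return {f["name"]: non_null_types(f["type"]) for f in schema["fields"]}


if __name__ == "__main__":
    for name, types in summarize(SCHEMA).items():
        print(f"{name}: {', '.join(types)}")
```

With the schema as quoted, this lists retweet_count as boolean. That looks unusual for a counter, so it may be worth comparing against the schema embedded in the files Flume actually wrote.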


I wrote an .hql file to create the table and load the data into it:

 create table tweetsavro
    row format serde
        'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    stored as inputformat
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    outputformat
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    tblproperties ('avro.schema.url'='hdfs:///avro_schema/AvroSchemaFile.avsc');

    load data inpath '/test/twitter_data/FlumeData.*' overwrite into table tweetsavro;


The .hql file ran successfully, but when I run select * from <tablename> in Hive, it shows the following error:

[image: error]

The output of desc tweetsavro is:

hive> desc tweetsavro;
OK
id                      string                                      
user_friends_count      int                                         
user_location           string                                      
user_description        string                                      
user_statuses_count     int …
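
The exception in the title is the kind of message Avro raises when data files were written with one schema (the writer schema) and the table's schema (the reader schema) cannot resolve a value from it. Whether that is what happened here is not confirmed by the question. The sketch below, in plain Python, compares two field lists using the promotion rules from the Avro specification's schema resolution section. READER_FIELDS is taken from the .avsc above; WRITER_FIELDS is purely hypothetical, since the schema that Flume embedded in the data files is not shown. The function names are also made up, and only primitive types and unions of primitives are handled.

```python
# Promotions allowed by Avro schema resolution:
# a writer type (key) may be read as any of the listed reader types.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def branches(avro_type):
    """Treat a bare primitive as a one-branch union."""
    return avro_type if isinstance(avro_type, list) else [avro_type]


def readable_as(writer_type, reader_type):
    """True if a value written as writer_type can be read as reader_type."""
    return (writer_type == reader_type
            or reader_type in PROMOTIONS.get(writer_type, set()))


def find_mismatches(writer_fields, reader_fields):
    """Return (field, writer_branch, reader_type) for every writer branch
    that no branch of the reader's type can read."""
    reader = {f["name"]: f["type"] for f in reader_fields}
    problems = []
    for field in writer_fields:
        name = field["name"]
        if name not in reader:
            continue  # writer-only fields are ignored during resolution
        for w in branches(field["type"]):
            if not any(readable_as(w, r) for r in branches(reader[name])):
                problems.append((name, w, reader[name]))
    return problems


# Reader side: two fields as declared in the question's .avsc file.
READER_FIELDS = [
    {"name": "retweet_count", "type": ["boolean", "null"]},
    {"name": "in_reply_to_user_id", "type": ["long", "null"]},
]

# Writer side: HYPOTHETICAL, for illustration only. The real writer schema
# is whatever is embedded in the Flume output files.
WRITER_FIELDS = [
    {"name": "retweet_count", "type": ["long", "null"]},
    {"name": "in_reply_to_user_id", "type": ["long", "null"]},
]

if __name__ == "__main__":
    for name, written, expected in find_mismatches(WRITER_FIELDS, READER_FIELDS):
        print(f"{name}: written as {written}, reader expects {expected}")
```

Under these assumed inputs, the only field reported is retweet_count, where a long has no matching branch in the reader's union. To apply the same check to real data, the writer schema would need to be extracted from one of the FlumeData files first.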


java hadoop hive

Sij*_*eph

2016-02-19

5 votes, 1 answer, 10k views
