Loading a CSV file with a string array into a Hive table

Dee*_*tty 5 csv hadoop hive

I am trying to load a CSV file into Hive, where one of the fields is an array of strings.

Here is the CSV file:

48,Snacks that Power Up Weight Loss,Aidan B. Prince,[Health&Fitness,Travel]
99,Snacks that Power Up Weight Loss,Aidan B. Prince,[Photo,Travel]

I tried creating the table like this:

CREATE TABLE IF NOT EXISTS Article
(
ARTICLE_ID INT,
ARTICLE_NAME STRING,
ARTICLE_AUTHOR STRING,
ARTICLE_GENRE ARRAY<STRING>
);
LOAD DATA INPATH '/tmp/pinterest/article.csv' OVERWRITE INTO TABLE Article;
select * from Article;  

This is the output I get:

article.article_id  article.article_name    article.article_author  article.article_genre
48  Snacks that Power Up Weight Loss    Aidan B. Prince ["[Health&Fitness"]
99  Snacks that Power Up Weight Loss    Aidan B. Prince ["[Photo"]

Only one value ends up in the last field, article_genre.

Can anyone point out what is wrong here?

Cha*_*ant 10

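A few things:
You are missing the definition of a collection items delimiter.
Also, I assume you want your select * from article statement to return the following: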

48  Snacks that Power Up Weight Loss    Aidan B. Prince ["Health&Fitness","Travel"]
99  Snacks that Power Up Weight Loss    Aidan B. Prince ["Photo","Travel"]

I can give you an example; the rest you can work out by experimenting with it. Here is my table definition:

create table article (
  id int,
  name string,
  author string,
  genre array<string>
)
row format delimited
fields terminated by ','
collection items terminated by '|';

Here is the data:

48,Snacks that Power Up Weight Loss,Aidan B. Prince,Health&Fitness|Travel
99,Snacks that Power Up Weight Loss,Aidan B. Prince,Photo|Travel
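If the file already exists in the bracketed form from the question, one option is to rewrite it into the pipe-delimited form before loading. The following is only a minimal Python sketch of that idea: the function names `convert_line` and `convert_file` are hypothetical, and it assumes that the array is the last field, that it is wrapped in `[` and `]`, and that no earlier field contains a `[`.

```python
def convert_line(line: str, item_sep: str = "|") -> str:
    """Rewrite 'a,b,c,[x,y]' as 'a,b,c,x|y' (hypothetical helper).

    Assumes the bracketed array is the last field and that no
    earlier field contains a '[' character.
    """
    line = line.rstrip("\n")
    # Split at the first '[': head keeps the trailing comma of the third field.
    head, bracket, tail = line.partition("[")
    if not bracket or not tail.endswith("]"):
        return line  # no bracketed array found; leave the line unchanged
    # Drop the closing ']' and join the items with the collection delimiter.
    items = [item.strip() for item in tail[:-1].split(",")]
    return head + item_sep.join(items)


def convert_file(src_path: str, dst_path: str) -> None:
    """Convert every line of src_path and write the result to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(convert_line(line) + "\n")
```

For example, `convert_line("99,Snacks that Power Up Weight Loss,Aidan B. Prince,[Photo,Travel]")` produces the second data row shown above. The `|` separator is chosen only to match the `collection items terminated by '|'` clause in the table definition; any character that does not occur in the data would do.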

Now run a similar load:
LOAD DATA local INPATH '/path' OVERWRITE INTO TABLE article; then run the select statement to check the result.

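The most important points:
Define a delimiter for the collection items, and do not impose the array syntax you would use in ordinary programming.
Also, make the field delimiter different from the collection items delimiter, to avoid confusion and unexpected results.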