Posts by min*_*aut

How do I enforce correct data types in Apache Pig?

I am unable to SUM a bag of values because of a data type error.

When I load a CSV file whose lines look like this:

6   574 false   10.1.72.23  2010-05-16 13:56:19 +0930   fbcdn.net   static.ak.fbcdn.net 304 text/css    1   /rsrc.php/zPTJC/hash/50l7x7eg.css   http    pwong

using the following:

logs_base = FOREACH raw_logs GENERATE
  FLATTEN(
     EXTRACT(line, '^(\\d+),"(\\d+)","(\\w+)","(\\S+)","(.+?)","(\\S+)","(\\S+)","(\\d+)","(\\S+)","(\\d+)","(\\S+)","(\\S+)","(\\S+)"')
  )
  as (
    account_id: int,
    bytes: long,
    cached: chararray,
    ip: chararray,
    time: chararray,
    domain: chararray,
    host: chararray,
    status: chararray,
    mime_type: chararray,
    page_view: chararray,
    path: chararray,
    protocol: chararray,
    username: chararray
  );

All fields appear to load correctly and with the right types, as the "describe" command shows:

grunt> describe logs_base
logs_base: {account_id: int,bytes: long,cached: chararray,ip: chararray,time: chararray,domain: chararray,host: chararray,status: chararray,mime_type: chararray,page_view: chararray,path: chararray,protocol: chararray,username: chararray}

Whenever I perform a SUM using:

bytesCount = FOREACH (GROUP …
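
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: depending on the Pig version, the schema in the "as" clause may only label the fields returned by a UDF such as EXTRACT without converting them, so "describe" can report long while the underlying values are still strings. A minimal sketch of the usual workaround is to declare the extracted fields as chararray and cast them explicitly in a second FOREACH. The relation names logs_str and logs_typed are hypothetical, the pattern is shortened to two groups for illustration only, and EXTRACT is assumed to be defined as in the script above.

```pig
-- Hypothetical sketch: logs_str and logs_typed are illustrative names.
-- Step 1: declare the extracted fields as chararray, which is what a regex match yields.
-- The two-group pattern below is a shortened stand-in, not the full pattern from the question.
logs_str = FOREACH raw_logs GENERATE
  FLATTEN(
     EXTRACT(line, '^(\\d+),"(\\d+)"')
  )
  as (
    account_id: chararray,
    bytes: chararray
  );

-- Step 2: convert with explicit cast operators in a separate FOREACH.
logs_typed = FOREACH logs_str GENERATE
  (int)account_id AS account_id,
  (long)bytes AS bytes;
```

Running "describe logs_typed" should then report int and long. Because the casts are explicit operators in the plan, an aggregate such as SUM over logs_typed.bytes would be expected to receive numeric values.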

apache-pig elastic-map-reduce
