Ray*_*ond 2 logging hadoop apache-pig
I am trying to do some log processing with Apache Pig Latin, and I would like to know whether there is a simpler way to do this:
filtered_logs = FOREACH logs GENERATE numDay, reqSize, optimizedSize, origSize, compressionPct, cacheStatus;
grouped_logs = GROUP filtered_logs BY numDay;
results = FOREACH grouped_logs GENERATE group,
(SUM(filtered_logs.reqSize) + SUM(filtered_logs.optimizedSize)) / 1048576.00 AS ClientThroughputMB,
(SUM(filtered_logs.reqSize) + SUM(filtered_logs.origSize)) / 1048576.00 AS ServerThroughputMB,
SUM(filtered_logs.origSize) / 1048576.00 AS OrigMB,
SUM(filtered_logs.optimizedSize) / 1048576.00 AS OptMB,
SUM(filtered_logs.reqSize) / 1048576.00 AS SentMB,
AVG(filtered_logs.compressionPct) AS CompressionAvg,
COUNT(filtered_logs) AS NumLogs;
cache_hit_logs = FILTER filtered_logs BY cacheStatus MATCHES '.*HIT.*';
grouped_cache_hit_logs = GROUP cache_hit_logs BY numDay;
cache_hits = FOREACH grouped_cache_hit_logs GENERATE group,
COUNT(cache_hit_logs) AS cnt;
final_results = JOIN results BY group, cache_hits BY group;
DUMP final_results;
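(logs is defined earlier; it basically reads a pipe-delimited log file and assigns the fields.)
What I am trying to do here is count the number of records whose cacheStatus field contains "HIT", while also computing other data such as OrigMB, CompressionAvg, NumLogs, and so on. The current code works, but it seems to carry a large performance overhead. Is there a way in Pig Latin to do something along these lines (as in MSSQL)?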
SUM(CASE CacheStatus WHEN 'HIT' THEN 1 else 0 END) as CacheHit
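(Basically, I don't want to process the logs multiple times; I would rather do all of this in a single pass.)
Sorry if my question is confusing; I am quite new to Pig Latin.
Never mind, I found my own solution (silly me, I forgot that I could use a nested block in curly braces):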
results = FOREACH grouped_logs
{
cache_hits = FILTER filtered_logs BY cacheStatus MATCHES '.*HIT.*';
GENERATE group,
(SUM(filtered_logs.reqSize) + SUM(filtered_logs.optimizedSize)) / 1048576.00 AS ClientThroughputMB,
(SUM(filtered_logs.reqSize) + SUM(filtered_logs.origSize)) / 1048576.00 AS ServerThroughputMB,
SUM(filtered_logs.origSize) / 1048576.00 AS OrigMB,
SUM(filtered_logs.optimizedSize) / 1048576.00 AS OptMB,
SUM(filtered_logs.reqSize) / 1048576.00 AS SentMB,
AVG(filtered_logs.compressionPct) AS CompressionAvg,
COUNT(filtered_logs) AS NumLogs,
COUNT(cache_hits) AS CacheHit;
}
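For comparison, a closer analogue of the MSSQL SUM(CASE ...) idiom is Pig's bincond operator (condition ? a : b). The sketch below is not the solution above but an alternative under stated assumptions: logs is loaded as in the question, and is_hit and flagged_logs are hypothetical names introduced here for illustration. Only a few of the aggregates are repeated to keep it short.

```pig
-- Assumes logs is already loaded with the fields used in the question.
-- is_hit (hypothetical helper field): 1 when cacheStatus contains "HIT", otherwise 0.
flagged_logs = FOREACH logs GENERATE numDay, reqSize, optimizedSize, origSize, compressionPct,
    ((cacheStatus MATCHES '.*HIT.*') ? 1 : 0) AS is_hit;

grouped_logs = GROUP flagged_logs BY numDay;

-- Summing the 0/1 flag counts the hits in the same pass as the other aggregates.
results = FOREACH grouped_logs GENERATE group AS numDay,
    SUM(flagged_logs.origSize) / 1048576.00 AS OrigMB,
    AVG(flagged_logs.compressionPct) AS CompressionAvg,
    COUNT(flagged_logs) AS NumLogs,
    SUM(flagged_logs.is_hit) AS CacheHit;

DUMP results;
```

Because the flag is computed before the GROUP, the final FOREACH contains only aggregate functions. That may allow Pig to use the combiner, which a nested FILTER generally prevents, though whether it helps in practice depends on the data and the Pig version. Note that a null cacheStatus produces a null flag, which SUM skips.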