Connection errors in Apache Pig

And*_*lho 16 hadoop apache-pig

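I am running Apache Pig 0.11.1 on Hadoop 2.0.5.

Most of the simple jobs I run in Pig work perfectly well.

However, whenever I use GROUP BY on a large dataset, or use the LIMIT operator, I get the following connection errors: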

2013-07-29 13:24:08,591 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server 
2013-07-29 11:57:29,421 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

2013-07-29 11:57:30,421 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

2013-07-29 11:57:31,422 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
...
2013-07-29 13:24:18,597 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 13:24:18,598 [main] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:gpadmin (auth:SIMPLE) cause:java.io.IOException

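Oddly, after these errors have been appearing for about 2 minutes, they stop, and the correct output is shown at the bottom.

So Hadoop is running fine and computes the correct output. The only problem is that these connection errors keep popping up.

The LIMIT operator always produces this error. It happens in both MapReduce mode and local mode. The GROUP BY operator works fine on small datasets.

One thing I have noticed is that whenever this error appears, the job has created and run multiple JAR files along the way. Still, a few minutes after these messages pop up, the correct output eventually appears.

Any suggestions on how to get rid of these messages?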

And*_*lho 31

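Yes, the problem was that the job history server was not running.

All we had to do to fix this was enter the following command at the command prompt: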

mr-jobhistory-daemon.sh start historyserver

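This command starts the job history server. Now, if we enter 'jps', we can see that JobHistoryServer is running, and my Pig jobs no longer waste time trying to connect to the server.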
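
If you want to script that check rather than read the 'jps' output by eye, something along these lines might work. It is only a rough sketch: the function name history_server_running is made up for illustration, and it assumes 'jps' prints one "<pid> <class name>" pair per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): reads jps-style output on stdin
# and succeeds (exit status 0) only if a JobHistoryServer process is listed.
# Assumes each input line has the form "<pid> <class name>".
history_server_running() {
    awk '$2 == "JobHistoryServer" { found = 1 } END { exit !found }'
}
```

It could then be combined with the command from the answer, for example: jps | history_server_running || mr-jobhistory-daemon.sh start historyserver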