And*_*lho 16 hadoop apache-pig
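I am running Apache Pig 0.11.1 on Hadoop 2.0.5.
Most of the simple jobs I run in Pig work perfectly well.
However, whenever I use GROUP BY on a large data set, or use the LIMIT operator, I get the following connection errors: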
2013-07-29 13:24:08,591 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2013-07-29 11:57:29,421 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 11:57:30,421 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 11:57:31,422 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
...
2013-07-29 13:24:18,597 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 13:24:18,598 [main] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:gpadmin (auth:SIMPLE) cause:java.io.IOException
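Strangely, about 2 minutes after these errors start appearing, they stop, and the correct output is shown at the bottom.
So Hadoop is running fine and computes the correct output. The problem is that these connection errors keep popping up.
The LIMIT operator always triggers this error, in both MapReduce mode and local mode. The GROUP BY operator works fine on small data sets.
One thing I have noticed is that whenever this error appears, the job creates and runs multiple JAR files while it executes. Still, a few minutes after the messages pop up, the correct output eventually appears.
Any suggestions on how to get rid of these messages?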
And*_*lho 31
Yes, the problem was that the job history server was not running.
All we had to do to fix it was enter the following command at the command prompt:
mr-jobhistory-daemon.sh start historyserver
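This command starts the job history server. Now, if we run 'jps', we can see that JobHistoryServer is running, and my Pig jobs no longer waste time trying to connect to the server.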
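The check-then-start routine described above could be scripted roughly as follows. This is only a sketch: the function names `history_server_running` and `ensure_history_server` are made up for illustration, and it assumes that `jps` and `mr-jobhistory-daemon.sh` are both on the PATH (the latter normally lives under the Hadoop `sbin` directory).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the jps output passed as $1
# contains a JobHistoryServer process.
history_server_running() {
  printf '%s\n' "$1" | grep -q 'JobHistoryServer'
}

# Hypothetical helper: starts the job history server only if jps
# does not already list it. Assumes both commands are on the PATH.
ensure_history_server() {
  if history_server_running "$(jps)"; then
    echo "JobHistoryServer is already running"
  else
    mr-jobhistory-daemon.sh start historyserver
  fi
}

# Usage (uncomment on a machine with Hadoop installed):
# ensure_history_server
```

Calling `ensure_history_server` before launching Pig should avoid the retry loop against port 10020, since the client then finds a history server to be redirected to once the job completes.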