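I need to consume messages from Kafka every hour throughout the day. Every hour I start a job that consumes the messages produced during the previous hour. For example, if the current time is 20:12, I consume the messages produced between 19:00:00 and 19:59:59. That means I need the start offset at 19:00:00 and the end offset at 19:59:59. I used SimpleConsumer.getOffsetsBefore, as shown in the "0.8.0 SimpleConsumer Example". The problem is that the returned offset does not match the timestamp passed as the parameter. For example, with a timestamp of 19:00:00, I receive messages produced at 16:38:00.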
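For reference, the hour window described above can be computed as epoch-millisecond timestamps, which is the unit the timestamp argument in the SimpleConsumer example appears to use. This is only a minimal sketch, assuming Java 8's java.time is available; the class and method names (PreviousHourWindow, previousHourMillis) are hypothetical and not part of Kafka, and the sketch does not call Kafka at all.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not a Kafka API: computes the previous full hour
// as a half-open window [start, endExclusive) in epoch milliseconds.
public class PreviousHourWindow {

    // Returns {startMillis, endExclusiveMillis} for the hour before `now`.
    static long[] previousHourMillis(ZonedDateTime now) {
        // Drop minutes, seconds and nanos: 20:12 becomes 20:00.
        ZonedDateTime endExclusive = now.truncatedTo(ChronoUnit.HOURS);
        // Step back one hour: 20:00 becomes 19:00.
        ZonedDateTime start = endExclusive.minusHours(1);
        return new long[] {
            start.toInstant().toEpochMilli(),
            endExclusive.toInstant().toEpochMilli()
        };
    }

    public static void main(String[] args) {
        // Assumption: the job runs with a UTC clock; substitute the real zone.
        ZonedDateTime now = ZonedDateTime.now(ZoneId.of("UTC"));
        long[] window = previousHourMillis(now);
        System.out.println("start        = " + Instant.ofEpochMilli(window[0]));
        System.out.println("endExclusive = " + Instant.ofEpochMilli(window[1]));
    }
}
```

The window is half-open, so the end value corresponds to 20:00:00 rather than 19:59:59; messages produced in the last second of the hour are then not cut off by rounding.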
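I want to deploy a service to two servers. It deploys successfully on one server but fails on the other, even though I did my best to make their environments identical. The error log is as follows: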
2013-01-21 22:08:18.178:WARN:oejuc.AbstractLifeCycle:FAILED jsp: java.lang.NoSuchFieldError: IS_SECURITY_ENABLED
java.lang.NoSuchFieldError: IS_SECURITY_ENABLED
at org.apache.jasper.compiler.JspRuntimeContext.<init>(JspRuntimeContext.java:197)
at org.apache.jasper.servlet.JspServlet.init(JspServlet.java:150)
at org.eclipse.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:492)
at org.eclipse.jetty.servlet.ServletHolder.doStart(ServletHolder.java:312)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:776)
at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258)
at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1213)
at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:699)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:454)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552)
at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:58)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:53)
at org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:91)
at org.eclipse.jetty.server.Server.doStart(Server.java:263)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1138)
at …

Hive version: 1.2.1
Configuration:
set hive.execution.engine=tez;
set hive.merge.mapredfiles=true;
set hive.merge.smallfiles.avgsize=256000000;
set hive.merge.tezfiles=true;
HQL:
ALTER TABLE `table_name` PARTITION (partion_name1 = 'val1', partion_name2='val2', partion_name3='val3', partion_name4='val4') CONCATENATE;
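I use this HQL to merge the files of a specific table partition. However, after it runs there are still many files in the output directory, and they are far smaller than 256000000 bytes. How can I reduce the number of output files?
By the way, using MapReduce instead of Tez did not help either.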