RecordTooLargeException in a Kafka Streams join

Nik*_*Nik 5 java apache-kafka apache-kafka-streams

I have a KStream x KStream join that is crashing with the following exception.

Exception in thread "my-clicks-and-recs-join-streams-4c903fb1-5938-4919-9c56-2c8043b86986-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_15, processor=KSTREAM-SOURCE-0000000001, topic=my_outgoing_recs_prod, partition=15, offset=9248896
    at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:203)
    at org.apache.kafka.streams.processor.internals.StreamThread.processAndPunctuate(StreamThread.java:679)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:557)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)
Caused by: org.apache.kafka.streams.errors.StreamsException: task [0_15] exception caught when producing
    at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:136)
    at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:87)
    at org.apache.kafka.streams.state.internals.StoreChangeLogger.logChange(StoreChangeLogger.java:59)
    at org.apache.kafka.streams.state.internals.ChangeLoggingSegmentedBytesStore.put(ChangeLoggingSegmentedBytesStore.java:59)
    at org.apache.kafka.streams.state.internals.MeteredSegmentedBytesStore.put(MeteredSegmentedBytesStore.java:105)
    at org.apache.kafka.streams.state.internals.RocksDBWindowStore.put(RocksDBWindowStore.java:107)
    at org.apache.kafka.streams.state.internals.RocksDBWindowStore.put(RocksDBWindowStore.java:100)
    at org.apache.kafka.streams.kstream.internals.KStreamJoinWindow$KStreamJoinWindowProcessor.process(KStreamJoinWindow.java:64)
    at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:47)
    at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:187)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:82)
    at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:80)
    at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:189)
    ... 3 more
Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.

I am joining a Click topic with a Recommendation topic. The Click objects are very small (well under a KB). A Recommendation, on the other hand, can be large, occasionally larger than 1 MB.

I googled the exception and found (here) that I need to set max.request.size in the producer config.

What I don't understand is: where does a producer come into a streams join at all? The topic in the exception above, topic=my_outgoing_recs_prod, is the recommendations topic, not the final joined topic. Isn't the streams application supposed to just "consume" from it?

Nevertheless, I tried setting the property config.put("max.request.size", "31457280");, which is 30 MB. I don't expect recommendation records to exceed that limit. Still, the code crashes.

I cannot change the configuration of the Kafka cluster, but, if necessary, I can change the properties of the relevant topics in Kafka.

Can someone suggest what else I could try?

If nothing works, I am willing to ignore these oversized messages. However, I don't know how to handle this RecordTooLargeException.

My code that performs the join is as follows.

Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, JOINER_ID + "-" + System.getenv("HOSTNAME"));
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, booststrapList);
config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.ByteArray().getClass().getName());
config.put("max.request.size", "314572800");
config.put("message.max.bytes", "314572800");
config.put("max.message.bytes", "314572800");


KStreamBuilder builder = new KStreamBuilder();

KStream<String, byte[]> clicksStream = builder.stream(TopologyBuilder.AutoOffsetReset.LATEST, Serdes.String(), Serdes.ByteArray(), clicksTopic);
KStream<String, byte[]> recsStream = builder.stream(TopologyBuilder.AutoOffsetReset.LATEST, Serdes.String(), Serdes.ByteArray(), recsTopic);

KStream<String, ClickRec> join = clicksStream.join(
        recsStream,
        (click, recs) -> new ClickRec(click, recs),
        JoinWindows.of(windowMillis).until(3*windowMillis));

join.to(Serdes.String(), JoinSerdes.CLICK_SERDE, jointTopic);

KafkaStreams streams = new KafkaStreams(builder, config);
streams.cleanUp();
streams.start();

ClickRec is the joined object (it is far smaller than a Recommendation object, and I don't expect it to be larger than a few KB).

Where do I put a try...catch in the code above to recover from these occasionally oversized objects?

Mat*_*Sax 10

There are multiple configurations at different levels:

  1. There is a broker setting message.max.bytes (default: 1000012) (see http://kafka.apache.org/documentation/#brokerconfigs)
  2. There is a topic-level config max.message.bytes (default: 1000012) (see http://kafka.apache.org/documentation/#topicconfigs)
  3. The producer has max.request.size (default: 1048576) (see http://kafka.apache.org/documentation/#producerconfigs)

The stack trace indicates that you need to change the setting at the broker or topic level:

Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.

You may also need to increase the producer setting as well.
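Kafka Streams forwards client configs to the clients it creates internally. One way to scope the producer setting so it is applied only to the internal producers (a sketch, assuming a Streams version that provides `StreamsConfig.producerPrefix`, i.e. 0.10.2 or later; the 30 MB value is just an example and should match your broker/topic limit):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsProducerConfig {
    public static Properties withLargerRequests(Properties config) {
        // "producer." prefix ensures the setting reaches only the internal
        // producers, not the consumers or other clients Streams creates.
        config.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_REQUEST_SIZE_CONFIG),
                   31457280); // 30 MB -- must not exceed the broker/topic limit
        return config;
    }
}
```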

Why you need this in the first place:

When you do a KStream-KStream join, the join operator builds up state (it must buffer records from both streams in order to compute the join). By default, this state is backed by a Kafka topic: the local state is basically a cache, while the Kafka topic is the source of truth. Thus, all records are also written to this "changelog topic" that Kafka Streams creates automatically.

  • The broker/topic configuration needs to be adjusted on the cluster ... to prevent the exception. I would recommend introducing a `flatMap()` before you write to this topic. In this flatMap, you use the serializers to convert the key and value to type `byte[]` yourself, which allows you to check the size: if the size exceeds the limit you drop the record, otherwise you return the serialized record (i.e., the return type of the `flatMap` would be `<byte[], byte[]>`). (4 upvotes)
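A minimal sketch of the size check such a flatMap would perform. The 1000012-byte limit is the broker default and an assumption; in an actual Kafka Streams `flatMap` you would return `KeyValue` pairs rather than plain map entries, but the filtering logic is the same:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SizeGuard {
    // Assumed limit: the broker default for message.max.bytes.
    static final int MAX_BYTES = 1000012;

    // Drops the record if the serialized key plus value exceed the limit,
    // otherwise passes it through unchanged. Returning a list lets the
    // caller emit zero or one records, which is exactly flatMap semantics.
    static List<Map.Entry<byte[], byte[]>> guard(byte[] key, byte[] value) {
        int size = (key == null ? 0 : key.length)
                 + (value == null ? 0 : value.length);
        if (size > MAX_BYTES) {
            return Collections.emptyList(); // drop instead of crashing the producer
        }
        return Collections.singletonList(new SimpleEntry<>(key, value));
    }
}
```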