Coder problem with Apache Beam and CombineFn

Chr*_*kos 8 java google-cloud-platform google-cloud-dataflow apache-beam

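We are building a pipeline with Apache Beam, using DirectRunner as the runner. We are currently trying out a simple pipeline in which we:

  1. Pull data from Google Cloud Pub/Sub (currently running locally against the emulator)
  2. Deserialize it into Java objects
  3. Window the events using fixed windows of 1 minute
  4. Combine those windows using a custom CombineFn that turns the events into a list of events.

Pipeline code: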

pipeline
    .apply(PubsubIO.<String>read().topic(options.getTopic()).withCoder(StringUtf8Coder.of()))
    .apply("ParseEvent", ParDo.of(new ParseEventFn()))
    .apply("WindowOneMinute", Window.<Event>into(FixedWindows.of(Duration.standardMinutes(1))))
    .apply("CombineEvents", Combine.globally(new CombineEventsFn()));

The ParseEvent function:

    static class ParseEventFn extends DoFn<String, Event> {
        @ProcessElement
        public void processElement(ProcessContext c) {
            String json = c.element();
            c.output(gson.fromJson(json, Event.class));
        }
    }

The CombineEvents function:

public static class CombineEventsFn extends CombineFn<Event, CombineEventsFn.Accum, EventListWrapper> {
        public static class Accum {
            EventListWrapper eventListWrapper = new EventListWrapper();
        }

        @Override
        public Accum createAccumulator() {
            return new Accum();
        }

        @Override
        public Accum addInput(Accum accumulator, Event event) {
            accumulator.eventListWrapper.events.add(event);
            return accumulator;
        }

        @Override
        public Accum mergeAccumulators(Iterable<Accum> accumulators) {
            Accum merged = createAccumulator();
            for (Accum accum : accumulators) {
                merged.eventListWrapper.events.addAll(accum.eventListWrapper.events);
            }
            return merged;
        }

        @Override
        public EventListWrapper extractOutput(Accum accumulator) {
            return accumulator.eventListWrapper;
        }

    }

When we try to run this locally with Maven and DirectRunner, we get the following error:

java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Unable to return a default Coder for CombineEvents/Combine.perKey(CombineEvents)/Combine.GroupedValues/ParDo(Anonymous).out [PCollection]. Correct one of the following root causes:
  No Coder has been manually specified;  you may do so using .setCoder().
  Inferring a Coder from the CoderRegistry failed: Unable to provide a default Coder for org.apache.beam.sdk.values.KV<K, OutputT>. Correct one of the following root causes:
  Building a Coder using a registered CoderFactory failed: Cannot provide coder for parameterized type org.apache.beam.sdk.values.KV<K, OutputT>: Unable to provide a default Coder for java.lang.Object. Correct one of the following root causes:
  Building a Coder using a registered CoderFactory failed: Cannot provide coder based on value with class java.lang.Object: No CoderFactory has been registered for the class.
  Building a Coder from the @DefaultCoder annotation failed: Class java.lang.Object does not have a @DefaultCoder annotation.
  Building a Coder from the fallback CoderProvider failed: Cannot provide coder for type java.lang.Object: org.apache.beam.sdk.coders.protobuf.ProtoCoder$2@6e610150 could not provide a Coder for type java.lang.Object: Cannot provide ProtoCoder because java.lang.Object is not a subclass of com.google.protobuf.Message; org.apache.beam.sdk.coders.SerializableCoder$1@7adc59c8 could not provide a Coder for type java.lang.Object: Cannot provide SerializableCoder because java.lang.Object does not implement Serializable.
  Building a Coder from the @DefaultCoder annotation failed: Class org.apache.beam.sdk.values.KV does not have a @DefaultCoder annotation.
  Using the default output Coder from the producing PTransform failed: Unable to provide a default Coder for org.apache.beam.sdk.values.KV<K, OutputT>. Correct one of the following root causes:
  Building a Coder using a registered CoderFactory failed: Cannot provide coder for parameterized type org.apache.beam.sdk.values.KV<K, OutputT>: Unable to provide a default Coder for java.lang.Object. Correct one of the following root causes:
  Building a Coder using a registered CoderFactory failed: Cannot provide coder based on value with class java.lang.Object: No CoderFactory has been registered for the class.
  Building a Coder from the @DefaultCoder annotation failed: Class java.lang.Object does not have a @DefaultCoder annotation.
  Building a Coder from the fallback CoderProvider failed: Cannot provide coder for type java.lang.Object: org.apache.beam.sdk.coders.protobuf.ProtoCoder$2@6e610150 could not provide a Coder for type java.lang.Object: Cannot provide ProtoCoder because java.lang.Object is not a subclass of com.google.protobuf.Message; org.apache.beam.sdk.coders.SerializableCoder$1@7adc59c8 could not provide a Coder for type java.lang.Object: Cannot provide SerializableCoder because java.lang.Object does not implement Serializable.
  Building a Coder from the @DefaultCoder annotation failed: Class org.apache.beam.sdk.values.KV does not have a @DefaultCoder annotation.
    at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
    at org.apache.beam.sdk.values.TypedPValue.getCoder(TypedPValue.java:51)
    at org.apache.beam.sdk.values.PCollection.getCoder(PCollection.java:130)
    at org.apache.beam.sdk.values.TypedPValue.finishSpecifying(TypedPValue.java:90)
    at org.apache.beam.sdk.runners.TransformHierarchy.finishSpecifyingInput(TransformHierarchy.java:143)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:418)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:334)
    at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:154)
    at org.apache.beam.sdk.transforms.Combine$Globally.expand(Combine.java:1459)
    at org.apache.beam.sdk.transforms.Combine$Globally.expand(Combine.java:1336)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:420)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:350)
    at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:167)
at ***************************.main(***************.java:231)
... 6 more

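Apologies for the huge code dump; I wanted to provide all the context.

I'm curious why it complains about having no default coder for both java.lang.Object and org.apache.beam.sdk.values.KV<K, OutputT>. As far as I can tell, our pipeline only moves between the types String, Event, and EventListWrapper, and the latter two classes have a default coder set on the class itself (AvroCoder in both cases).

The error occurs on the line where we apply the CombineFn; I can confirm that the pipeline works without this transform.

I suspect we have set up the combine transform incorrectly somehow, but so far I haven't found anything in the Beam documentation that points us in the right direction.

Any insight would be much appreciated. Thanks in advance!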

Ken*_*les 7

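The likely reason you are seeing java.lang.Object is that Beam is trying to infer a coder for an unresolved type variable, which gets resolved to Object. This may be a bug in how coder inference is done inside Combine.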
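To illustrate the point about unresolved type variables, here is a small sketch in plain Java reflection. It is not Beam's inference code, only an analogy: an unbound type parameter reports Object as its upper bound, which is roughly why Object can surface when K and OutputT in KV<K, OutputT> are never pinned to concrete types. PairStandIn and TypeVariableDemo are hypothetical names made up for this example.

```java
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

// Hypothetical stand-in for a generic pair type such as KV<K, V>; not a Beam class.
class PairStandIn<K, V> {
    K key;
    V value;
}

class TypeVariableDemo {
    // Returns the first declared upper bound of the type parameter at the given index.
    // For a parameter declared without "extends ...", that bound is java.lang.Object.
    static Type upperBound(Class<?> genericClass, int index) {
        TypeVariable<?> variable = genericClass.getTypeParameters()[index];
        return variable.getBounds()[0];
    }

    public static void main(String[] args) {
        // Both K and V are unbound, so each falls back to Object.
        System.out.println(upperBound(PairStandIn.class, 0));
        System.out.println(upperBound(PairStandIn.class, 1));
    }
}
```

If anything asks for "the type of K" without knowing the concrete type argument, Object is all it has to go on, and there is no sensible default coder for Object.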

Separately, I would expect the Accum class to also cause coder inference to fail. You can override getAccumulatorCoder in your CombineFn to provide one directly.


小智 5

Have you checked whether adding Serializable to your accumulator works directly?

So, add "implements Serializable" to the Accum class...

public static class Accum implements Serializable {
            EventListWrapper eventListWrapper = new EventListWrapper();
        }
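One caveat worth checking, sketched below in plain Java with no Beam dependency: Java serialization walks every non-transient field, so implementing Serializable on Accum only helps if EventListWrapper and Event are serializable too. All class names here (DemoEvent, DemoEventList, DemoAccum, PlainWrapper, BrokenAccum, SerializationCheck) are hypothetical stand-ins, since the post does not show the real Event and EventListWrapper classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Event, EventListWrapper and Accum; all three are Serializable.
class DemoEvent implements Serializable {
    final String id;
    DemoEvent(String id) { this.id = id; }
}

class DemoEventList implements Serializable {
    final List<DemoEvent> events = new ArrayList<>();
}

class DemoAccum implements Serializable {
    DemoEventList eventListWrapper = new DemoEventList();
}

// Same shape, but the wrapped type does NOT implement Serializable.
class PlainWrapper {
    final List<String> events = new ArrayList<>();
}

class BrokenAccum implements Serializable {
    PlainWrapper eventListWrapper = new PlainWrapper();
}

class SerializationCheck {
    // Writes the object graph to a byte array using standard Java serialization.
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    // Serializes and deserializes the value, returning the reconstructed copy.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(toBytes(value)))) {
            return (T) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    // True if the whole object graph can be written; false on NotSerializableException etc.
    static boolean isSerializable(Serializable value) {
        try {
            toBytes(value);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        DemoAccum accum = new DemoAccum();
        accum.eventListWrapper.events.add(new DemoEvent("e1"));
        System.out.println(roundTrip(accum).eventListWrapper.events.size());
        System.out.println(isSerializable(new BrokenAccum()));
    }
}
```

A quick check like this in a unit test can catch a non-serializable field before the pipeline runs, rather than at the point where the runner first tries to encode an accumulator.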