Apache Beam does not support Kotlin Iterable?

mar*_*seu 10 kotlin google-cloud-dataflow apache-beam

Apache Beam seems to refuse to recognize Kotlin's Iterable. Here is some sample code:

@ProcessElement
fun processElement(
    @Element input: KV<String, Iterable<String>>, receiver: OutputReceiver<String>
) {
    val output = input.key + "|" + input.value.toString()
    println("output: $output")
    receiver.output(output)
}

I get the following strange error:

java.lang.IllegalArgumentException:
   ...PrintString, @ProcessElement processElement(KV, OutputReceiver), @ProcessElement processElement(KV, OutputReceiver):
   @Element argument must have type org.apache.beam.sdk.values.KV<java.lang.String, java.lang.Iterable<? extends java.lang.String>>

Sure enough, if I replace Iterable with java.lang.Iterable, the same code works fine. What am I doing wrong?

Dependency versions:

  • kotlin-jvm: 1.3.21
  • org.apache.beam: 2.11.0

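Here is a gist with the complete code and stack trace:

Update

After some trial and error, I found that although List<String> throws a similar exception, MutableList<String> actually works: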

class PrintString: DoFn<KV<String, MutableList<String>>, String>() {
    @ProcessElement
    fun processElement(
        @Element input: KV<String, MutableList<String>>, receiver: OutputReceiver<String>
    ) {
        val output = input.key + "|" + input.value.toString()
        println("output: $output")
        receiver.output(output)
    }
}

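This reminded me that Kotlin's immutable collections are really just interfaces, and the underlying collections are still mutable. However, replacing Iterable with MutableIterable still raises the error.

Update 2

I deployed my Kotlin Dataflow job using MutableList as described above, but the job failed: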

java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.ClassCastException:
org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowViaIteratorsFn$WindowReiterable cannot be cast to java.util.List
    at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:184)
    at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:102)

I had to switch back to using java.lang.Iterable.
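The ClassCastException above suggests that the grouped values reach the DoFn as some Iterable implementation that is not a java.util.List. The following is a minimal standard-library sketch of that situation, not Beam code: LazyValues, joinValues and isList are hypothetical names made up for illustration, and LazyValues only stands in for whatever object the runner actually supplies.

```kotlin
// Hypothetical stand-in for a runner-provided, lazily iterated group of values.
// It implements Iterable but is not a List.
class LazyValues(private val source: Sequence<String>) : Iterable<String> {
    override fun iterator(): Iterator<String> = source.iterator()
}

// Declaring the parameter as Iterable works for any implementation.
fun joinValues(values: Iterable<String>): String = values.joinToString(",")

// Reports whether the runtime object could be treated as a List at all.
fun isList(values: Any): Boolean = values is List<*>

fun main() {
    val grouped = LazyValues(sequenceOf("a", "b", "c"))
    println(joinValues(grouped)) // iterating works
    println(isList(grouped))     // a cast to List would fail for this object
}
```

Under that assumption, declaring the element as a MutableList can pass the signature check yet still fail at runtime, which would be consistent with what the Dataflow worker reported.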

Ale*_*oks 6

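I also ran into this problem when using a ParDo following a GroupByKey. It turns out that a @JvmWildcard annotation is needed in the Iterable generic type when writing a transform that accepts the result of a GroupByKey.

See the contrived example below, which reads a file and groups its lines by the first character of each line.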

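import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.coders.StringUtf8Coder
import org.apache.beam.sdk.io.FileIO
import org.apache.beam.sdk.io.TextIO
import org.apache.beam.sdk.options.PipelineOptions
import org.apache.beam.sdk.transforms.Contextful
import org.apache.beam.sdk.transforms.DoFn
import org.apache.beam.sdk.transforms.GroupByKey
import org.apache.beam.sdk.transforms.ParDo
import org.apache.beam.sdk.transforms.ProcessFunction
import org.apache.beam.sdk.transforms.WithKeys
import org.apache.beam.sdk.values.KV
import org.apache.beam.sdk.values.TypeDescriptors

class BeamPipe {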
  class ConcatLines : DoFn<KV<String, Iterable<@JvmWildcard String>>, KV<String, String>>() {
    @ProcessElement
    fun processElement(@Element input: KV<String, Iterable<@JvmWildcard String>>, receiver: OutputReceiver<KV<String, String>>) {
      receiver.output(KV.of(input.key, input.value.joinToString("\n")))
    }
  }

  fun pipe(options: PipelineOptions) {
    val file =
        "testFile.txt"
    val p = Pipeline.create(options)
    p.apply(TextIO.read().from(file))
        .apply("Key lines by first character",
            WithKeys.of { line: String -> line[0].toString() }
                .withKeyType(TypeDescriptors.strings()))
        .apply("Group lines by first character", GroupByKey.create<String, String>())
        .apply("Concatenate lines", ParDo.of(ConcatLines()))
        .apply("Write to files", FileIO.writeDynamic<String, KV<String, String>>()
            .by { it.key }
            .withDestinationCoder(StringUtf8Coder.of())
            .via(Contextful.fn(ProcessFunction { it.value }), TextIO.sink())
            .to("whatever")
            .withNaming { key -> FileIO.Write.defaultNaming(key, ".txt") }
        )
    p.run()
  }
}
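To see what @JvmWildcard changes, the generated JVM signatures can be inspected with Java reflection. This is a small standard-library sketch rather than Beam code; the Signatures class and the firstParameterType helper are hypothetical names introduced only for this illustration.

```kotlin
import java.lang.reflect.Type

// Hypothetical holder class, used only to inspect the generated JVM signatures.
class Signatures {
    // String is final, so Kotlin normally emits no wildcard for it.
    fun plain(values: Iterable<String>) {}

    // @JvmWildcard asks the compiler to emit a wildcard for this type argument.
    fun wildcard(values: Iterable<@JvmWildcard String>) {}
}

// Returns the generic JVM type of the first parameter of the named method.
fun firstParameterType(methodName: String): Type =
    Signatures::class.java.declaredMethods
        .first { it.name == methodName }
        .genericParameterTypes[0]

fun main() {
    println("plain:    ${firstParameterType("plain").typeName}")
    println("wildcard: ${firstParameterType("wildcard").typeName}")
}
```

Only the annotated parameter should show up as a `? extends java.lang.String` type, which is the shape the Beam error message in the question asks for.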