How do I perform a union in Dataflow?

Sam*_*ety 5 google-cloud-dataflow

I'm trying to perform a union operation in Dataflow. Is there example code for combining two PCollections into one?

Sam*_*ety 8

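A simple approach is to combine Flatten() and RemoveDuplicates(). Flatten merges the input PCollections into one and keeps every element, so duplicates survive (a disjoint or bag union, like SQL's UNION ALL). RemoveDuplicates then keeps a single copy of each element, which gives the set-theoretic union. If you don't need duplicates removed, you can omit the RemoveDuplicates call: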

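import com.google.cloud.dataflow.sdk.transforms.Flatten;
import com.google.cloud.dataflow.sdk.transforms.RemoveDuplicates;
import com.google.cloud.dataflow.sdk.values.PCollection;
import com.google.cloud.dataflow.sdk.values.PCollectionList;

PCollection<String> pc1 = ...;
PCollection<String> pc2 = ...;
PCollection<String> union = PCollectionList.of(pc1).and(pc2)
  .apply(Flatten.<String>pCollections())      // merge; duplicates are kept
  .apply(RemoveDuplicates.<String>create());  // omit to keep duplicates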
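To see the difference between the two variants without running a pipeline, here is a minimal in-memory sketch. It is only an analogy, not Dataflow code: a plain `List` stands in for a `PCollection`, and the hypothetical helpers `flatten` and `removeDuplicates` model what the `Flatten` and `RemoveDuplicates` transforms do to the elements. Real PCollections are unordered, so only which elements appear and how many times matters, not the printed order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// In-memory analogy only: a List stands in for a PCollection.
public class UnionSketch {

    // Models Flatten: concatenates the inputs and keeps every element,
    // including duplicates (a bag union, like SQL UNION ALL).
    public static <T> List<T> flatten(List<T> a, List<T> b) {
        List<T> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Models RemoveDuplicates: keeps one copy of each distinct element,
    // turning the bag union into a set-theoretic union.
    public static <T> Set<T> removeDuplicates(List<T> xs) {
        return new LinkedHashSet<>(xs);
    }

    public static void main(String[] args) {
        List<String> pc1 = List.of("a", "b", "c");
        List<String> pc2 = List.of("b", "c", "d");

        List<String> bagUnion = flatten(pc1, pc2);
        Set<String> setUnion = removeDuplicates(bagUnion);

        System.out.println(bagUnion); // [a, b, c, b, c, d]
        System.out.println(setUnion); // [a, b, c, d]
    }
}
```

The bag union has six elements because "b" and "c" appear in both inputs; removing duplicates leaves the four distinct values.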