In Apache Flink I have a stream of tuples. Let's assume a really simple Tuple1<String>. The tuple can hold an arbitrary value in its value field (e.g. "P1", "P2", etc.). The set of possible values is finite, but I don't know the full set beforehand (so there could be a "P362"). I want to write each tuple to an output location that depends on the value inside the tuple, so I would like to end up with the following file structure:

/output/P1
/output/P2

In the documentation I only found ways to write to locations that I know beforehand (e.g. stream.writeCsv("/output/somewhere")), but no way of letting the contents of the data decide where the data actually ends up.
I read about output splitting in the documentation, but that does not seem to offer a way to redirect the output to the destinations I have in mind (or I simply don't understand how it is supposed to work).
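To illustrate: with output splitting I could tag records and select them by name, but I would still have to enumerate every name and path up front. The sketch below assumes the split/select API of the DataStream API and uses purely illustrative names:

SplitStream<Tuple1<String>> split = stream.split(value -> Collections.singletonList(value.f0));

// this only covers names that are known when the job is written:
split.select("P1").writeAsCsv("/output/P1");
split.select("P2").writeAsCsv("/output/P2");
// ...an unforeseen "P362" has no matching select/write call.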
Can this be done with the Flink API, and if so, how? If not, is there perhaps a third-party library that can do it, or do I have to build something like this myself?

Update

Following Matthias' suggestion, I came up with a sifting sink function which determines the output path and then writes the tuple to the corresponding file after serializing it. I'm putting it here for reference; maybe it is useful to others:
public class SiftingSinkFunction<IT> extends RichSinkFunction<IT> {

    private final OutputSelector<IT> outputSelector;
    private final MapFunction<IT, String> serializationFunction;
    private final String basePath;

    Map<String, TextOutputFormat<String>> formats = new HashMap<>();

    /**
     * @param outputSelector        the selector which determines into which output(s) a record is written.
     * @param serializationFunction a function which serializes the record to a string.
     * @param basePath              the base path for writing the records. It will be appended with the output selector.
     */
    public SiftingSinkFunction(OutputSelector<IT> outputSelector, MapFunction<IT, String> serializationFunction, String basePath) {
        this.outputSelector = outputSelector;
        this.serializationFunction = serializationFunction;
        this.basePath = basePath;
    }

    @Override
    public void invoke(IT value) throws Exception {
        // find out where to write.
        Iterable<String> selection = outputSelector.select(value);
        for (String s : selection) {
            // ensure we have a format for this destination.
            TextOutputFormat<String> destination = ensureDestinationExists(s);
            // then serialize and write.
            destination.writeRecord(serializationFunction.map(value));
        }
    }

    private TextOutputFormat<String> ensureDestinationExists(String selection) throws IOException {
        // if we know the destination, we just return the format.
        if (formats.containsKey(selection)) {
            return formats.get(selection);
        }

        // create a new output format and initialize it from the context.
        TextOutputFormat<String> format = new TextOutputFormat<>(new Path(basePath, selection));
        StreamingRuntimeContext context = (StreamingRuntimeContext) getRuntimeContext();
        format.configure(context.getTaskStubParameters());
        format.open(context.getIndexOfThisSubtask(), context.getNumberOfParallelSubtasks());

        // put it into our map.
        formats.put(selection, format);
        return format;
    }

    @Override
    public void close() throws IOException {
        Exception lastException = null;
        try {
            for (TextOutputFormat<String> format : formats.values()) {
                try {
                    format.close();
                } catch (Exception e) {
                    lastException = e;
                    format.tryCleanupOnError();
                }
            }
        } finally {
            formats.clear();
        }
        if (lastException != null) {
            throw new IOException("Close failed.", lastException);
        }
    }
}
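For reference, here is a minimal sketch of how this sink could be wired into a job; the example stream, the lambdas used as selector and serialization function, and the /output base path are illustrative assumptions rather than part of the code above (imports omitted):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// a toy stream; the value inside each tuple decides the output directory
DataStream<Tuple1<String>> stream = env.fromElements(
        Tuple1.of("P1"), Tuple1.of("P2"), Tuple1.of("P362"));

stream.addSink(new SiftingSinkFunction<Tuple1<String>>(
        // selector: route each record into the sub-directory named after its value, e.g. /output/P1
        value -> Collections.singletonList(value.f0),
        // serialization function: write the raw value as one line of text
        value -> value.f0,
        "/output"));

env.execute("sifting sink example");

Each distinct value lazily gets its own TextOutputFormat inside the sink, so values such as "P362" are handled without being known in advance.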
You can implement a custom sink. Inherit from one of the two:

org.apache.flink.streaming.api.functions.sink.SinkFunction
org.apache.flink.streaming.api.functions.sink.RichSinkFunction

In your program, use:
stream.addSink(SinkFunction<T> sinkFunction);
instead of stream.writeCsv("/output/somewhere").
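As a hedged illustration of that approach (the class name and behavior below are made up, not part of the answer), a bare-bones custom sink only has to implement invoke(), which is called once per record:

public class RoutingPrintSink extends RichSinkFunction<Tuple1<String>> {

    private final String basePath;

    public RoutingPrintSink(String basePath) {
        this.basePath = basePath;
    }

    @Override
    public void invoke(Tuple1<String> value) throws Exception {
        // invoke() runs once per record; a real sink would open a file under
        // basePath + "/" + value.f0 and write the record there instead of printing.
        System.out.println(basePath + "/" + value.f0 + " <- " + value);
    }
}

Attached with stream.addSink(new RoutingPrintSink("/output")), it receives every record and is free to decide per record where the data should go, which is exactly the hook the SiftingSinkFunction above builds on.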