我有3个数据流:foo,bar和baz.
LEFT OUTER JOIN在以下链中加入这些流是必要的:foo -> bar -> baz.
这是尝试使用内置流模拟这些rate流:
val rateStream = session.readStream
.format("rate")
.option("rowsPerSecond", 5)
.option("numPartitions", 1)
.load()
val fooStream = rateStream
.select(col("value").as("fooId"), col("timestamp").as("fooTime"))
val barStream = rateStream
.where(rand() < 0.5) // Introduce misses for ease of debugging
.select(col("value").as("barId"), col("timestamp").as("barTime"))
val bazStream = rateStream
.where(rand() < 0.5) // Introduce misses for ease of debugging
.select(col("value").as("bazId"), col("timestamp").as("bazTime"))
Run Code Online (Sandbox Code Playgroud)
这是将这些流连接在一起的第一种方法,假设潜在的延迟foo,bar并且baz很小(〜5 seconds):
val foobarStream …Run Code Online (Sandbox Code Playgroud)