小编JC4*_*417的帖子

Spark结构化流左外部联接为已匹配的行返回外部null

我基本上是使用Spark的文档中给出的示例以及内置的测试流,其中一个流领先3秒(最初使用kafka,但遇到了同样的问题)。结果正确返回匹配列,但是过一会儿返回相同的键,但外部为null。

这是预期的行为吗?有匹配项时,是否有办法排除重复的外部null结果?

码:

val testStream = session.readStream.format("rate")
  .option("rowsPerSecond", "5").option("numPartitions", "1").load()

val impressions = testStream
  .select(
    (col("value") + 15).as("impressionAdId"),
    col("timestamp").as("impressionTime"))

val clicks = testStream
  .select(
    col("value").as("clickAdId"),
    col("timestamp").as("clickTime"))


// Apply watermarks on event-time columns
val impressionsWithWatermark =
  impressions.withWatermark("impressionTime", "20 seconds")
val clicksWithWatermark =
  clicks.withWatermark("clickTime", "30 seconds")

// Join with event-time constraints
val result = impressionsWithWatermark.join(
  clicksWithWatermark,
  expr("""
clickAdId = impressionAdId AND
clickTime >= impressionTime AND
clickTime <= impressionTime + interval 10 seconds
"""),
  joinType = "leftOuter"      // can be "inner", …
Run Code Online (Sandbox Code Playgroud)

scala apache-spark spark-structured-streaming

5
推荐指数
1
解决办法
297
查看次数