Ant*_*ony 1 sql apache-spark apache-spark-sql
I have two tables: excluded and kaggleresults. I am trying to find the records that exist in excluded but do not exist in kaggleresults.
Counts:
scala> spark.sql("select * from excluded").count()
res136: Long = 4652
scala> spark.sql("select * from kaggleresults").count()
res137: Long = 4635
The difference is 17:
scala> res136-res137
res139: Long = 17
I am trying to get these 17 records. I wrote the query below, but it returns 38.
scala> spark.sql("select * from excluded left join kaggleresults on kaggleresults.subject_id = excluded.subject_id where kaggleresults.subject_id is null").count()
res135: Long = 38
Question
What query do I need to write to get these 17 records?
Isn't this exactly what a LEFT_ANTI join does?
scala> val excluded = (0 to 5).toDS
excluded: org.apache.spark.sql.Dataset[Int] = [value: int]
scala> val kaggleresults = (3 to 10).toDS
kaggleresults: org.apache.spark.sql.Dataset[Int] = [value: int]
scala> excluded.join(kaggleresults, Seq("value"), "leftanti").show
+-----+
|value|
+-----+
|    0|
|    1|
|    2|
+-----+
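The same join can also be written in SQL against the tables from the question. This is a minimal sketch, assuming small made-up tables registered as temp views under the names excluded and kaggleresults, each with a subject_id column; the sample values are hypothetical and only stand in for the real data.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell a session already exists; getOrCreate() simply reuses it.
val spark = SparkSession.builder().master("local[*]").appName("anti-join-sql").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the real tables.
Seq(1, 2, 3, 4, 5).toDF("subject_id").createOrReplaceTempView("excluded")
Seq(3, 4, 5, 6).toDF("subject_id").createOrReplaceTempView("kaggleresults")

// Keep only the rows of excluded whose subject_id has no match in kaggleresults.
// A LEFT ANTI JOIN returns columns from the left table only.
val missing = spark.sql("""
  SELECT e.*
  FROM excluded e
  LEFT ANTI JOIN kaggleresults k
    ON e.subject_id = k.subject_id
""")

missing.show()
```

With this sample data the result should contain the rows with subject_id 1 and 2.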
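One thing worth checking before trusting the number 17: an anti join returns the rows of excluded that have no match in kaggleresults, and that count equals the difference of the two table sizes only if every kaggleresults.subject_id also occurs in excluded and subject_id is unique in both tables. The question does not say whether that holds, so 38 may well be the correct number of unmatched rows (the LEFT JOIN ... IS NULL query in the question is logically an anti join too). The sketch below uses made-up data to show how the two numbers can diverge.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell a session already exists; getOrCreate() simply reuses it.
val spark = SparkSession.builder().master("local[*]").appName("count-vs-anti-join").getOrCreate()
import spark.implicits._

// Hypothetical data: ids 1 and 2 exist only in excluded, id 6 only in kaggleresults.
val excluded = Seq(1, 2, 3, 4, 5).toDF("subject_id")
val kaggleresults = Seq(3, 4, 5, 6).toDF("subject_id")

// Difference of the row counts: 5 - 4 = 1
val countDiff = excluded.count() - kaggleresults.count()

// Rows of excluded with no match in kaggleresults (ids 1 and 2)
val onlyInExcluded = excluded.join(kaggleresults, Seq("subject_id"), "leftanti").count()

// Rows of kaggleresults with no match in excluded (id 6)
val onlyInKaggle = kaggleresults.join(excluded, Seq("subject_id"), "leftanti").count()

println(s"countDiff=$countDiff onlyInExcluded=$onlyInExcluded onlyInKaggle=$onlyInKaggle")
```

Running the anti join in the reverse direction on the real tables, and counting distinct subject_id values on each side, would show whether unmatched or duplicated rows in kaggleresults account for the gap between 17 and 38.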