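I always thought the Dataset and DataFrame APIs were the same, the only difference being that the Dataset API gives you compile-time safety. Right?
So... my case is really simple: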
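import org.apache.spark.sql.Dataset
// `session` is the SparkSession; its implicits provide the Player encoder and the 'symbol column syntax
import session.implicits._

case class Player(playerID: String, birthYear: Int)

val playersDs: Dataset[Player] = session.read
  .option("header", "true")
  .option("delimiter", ",")
  .option("inferSchema", "true")
  .csv(PeopleCsv)
  .as[Player]
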
// Let's try to find players born in 1999.
// This will work, you have compile time safety... but it will not use predicate pushdown!!!
playersDs.filter(_.birthYear == 1999).explain()
// This will work as expected and use predicate pushdown!!!
// But you can't have compile time safety with this :(
playersDs.filter('birthYear === 1999).explain()
The explain output from the first example shows that it does not do predicate pushdown (note the empty PushedFilters):
== Physical Plan == …
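
For what it's worth, here is a toy sketch of why the two filters could end up with different plans. It is not Spark code: `Expr`, `EqualTo`, `LambdaFilter`, `ExprFilter` and `pushedFilters` are all hypothetical names made up for illustration. The assumption behind it is that a Scala lambda reaches the planner as a compiled function it cannot look inside, while a column expression is a tree it can inspect and hand to the data source.

```scala
// Toy model only: these types are hypothetical and are NOT Spark classes.
final case class Player(playerID: String, birthYear: Int)

// An expression tree that a planner can inspect, loosely analogous to a column expression.
sealed trait Expr
final case class EqualTo(column: String, value: Int) extends Expr

// A filter is either an opaque function or an inspectable expression.
sealed trait Filter
final case class LambdaFilter(f: Player => Boolean) extends Filter
final case class ExprFilter(expr: Expr) extends Filter

object PushdownSketch {
  // Returns the source-level filters this toy planner can derive from a filter.
  def pushedFilters(filter: Filter): List[String] = filter match {
    // A compiled closure exposes no structure, so nothing can be pushed down.
    case LambdaFilter(_) => Nil
    // An expression tree names the column and the value, so it can be translated.
    case ExprFilter(EqualTo(column, value)) => List(s"EqualTo($column,$value)")
  }

  def main(args: Array[String]): Unit = {
    val typed = LambdaFilter(_.birthYear == 1999)
    val untyped = ExprFilter(EqualTo("birthYear", 1999))
    println(s"lambda filter -> PushedFilters: ${pushedFilters(typed)}")
    println(s"column filter -> PushedFilters: ${pushedFilters(untyped)}")
  }
}
```

In this sketch the lambda version yields an empty list for the same reason the first explain() shows empty PushedFilters: there is simply nothing the planner can read out of the function.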