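I am using Spark 2.0.0 on the "preview" Google DataProc Image 1.1. To complete one of my operations, I have to compute a Cartesian product. As of version 2.0.0, there is a Spark configuration parameter (spark.sql.crossJoin.enabled) that by default prohibits Cartesian products and throws an exception when one is attempted. How can I set spark.sql.crossJoin.enabled = true, preferably by using an initialization action?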
spark.sql.crossJoin.enabled=true
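If changing the cluster configuration is not convenient, the same property can also be set from application code. The following is a minimal sketch, not a Dataproc-specific recipe: the object name `CrossJoinConfigExample`, the helper names `buildSession` and `cartesian`, the `local[*]` master, and the sample data are all assumptions made for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.lit

// Hypothetical example object; names and sample data are illustrative only.
object CrossJoinConfigExample {

  // Build a session with Cartesian products explicitly allowed.
  // "local[*]" is assumed here so the sketch runs without a cluster.
  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("cross-join-config-example")
      .master("local[*]")
      .config("spark.sql.crossJoin.enabled", "true")
      .getOrCreate()

  // Join with a trivially true condition, which yields every pairing
  // of the two small sample Datasets.
  def cartesian(spark: SparkSession): Dataset[(Int, String)] = {
    import spark.implicits._
    val left = Seq(1, 2).toDS()
    val right = Seq("a", "b").toDS()
    left.joinWith(right, lit(true))
  }

  def main(args: Array[String]): Unit = {
    val spark = buildSession()
    cartesian(spark).show()
    spark.stop()
  }
}
```

On an existing session, `spark.conf.set("spark.sql.crossJoin.enabled", "true")` should have the same effect for subsequent queries. On Dataproc, cluster properties can typically be supplied at creation time with the `spark:` prefix (for example `--properties spark:spark.sql.crossJoin.enabled=true`), which may remove the need for an initialization action; check this against the image version in use.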
I have the following function, which compiles.
def compare(dbo: Dataset[Cols], ods: Dataset[Cols]) = {
  val j = dbo.crossJoin(ods)
  // Tried val j = dbo.joinWith(ods, func.expr("true")) too
  j.take(5).foreach(r => println(r))
}
However, it fails with a runtime error when submitted to Spark.
Join condition is missing or trivial. (if using joinWith instead of crossJoin) Use the CROSS JOIN syntax to allow cartesian products between these relations.;
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$21.applyOrElse(Optimizer.scala:1067)
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$21.applyOrElse(Optimizer.scala:1064)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:268)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:268)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:273)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:273)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:307)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:305)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:273)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:273)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:273)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:307)
at …