相关疑难解决方法(0)

Spark 2.x的spark.sql.crossJoin.enabled

我正在使用Spark 2.0.0的"预览"Google DataProc Image 1.1.为了完成我的一项操作,我必须完成一个笛卡尔积.从版本2.0.0开始,创建了一个spark配置参数(spark.sql.cross Join.enabled),禁止使用笛卡尔积,并抛出异常.如何设置spark.sql.crossJoin.enabled = true,最好是使用初始化操作? spark.sql.crossJoin.enabled=true

apache-spark google-cloud-dataproc

9
推荐指数
2
解决办法
8449
查看次数

交叉连接运行时错误:使用CROSS JOIN语法允许这些关系之间的笛卡尔积

我有以下功能可以编译.

  def compare(dbo: Dataset[Cols], ods: Dataset[Cols]) = {
    val j = dbo.crossJoin(ods)
    // Tried val j = dbo.joinWith(ods, func.expr("true")) too
    j.take(5).foreach(r => println(r)) 
  }
Run Code Online (Sandbox Code Playgroud)

但是在提交给Spark时遇到了运行时错误.

Join condition is missing or trivial. (if using joinWith stead of crossJoin)
Use the CROSS JOIN syntax to allow cartesian products between these relations.;
        at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$21.applyOrElse(Optimizer.scala:1067)
        at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$21.applyOrElse(Optimizer.scala:1064)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:268)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:268)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:273)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:273)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:307)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:305)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:273)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:273)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:273)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:307)
        at …

scala apache-spark apache-spark-sql

4
推荐指数
1
解决办法
4778
查看次数