Spark MongoDB connector fails on df.join - Unspecialized MongoConfig

Nig*_*olf 3 mongodb apache-spark pyspark

Using the latest MongoDB Spark connector (v10), attempting to join two DataFrames produces the following unhelpful error:

Py4JJavaError: An error occurred while calling o64.showString.
: java.lang.UnsupportedOperationException: Unspecialised MongoConfig. Use `mongoConfig.toReadConfig()` or `mongoConfig.toWriteConfig()` to specialize
    at com.mongodb.spark.sql.connector.config.MongoConfig.getDatabaseName(MongoConfig.java:201)
    at com.mongodb.spark.sql.connector.config.MongoConfig.getNamespace(MongoConfig.java:196)
    at com.mongodb.spark.sql.connector.MongoTable.name(MongoTable.java:99)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation.name(DataSourceV2Relation.scala:66)
    at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.$anonfun$applyOrElse$2(V2ScanRelationPushDown.scala:65)

The PySpark code simply loads the two collections and runs a join:

dfa = spark.read.format("mongodb").option("uri", "mongodb://127.0.0.1/people.contacts").load()
dfb = spark.read.format("mongodb").option("uri", "mongodb://127.0.0.1/people.accounts").load()
dfa.join(dfb, 'PKey').count()

SQL gives the same error:

dfa.createOrReplaceTempView("usr")
dfb.createOrReplaceTempView("ast")
spark.sql("SELECT count(*) FROM ast JOIN usr on usr._id = ast._id").show()

The document structure is flat.

小智 5

Have you tried the latest version (10.0.2) of mongo-spark-connector? It can be found here.

I ran into a similar issue, and replacing 10.0.1 with 10.0.2 solved it.
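One way to pull the patched release, assuming your cluster can resolve packages from Maven Central, is to set `spark.jars.packages` when building the session (note that later 10.x releases switched to Scala-suffixed artifact names such as `mongo-spark-connector_2.12`, so match the coordinate to the version you pick):

```python
from pyspark.sql import SparkSession

# Pin the fixed connector release (10.0.2), resolved from Maven Central.
spark = (
    SparkSession.builder
    .appName("mongo-join")
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector:10.0.2")
    .getOrCreate()
)
```

The same coordinate works with `spark-submit --packages` if you prefer to pin the version at launch time rather than in code.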