Dan*_*nov 6
Tags: scala, mongodb, apache-spark, apache-spark-sql
I've run into a strange problem. I'm trying to connect Spark running locally to MongoDB using the MongoDB Spark connector.
Besides setting up Spark, I use the following code:
val readConfig = ReadConfig(Map("uri" -> "mongodb://localhost:27017/movie_db.movie_ratings", "readPreference.name" -> "secondaryPreferred"), Some(ReadConfig(sc)))
val writeConfig = WriteConfig(Map("uri" -> "mongodb://127.0.0.1/movie_db.movie_ratings"))
// Load the movie rating data from Mongo DB
val movieRatings = MongoSpark.load(sc, readConfig).toDF()
movieRatings.show(100)
However, I get an error at runtime:
java.lang.IllegalArgumentException: Missing database name. Set via the 'spark.mongodb.input.uri' or 'spark.mongodb.input.database' property.
on the line where I set up readConfig. I don't understand why it complains that the uri isn't set when I clearly have the uri property in the Map. I must be missing something.
You can do it via the SparkSession, as mentioned here:
val spark = SparkSession.builder()
.master("local")
.appName("MongoSparkConnectorIntro")
.config("spark.mongodb.input.uri", "mongodb://localhost:27017/movie_db.movie_ratings")
.config("spark.mongodb.input.readPreference.name", "secondaryPreferred")
.config("spark.mongodb.output.uri", "mongodb://127.0.0.1/movie_db.movie_ratings")
.getOrCreate()
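Alternatively, the same properties can be supplied at launch time rather than in code. A sketch assuming spark-submit; the connector version and `my-app.jar` are placeholders, so adjust the package coordinates to match your Spark and Scala versions:

```shell
# Pass the MongoDB connector properties as Spark conf entries at submit time.
spark-submit \
  --master local \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1 \
  --conf spark.mongodb.input.uri=mongodb://localhost:27017/movie_db.movie_ratings \
  --conf spark.mongodb.input.readPreference.name=secondaryPreferred \
  --conf spark.mongodb.output.uri=mongodb://127.0.0.1/movie_db.movie_ratings \
  my-app.jar
```

With the properties set this way, `SparkSession.builder().getOrCreate()` picks them up without any `.config(...)` calls in the application code.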
Create a DataFrame using that configuration:
val readConfig = ReadConfig(Map("uri" -> "mongodb://localhost:27017/movie_db.movie_ratings", "readPreference.name" -> "secondaryPreferred"))
val df = MongoSpark.load(spark, readConfig)
Write df to MongoDB:
MongoSpark.save(
  df.write
    .option("spark.mongodb.output.uri", "mongodb://127.0.0.1/movie_db.movie_ratings")
    .mode("overwrite"))
In your code, the prefix is missing from the configuration keys:
val readConfig = ReadConfig(Map(
  "spark.mongodb.input.uri" -> "mongodb://localhost:27017/movie_db.movie_ratings",
  "spark.mongodb.input.readPreference.name" -> "secondaryPreferred"),
  Some(ReadConfig(sc)))
val writeConfig = WriteConfig(Map(
  "spark.mongodb.output.uri" -> "mongodb://127.0.0.1/movie_db.movie_ratings"))