I am trying to run a spark-submit job against a MongoDB instance on a remote machine, using the Mongo-Spark connector.
When I start the mongod service without the --auth flag and run spark-submit like this:
./bin/spark-submit --master spark://10.0.3.155:7077 \
--conf "spark.mongodb.input.uri=mongodb://10.0.3.156/test.coll?readPreference=primaryPreferred" \
--conf "spark.mongodb.output.uri=mongodb://10.0.3.156/test.coll" \
--packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0 \
app1.py
everything works like a charm.
But when I run the mongod service with the --auth flag and run spark-submit like this:
./bin/spark-submit --master spark://10.0.3.155:7077 \
--conf "spark.mongodb.input.uri=mongodb://admin:pass@10.0.3.156/test.coll?readPreference=primaryPreferred" \
--conf "spark.mongodb.output.uri=mongodb://admin:pass@10.0.3.156/test.coll" \
--packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0 \
app1.py
I get this error:
py4j.protocol.Py4JJavaError: An error occurred while calling o47.save. : com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state …
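One thing worth checking with the timeout above: in a MongoDB connection string, the database named in the path is also the default authentication database. With mongodb://admin:pass@10.0.3.156/test.coll, the driver tries to authenticate admin against test, while admin users are typically created in the admin database. This is an assumption about where the user was created, not a confirmed diagnosis; the sketch below just illustrates where an explicit authSource parameter would sit in the URI:

```python
from urllib.parse import urlsplit, parse_qs

# Hedged sketch: assumes the "admin" user was created in the "admin" database.
# Without authSource, the driver would authenticate against "test" (the
# database in the path), which can surface as a server-selection timeout.
uri = ("mongodb://admin:pass@10.0.3.156/test.coll"
       "?authSource=admin&readPreference=primaryPreferred")

parts = urlsplit(uri)
params = parse_qs(parts.query)
print(parts.hostname)            # host the driver connects to
print(parts.path.lstrip("/"))    # database.collection from the path
print(params["authSource"][0])   # database used for authentication
```

The same ?authSource=admin suffix would go into both the spark.mongodb.input.uri and spark.mongodb.output.uri values passed to spark-submit.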
I am trying to read data from MongoDB through an Apache Spark master.
I am using 3 machines:
The application (M3) connects to the Spark master like this:
_sparkSession = SparkSession.builder.master(masterPath).appName(appName)\
.config("spark.mongodb.input.uri", "mongodb://10.0.3.150/db1.data.coll")\
.config("spark.mongodb.output.uri", "mongodb://10.0.3.150/db1.data.coll").getOrCreate()
Run Code Online (Sandbox Code Playgroud)
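Note that the builder above never tells Spark where to find the Mongo-Spark connector, which matters when the application is launched directly rather than through spark-submit --packages. A hedged sketch of how the same builder could pull the connector in itself, assuming the same 2.0.0 / Scala 2.11 coordinates used earlier in the post apply to this cluster:

```python
from pyspark.sql import SparkSession

# Sketch, not a verified fix: spark.jars.packages asks Spark to resolve the
# connector from Maven when the session starts, mirroring --packages on the
# command line. It only takes effect on a fresh session, not a running one.
# masterPath and appName are the same variables used in the post.
_sparkSession = SparkSession.builder.master(masterPath).appName(appName)\
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.11:2.0.0")\
    .config("spark.mongodb.input.uri", "mongodb://10.0.3.150/db1.data.coll")\
    .config("spark.mongodb.output.uri", "mongodb://10.0.3.150/db1.data.coll")\
    .getOrCreate()
```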
The application (M3) then tries to read data from the DB:
sqlContext = SQLContext(_sparkSession.sparkContext)
df = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource").option("uri","mongodb://user:pass@10.0.3.150/db1.data?readPreference=primaryPreferred").load()
But it fails with this exception:
py4j.protocol.Py4JJavaError: An error occurred while calling o56.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.mongodb.spark.sql.DefaultSource. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:594)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at …
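A ClassNotFoundException from lookupDataSource usually means the connector jar never reached the driver and executors. If the script is launched through spark-submit rather than run directly, the same coordinates used in the first command would supply it; this is a sketch, with your_app.py standing in for the actual script and the Scala 2.11 suffix assumed to match the cluster:

```shell
# Sketch: pass the connector at launch so com.mongodb.spark.sql.DefaultSource
# is on the classpath. Coordinates assume a Scala 2.11 build of Spark, as in
# the earlier commands in this post.
./bin/spark-submit --master spark://10.0.3.155:7077 \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0 \
  your_app.py
```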