Tags: python, jar, ipython-notebook, apache-spark, pyspark
I am trying to integrate the mongodb-hadoop connector with Spark, but I cannot figure out how to make the jars accessible to an IPython notebook.
Here is what I am trying to do:
# set up parameters for reading from MongoDB via Hadoop input format
config = {"mongo.input.uri": "mongodb://localhost:27017/db.collection"}
inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"
# these values worked but others might as well
keyClassName = "org.apache.hadoop.io.Text"
valueClassName = "org.apache.hadoop.io.MapWritable"
# Do some reading from mongo
items = sc.newAPIHadoopRDD(inputFormatClassName, keyClassName, valueClassName, None, None, config)
This code works fine when I launch it in pyspark with the following command:
spark-1.4.1/bin/pyspark --jars 'mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar'
where mongo-hadoop-core-1.4.0.jar and mongo-java-driver-2.10.1.jar make MongoDB usable from Java. However, when I run:
IPYTHON_OPTS="notebook" spark-1.4.1/bin/pyspark --jars 'mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar'
the jars are no longer available, and I get the following error:
java.lang.ClassNotFoundException: com.mongodb.hadoop.MongoInputFormat
Does anyone know how to make jars available to Spark in an IPython notebook? I'm pretty sure this isn't specific to Mongo, so perhaps someone has already succeeded in adding jars to the classpath while using a notebook?
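One workaround sometimes used for this situation, sketched here as an assumption rather than a confirmed fix, is to set the `PYSPARK_SUBMIT_ARGS` environment variable before pyspark is initialized inside the notebook process, so that the driver JVM is launched with the `--jars` option. The jar names below are the ones from the question; for Spark versions of this era the string must end with `pyspark-shell`:

```python
import os

# Hedged sketch: set PYSPARK_SUBMIT_ARGS *before* importing/initializing
# pyspark in the notebook, so the driver JVM is started with the jars.
# Jar file names are taken from the question; paths may need adjusting.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar "
    "pyspark-shell"
)

print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

Another commonly cited alternative is to list the jars under `spark.jars` in `conf/spark-defaults.conf`, which applies to every launcher including the notebook.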
Views: 3953