I have looked at using Apache Toree as a PySpark kernel for Jupyter:
https://github.com/apache/incubator-toree
However, it uses an older version of Spark (1.5.1 versus the current 1.6.0). I tried to use the approach from http://arnesund.com/2015/09/21/spark-cluster-on-openstack-with-multi-user-jupyter-notebook/ by creating the following kernel.json:
{
"display_name": "PySpark",
"language": "python",
"argv": [
"/usr/bin/python",
"-m",
"ipykernel",
"-f",
"{connection_file}"
],
"env": {
"SPARK_HOME": "/usr/local/Cellar/apache-spark/1.6.0/libexec",
"PYTHONPATH": "/usr/local/Cellar/apache-spark/1.6.0/libexec/python/:/usr/local/Cellar/apache-spark/1.6.0/libexec/python/lib/py4j-0.9-src.zip",
"PYTHONSTARTUP": "/usr/local/Cellar/apache-spark/1.6.0/libexec/python/pyspark/shell.py",
"PYSPARK_SUBMIT_ARGS": "--master local[*] pyspark-shell"
}
}
However, I am running into a couple of problems:
There is no /jupyter/kernels path on my Mac, so I ended up creating the directory ~/.jupyter/kernels/pyspark. I'm not sure that is the correct location.
Even with all of the paths in place, I still don't see PySpark show up as a kernel inside Jupyter.
What am I missing?
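One quick check is to ask Jupyter which kernelspecs it has actually registered. This is a minimal sketch, assuming the jupyter_client package that ships with Jupyter is installed:

from jupyter_client.kernelspec import KernelSpecManager

# maps each registered kernel name to the directory containing its kernel.json;
# "pyspark" should appear here if the spec was picked up
print(KernelSpecManager().find_kernel_specs())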
Start a Jupyter notebook with the regular Python kernel, then run the following to initialize PySpark inside Jupyter:
import findspark
findspark.init()  # locate the Spark installation and add pyspark to sys.path

import pyspark
sc = pyspark.SparkContext()  # start a local SparkContext
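To confirm the context actually works, a quick smoke test (not part of the original answer, just a suggested check) is to run a trivial job:

# sum the integers 0..99 on the freshly created SparkContext; should print 4950
print(sc.parallelize(range(100)).sum())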
FYI: I had already tried most of the configurations for launching Apache Toree with a PySpark kernel in Jupyter, without success.
Jupyter kernels should live in $JUPYTER_DATA_DIR. On OS X this is ~/Library/Jupyter. See: http://jupyter.readthedocs.org/en/latest/system.html
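Rather than guessing the directory, you can ask Jupyter where it looks. A minimal sketch, assuming the jupyter_core package installed alongside Jupyter:

from jupyter_core.paths import jupyter_data_dir, jupyter_path

# the per-user data directory, e.g. ~/Library/Jupyter on OS X
print(jupyter_data_dir())

# every directory Jupyter searches for kernelspecs, in priority order
print(jupyter_path('kernels'))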