Is there a way to connect Apache Toree to a remote Spark cluster? I see that the common command is
jupyter toree install --spark_home=/usr/local/bin/apache-spark/
How can I use Spark on a remote server without having to install it locally?
There is indeed a way to make Toree connect to a remote Spark cluster.
The simplest approach I have found is to clone the existing Toree Scala/Python kernel and create a new Toree Scala/Python Remote kernel. That way you can choose to run either locally or remotely.
Steps:
Copy the existing kernel. In my particular Toree installation, the kernels live under /usr/local/share/jupyter/kernels/, so I ran the following command:
cp -pr /usr/local/share/jupyter/kernels/apache_toree_scala/ /usr/local/share/jupyter/kernels/apache_toree_scala_remote/
Edit the new kernel.json file in /usr/local/share/jupyter/kernels/apache_toree_scala_remote/ and add the required Spark options to the __TOREE_SPARK_OPTS__ variable. Technically, only --master <path> is required, but you can also add --num-executors, --executor-memory, etc. to the variable.
Restart Jupyter.
My kernel.json file looks like this:
{
  "display_name": "Toree - Scala Remote",
  "argv": [
    "/usr/local/share/jupyter/kernels/apache_toree_scala_remote/bin/run.sh",
    "--profile",
    "{connection_file}"
  ],
  "language": "scala",
  "env": {
    "PYTHONPATH": "/opt/spark/python:/opt/spark/python/lib/py4j-0.9-src.zip",
    "SPARK_HOME": "/opt/spark",
    "DEFAULT_INTERPRETER": "Scala",
    "PYTHON_EXEC": "python",
    "__TOREE_OPTS__": "",
    "__TOREE_SPARK_OPTS__": "--master spark://192.168.0.255:7077 --deploy-mode client --num-executors 4 --executor-memory 4g --executor-cores 8 --packages com.databricks:spark-csv_2.10:1.4.0"
  }
}
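Once the new "Toree - Scala Remote" kernel shows up in Jupyter, a quick sanity check from a notebook cell confirms you are actually attached to the remote master. This is a minimal sketch assuming the example master address from the kernel.json above; Toree pre-binds the SparkContext as sc:

// Run in a notebook cell with the "Toree - Scala Remote" kernel selected.
// The expected URL is the example master from kernel.json above.
println(sc.master)  // expect: spark://192.168.0.255:7077

// A trivial job that actually exercises the remote executors.
val total = sc.parallelize(1 to 1000).reduce(_ + _)
println(total)      // expect: 500500

If sc.master still reports local[*], the kernel picked up a stale __TOREE_SPARK_OPTS__ value; restart Jupyter after editing kernel.json.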