Tags: ipython, ipython-notebook, apache-spark, pyspark, osx-elcapitan
I have followed several tutorials online, but none of them work with Spark 1.5.1 on OS X El Capitan (10.11).

Basically, I ran these commands to download and install apache-spark:
brew update
brew install scala
brew install apache-spark
Then I updated my .bash_profile:
# For IPython notebook and PySpark integration
if which pyspark > /dev/null; then
    export SPARK_HOME="/usr/local/Cellar/apache-spark/1.5.1/libexec/"
    export PYSPARK_SUBMIT_ARGS="--master local[2]"
fi
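Before going further, it is worth checking from a plain ipython session that these variables actually made it into the environment; a minimal check, assuming you opened a fresh terminal or sourced .bash_profile:

import os

# Both should print non-empty values if .bash_profile was picked up.
print(os.environ.get("SPARK_HOME"))
print(os.environ.get("PYSPARK_SUBMIT_ARGS"))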
Then I ran:
ipython profile create pyspark
and created a startup file ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py configured like this:
# Configure the necessary Spark environment
import os
import sys

# Spark home
spark_home = os.environ.get("SPARK_HOME")

# If Spark V1.4.x is detected, then add ' pyspark-shell' to
# the end of the 'PYSPARK_SUBMIT_ARGS' environment variable
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 1.4" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if "pyspark-shell" not in pyspark_submit_args:
        pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

# Add the spark python sub-directory to the path
sys.path.insert(0, spark_home + "/python")

# Add py4j to the path.
# You may need to change the version number to match your install
sys.path.insert(0, os.path.join(spark_home, "python/lib/py4j-0.8.2.1-src.zip"))

# Initialize PySpark to predefine the SparkContext variable 'sc'
execfile(os.path.join(spark_home, "python/pyspark/shell.py"))
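As an aside: if you would rather not hard-code the py4j version that the comment above warns about, a small sketch like this (assuming SPARK_HOME points at a standard Spark layout) discovers it with a glob:

import glob
import os
import sys

spark_home = os.environ.get("SPARK_HOME")
# Match whichever py4j version ships under python/lib,
# e.g. py4j-0.8.2.1-src.zip for Spark 1.5.1.
py4j_zips = glob.glob(os.path.join(spark_home, "python/lib/py4j-*-src.zip"))
if py4j_zips:
    sys.path.insert(0, py4j_zips[0])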
I then run ipython notebook --profile=pyspark. The notebook starts fine, but sc (the SparkContext) is not recognized.

Has anyone managed to get this working with Spark 1.5.1?

EDIT: you can follow this guide to make it work.
Answer (Alb*_*nto):
I installed it with Jupyter, and it is actually simpler than you'd think:

1. Install jupyter by typing the next line in your terminal:
ilovejobs@mymac:~$ conda install jupyter
2. Update jupyter just in case:
ilovejobs@mymac:~$ conda update jupyter
3. Download Apache Spark and compile it, or download and uncompress Apache Spark 1.5.1 + Hadoop 2.6:
ilovejobs@mymac:~$ cd Downloads
ilovejobs@mymac:~/Downloads$ wget http://www.apache.org/dyn/closer.lua/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz
ilovejobs@mymac:~/Downloads$ tar -xzf spark-1.5.1-bin-hadoop2.6.tgz
4. Create an Apps folder in your home directory, i.e.:
ilovejobs@mymac:~/Downloads$ mkdir ~/Apps
5. Move the uncompressed folder to the ~/Apps directory, renaming it spark-1.5.1:
ilovejobs@mymac:~/Downloads$ mv spark-1.5.1-bin-hadoop2.6/ ~/Apps/spark-1.5.1
6. Move to the ~/Apps directory and verify that spark is there:
ilovejobs@mymac:~/Downloads$ cd ~/Apps
ilovejobs@mymac:~/Apps$ ls -l
drwxr-xr-x ?? ilovejobs ilovejobs 4096 ?? ?? ??:?? spark-1.5.1
7. Here is the first tricky part. Add the spark binaries to your $PATH:
ilovejobs@mymac:~/Apps$ cd
ilovejobs@mymac:~$ echo 'export PATH=$HOME/Apps/spark-1.5.1/bin:$PATH' >> .profile
8. Here is the second tricky part. Also add these environment variables:
ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON=ipython" >> .profile
ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON_OPTS='notebook'" >> .profile
9. Source the profile so these variables are available in this terminal:
ilovejobs@mymac:~$ source .profile
10. Create a ~/notebooks directory:
ilovejobs@mymac:~$ mkdir notebooks
11. Move to ~/notebooks and run pyspark:
ilovejobs@mymac:~$ cd notebooks
ilovejobs@mymac:~/notebooks$ pyspark
Run Code Online (Sandbox Code Playgroud)请注意,您可以将这些变量添加到.bashrc您家中的位置.
现在开心,你应该能够运行带有pyspark内核的jupyter(它会将它显示为python 2,但它会使用spark)
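To confirm the wiring, a quick sanity check you can run in a new notebook cell (this assumes sc was predefined by pyspark's shell startup, which is the default behavior):

# 'sc' is the SparkContext that the pyspark launcher predefines.
rdd = sc.parallelize(range(100))
print(rdd.sum())  # expect 4950 if Spark is actually doing the work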